Avoid this Surprise when Determining How Long your Next Process Outage Will Take
By: Daniel Evoy
Continuous processing or manufacturing facilities occasionally require planned outages, primarily to address reliability or maintenance issues. Because of the high cost of these outages (both opportunity cost and actual cost), careful planning and scheduling practices are used to produce detailed execution plans that reduce the planned duration as much as possible. While outage duration is typically determined using the critical path methodology, large quantities of small, similar jobs can lead to unexpected schedule extension. A planning team recently applied this lesson learned by calculating the schedule impact of such critical mass work.
Importance of Determining a Realistic Duration
Almost all continuous processing or manufacturing industries will require some form of planned production outage at a regular interval. These are typically required to perform inspection and maintenance work but can also be for equipment upgrades or process changes. This work is most often driven by safety, environmental (regulatory), equipment reliability or production quality considerations, among others.
The basic economics of these industries mean that outages adversely affect the ‘bottom line’. In some cases, the opportunity cost (for example, from buy vs. make decisions) can be in the millions of dollars per day. Add to this the actual cost of doing the work during the outage, which can also be millions of dollars per day, and you end up with the strongest case for not only reducing the outage’s duration to its minimum but also ensuring that the duration is a realistic one.
In most cases, lost production due to the outage must be made up by pre-building inventories, securing purchases from other locations or third parties, or reducing shipments to customers.
While never experienced by this author, the same reasoning could be applied to outages of continuous services. Examples that come to mind include major software changes, relocation of offices or medical services, and so on.
A few years ago, a planning team put in place a simple methodology to ‘test’ an outage’s duration against a large amount of similar work, none of which individually determined the overall duration. This article will describe how this was successfully done.
Planning vs Scheduling
Many papers and industry documents are available to describe outage planning and scheduling best-practices. We will not attempt to duplicate these here but the planning team rigorously applied the following:
- All activities were planned. This included all production slowdown and shutdown, decommissioning, quality control, recommissioning and start-up steps
- Simply put, the planning work covered who (how many) is going to do what and how, for how long and with what resources, e.g., tools
Once this was largely completed, the team logically connected the plans, identifying predecessor, successor and parallel steps. This was deliberately done for the full span from the end of production to the return to service “on spec”, including any necessary production ramp-up.
Using specialized software, the team could then evaluate the work and determine its minimum duration.
Critical Path Duration
Here again, much has been written on determining critical path (CP) duration of a logically-connected series of steps. The planning team, as a first pass, used this technique to calculate the outage duration.
A quick recap: CP is the sequence of dependent tasks that form the longest duration, allowing you to determine the most efficient timeline possible to complete a project. In the simple example shown in Figure 1, we can quickly determine that:
- The duration for the series of activities following the top path is 3 days,
- The middle one takes only 1 day,
- The bottom path (with red arrows) adds up to 4 days
Therefore, the critical path duration is 4 days.
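As a rough illustration of this recap, the longest-path calculation can be sketched in a few lines of Python. The activity names and individual durations below are hypothetical; only the three path totals (3, 1 and 4 days) come from the example above, and this is not the specialized software the team used.

```python
from functools import lru_cache

# Hypothetical network: activity -> (duration in days, predecessors).
# The three chains total 3, 1 and 4 days, matching the path totals quoted above.
activities = {
    "A1": (1, []),
    "A2": (2, ["A1"]),   # top path: 1 + 2 = 3 days
    "B1": (1, []),       # middle path: 1 day
    "C1": (2, []),
    "C2": (2, ["C1"]),   # bottom path: 2 + 2 = 4 days
}

@lru_cache(maxsize=None)
def earliest_finish(name):
    """Earliest finish = own duration + latest earliest-finish among predecessors."""
    duration, preds = activities[name]
    return duration + max((earliest_finish(p) for p in preds), default=0)

def critical_path():
    """Walk back from the last-finishing activity along its longest predecessor chain."""
    end = max(activities, key=earliest_finish)
    path = [end]
    while activities[path[-1]][1]:
        path.append(max(activities[path[-1]][1], key=earliest_finish))
    return list(reversed(path)), earliest_finish(end)

path, duration = critical_path()
print(path, duration)
```

With these placeholder values the bottom chain is returned as the critical path, with a duration of 4 days.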
The team’s next step was to check the sequence of steps for its integrity and for any inherent execution risks.
As before, the team made use of specialized software to accomplish this. They were specifically looking for:
- Open, incomplete or missing logic
- Work sequence conflicts such as circular logic
They were also able to build in uncertainty for each individual activity’s duration. This allowed them to come up with a probability curve for the overall event’s duration, expressed as the probability of completing the outage in ‘x’ days.
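One common way to obtain such a probability curve is a simple Monte Carlo simulation. The sketch below is only an assumption about how this could be done, not a description of the team's software: the three-point estimates are hypothetical, each activity is sampled from a triangular distribution, and the paths are treated as independent.

```python
import random

# Hypothetical paths; each activity is (optimistic, most likely, pessimistic) days.
paths = {
    "top":    [(0.8, 1.0, 1.5), (1.5, 2.0, 3.0)],
    "middle": [(0.5, 1.0, 2.0)],
    "bottom": [(1.5, 2.0, 3.0), (1.5, 2.0, 2.5)],
}

def simulate_once(rng):
    """One trial: sample every activity, the outage lasts as long as the longest path."""
    return max(
        sum(rng.triangular(low, high, mode) for low, mode, high in acts)
        for acts in paths.values()
    )

def probability_within(days, trials=10_000, seed=1):
    """Fraction of simulated outages that finish in `days` or less."""
    rng = random.Random(seed)
    hits = sum(simulate_once(rng) <= days for _ in range(trials))
    return hits / trials

for target in (4.0, 4.5, 5.0):
    print(target, probability_within(target))
```

Plotting `probability_within` over a range of target durations gives the kind of curve described above.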
In addition to determining the CP sequence of steps, the team identified all the near-CP work, in this case, all work sequences that are expected to be completed within 10% of the CP duration.
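A minimal sketch of this near-CP screen follows. It assumes "within 10%" means a path duration of at least 90% of the CP duration; the path names and durations are hypothetical.

```python
# Hypothetical path durations in days.
path_durations = {"tower": 30.0, "compressor": 28.0, "reactor": 26.5, "drums": 20.0}

cp_duration = max(path_durations.values())
threshold = 0.9 * cp_duration  # assumed reading of "within 10% of the CP duration"

# Paths shorter than the CP but at or above the threshold are flagged as near-critical.
near_critical = sorted(
    name for name, days in path_durations.items() if threshold <= days < cp_duration
)
print(cp_duration, near_critical)
```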
Throughout this important work, several optimization efforts were completed, using SMED (Single-Minute Exchange of Die) or equivalent techniques.
Outcome of Planning and Scheduling Work
In the end a final schedule is produced, showing the optimal sequence of all activities, as well as the definitive expected outage duration.
Using the CP methodology, the expectation is that there is a single series of logically connected activities that dominantly determines the overall duration.
Readers who are familiar with the petroleum refining or petrochemical industries will recognize that the CP normally ‘travels’ through large fixed process equipment (distillation towers, reactors, separator drums, etc.) or major machinery trains, e.g., large compressors. This was also true in our case.
In this particular situation, the planning team suspected there could be more …
In addition to the significant and dominant work during the outage, there is typically a fairly large quantity of work on similar equipment types using the same worker trade. We called this bulk-type work.
Examples in continuous process industries include piping, insulation, electrical, instrumentation & controls.
In almost all cases, none of these activities approach the CP duration or are even included in the CP sequence of work. They are typically scheduled based on the following:
- Availability of the equipment to be worked on, that is, process circuits in their logical decommissioning or recommissioning sequence, and,
- Worker assignment to individual jobs. As an efficiency step, the planning team tries to keep the number of workers for each trade the same for most of the outage’s duration.
The typical outcome is that individual bulk work activities are scheduled throughout the outage duration.
The planning team suggested an additional step: see if the bulk work as a whole could impact the expected duration due to the sheer amount involved. They referred to this as a Critical Mass (CM) duration test.
Case Study of Critical Mass Duration Test
This case study involves the planned outage of a petroleum refining processing unit, primarily for inspection and repairs to equipment not normally available during operations.
In this particular situation, the corrosive and extremely hazardous nature of products being used results in an unusually large quantity of piping and isolation valve repairs or replacements.
None of this piping or valve work individually approached CP duration and, again alone, each job was a small fraction of overall outage effort (expressed in work-hours). On the other hand, the planning team suspected that, as a whole, the piping and valve work might be difficult if not impossible to schedule within the calculated CP duration. (In this case, the CP duration was determined to mostly include the work at a large distillation tower.)
The planners also considered the following:
- Historically, a not-insignificant amount of discovery or found work gets added to the outage scope, that is, piping or valve failures or near-failures identified after the processing unit is decommissioned and detailed inspections are carried out.
- A careful look at the actual work performance for this unit’s previous maintenance outage showed that the piping and valve work was completed only a few days earlier than the work on the CP.
For this upcoming event, the amount of piping and valve work was noted as significantly more than in previous outages. For all these reasons, the planning team needed a way to ‘test’ if the piping and valve work would come close to the CP outage duration.
The team was fortunate enough to have actual measured daily progress for this equipment class during this unit’s previous outage.
Daily progress is reported and recorded as earned planned hours during the outage, at the end of every work shift – including discovery / found work once it is planned, approved and added to the work schedules.
Using this data, they were able to calculate an expected daily work ‘productivity’ for piping and valve work, expressed as earned planned hours per day (equal to 2 work shifts).
The following observations were made:
- The histogram of daily earned planned hours clearly showed the expected ramp-up and ramp-down of work progress, with fairly constant daily progression in between. This is consistent with the worker assignments during the outage.
- Essentially no work progress for piping or valve work was made during the decommissioning and recommissioning activities. Since this portion of the outage can occasionally vary a lot, this time was excluded from the calculation.
- As a ‘reality-check’, the team also calculated the same piping and valve work productivity during the more recent outage of a different processing unit. The numbers were quite comparable.
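The productivity calculation described above can be sketched as follows. The daily figures are hypothetical, and the 80%-of-peak cut-off used to separate the steady plateau from the ramp-up and ramp-down days is an assumption made for illustration, not the team's stated rule.

```python
from statistics import mean

# Hypothetical daily earned planned hours (2 shifts per day) for piping and valve work.
# Leading and trailing zeros stand for decommissioning and recommissioning days.
daily_earned = [0, 0, 400, 900, 1500, 1550, 1480, 1520, 1500, 1000, 450, 0, 0]

def steady_state_productivity(daily, plateau_fraction=0.8):
    """Average earned hours per day over the plateau between ramp-up and ramp-down."""
    worked = [h for h in daily if h > 0]   # exclude days with no piping/valve progress
    peak = max(worked)
    plateau = [h for h in worked if h >= plateau_fraction * peak]  # assumed cut-off
    return mean(plateau)

productivity = steady_state_productivity(daily_earned)
print(productivity)
```

Running the same function on data from a second outage gives the kind of reality check mentioned in the last observation.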
Using the total planned work-hours for piping and valve work for the upcoming outage and the calculated productivity, the team could then estimate the expected duration for this work.
Total piping + valve planned work (‘000 work hours) ÷ Calculated daily productivity = expected duration, in days
We could then adjust for ramp-up and ramp-down and compare to the CP duration. In this case, the piping and valve work was determined to take a few days more than the CP duration.
A simplified sample calculation is provided in Figure 2:
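The same arithmetic can also be sketched in code. All numbers below are hypothetical placeholders, not the values from Figure 2 or from the actual outage, and treating ramp-up and ramp-down as a simple allowance in days is an assumption.

```python
def critical_mass_duration(total_hours, daily_productivity, ramp_allowance_days=0.0):
    """Expected days for the bulk work: planned hours / daily productivity + ramp allowance."""
    return total_hours / daily_productivity + ramp_allowance_days

# Hypothetical inputs.
total_planned_hours = 45_000   # piping + valve planned work-hours
daily_productivity = 1_500     # earned planned hours per day (2 shifts)
ramp_allowance_days = 3        # assumed adjustment for ramp-up and ramp-down
cp_duration_days = 30          # duration from the critical path schedule

cm_days = critical_mass_duration(total_planned_hours, daily_productivity, ramp_allowance_days)
exceeds_cp = cm_days > cp_duration_days
print(cm_days, exceeds_cp)
```

In this made-up case the bulk work needs 33 days against a 30-day critical path, which is the kind of result that would prompt counter-measures.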
Faced with this fact, the planning and execution teams applied a number of counter-measures to help ensure the predictability of the outage’s duration.
Given that none of the individual piping and valve jobs were on the CP duration sequence, there was significant flexibility in scheduling each job while recognizing other logical work constraints. Maximizing execution efficiency was now the primary factor.
The teams re-tested the repair vs replacement decisions. Where additional replacement was possible, this resulted in further reduction in work-hours during the outage, e.g., bias towards bolted vs. field welded connections. Even if additional lengths of piping were replaced, it was now economic to do so and reduce the duration, or improve the probability of meeting the planned duration.
Larger piping jobs were scheduled first, as a priority where possible.
Other piping and valve work was mostly scheduled based on process circuits or systems, essentially in the order they are decommissioned.
Worker density limits (the number of workers in a specific area during a specific time period) were evaluated in the job plans and schedules; this allowed the schedule for particular jobs to be adjusted to minimize this constraint, at least on a planned basis.
The final schedule made an allowance for discovery / found piping and valve work, based on prior data.
During execution, detailed monitoring was in place for the larger piping and valve jobs individually, and by geographical area for all others. Dedicated resources were available to allow timely intervention to remove execution constraints as they were detected.
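The worker density evaluation mentioned among the counter-measures can be illustrated with a small check of planned headcount per area and shift. The job names, areas and limits are hypothetical, and this is only one possible way to run such a check.

```python
from collections import defaultdict

# Hypothetical schedule rows: (job, area, shift number, workers assigned).
schedule = [
    ("valve-101", "pipe rack A", 1, 6),
    ("line-204",  "pipe rack A", 1, 8),
    ("line-310",  "pipe rack B", 1, 5),
    ("valve-117", "pipe rack A", 2, 4),
]
# Hypothetical maximum number of workers allowed in each area at one time.
density_limit = {"pipe rack A": 12, "pipe rack B": 10}

def over_limit(rows, limits):
    """Return {(area, shift): headcount} for every area/shift exceeding its limit."""
    load = defaultdict(int)
    for _job, area, shift, workers in rows:
        load[(area, shift)] += workers
    return {key: n for key, n in load.items() if n > limits[key[0]]}

conflicts = over_limit(schedule, density_limit)
print(conflicts)
```

Here the first shift in "pipe rack A" has 14 planned workers against a limit of 12, so one of the two jobs would be a candidate for rescheduling.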
Prerequisites and Other Considerations when Testing for Critical Mass Duration
Obviously, good actual progress data for similar work is required, ideally on the same equipment or area.
Some adjustments may be required:
- Change in work schedules – this outage vs. previous one(s), e.g., number of shifts, work hours per shift.
- Changes to work constraints or to significant execution tactics: work safety requirement; equipment preparation steps; staffing constraints; limited experience of execution team or workers (typically third-party contractor).
Previously published papers mention measuring CM progress during execution, to ensure this work is progressing at the expected pace.
Our approach looked at this work from a pro-active point-of-view, during the planning and scheduling phase, well before the field work began. The team was correct in assuming that the piping and valve work would drive the overall event duration.
Given the limited past data on piping and valve work (only 2 events), we were unable to determine with great certainty which constraints prevent an increase in daily productivity.
Both prior events showed very close results in terms of actual earned planned hours per day, so we suspect there is some commonality to these constraints:
- Equipment preparation steps and safe work permitting (for piping circuits)
- Work execution constraints, such as only one crane being available to service multiple work fronts
- Worker density, such as tiered work, that is, adding more workers would not increase daily productivity due to limited physical access to work front
While all ‘traditional’ planning and scheduling best-practices for planned outages need to be followed and continuously improved, to avoid schedule surprises planning teams should consider testing for critical mass duration whenever significant bulk-type work is expected.