Improve Turnaround, Shutdown, and Outage Duration: After Action Review

The practice of After Action Review (AAR) comes from the military, as in the US Army’s TC 25-20, “A Leader’s Guide to After-Action Reviews” (September 1993).  The approach is a classic example of Plan/Do/Check/Act: after an activity, while events are still fresh, we ask five questions:

  1. What was the plan?
  2. What actually happened?
  3. What went well?  So we can be sure to do it again.
  4. What went wrong?  So we can figure out how to do better.
  5. What are we going to do now?

Improve Turnaround Outage Duration: Command Center

“Command Center” brings up images of NASA or maybe a natural disaster response team.  For a large planned maintenance turnaround the furniture and technology may be different but the concepts are much the same. The command center is the place where the outage leader directs the resources of the turnaround.  To do this successfully:

  • The Plan is available for all, and is easy to understand
  • Status of all work is constantly up to date, and exceptions and deviations stand out
  • Missing or stale information is obvious, as is who is responsible
  • Information ownership, source, and ‘freshness’ is easy to see
  • Deviations have pre-planned countermeasures clearly displayed
  • Information gets updated before or after meetings, not during them
  • When the critical path inevitably changes, a new plan is in place in minutes, not hours
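
One way to make stale information and its owner stand out is to stamp every status item with an owner and a last-update time, then flag anything older than an agreed limit. The sketch below is illustrative only: the field names, the four-hour limit, and the sample board entries are assumptions, not part of any particular command center tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class StatusItem:
    task: str
    owner: str
    updated: datetime  # when the owner last refreshed this status

def stale_items(items, now, max_age=timedelta(hours=4)):
    """Return items whose last update is older than max_age, oldest first."""
    old = [item for item in items if now - item.updated > max_age]
    return sorted(old, key=lambda item: item.updated)

# Hypothetical status board entries
now = datetime(2024, 5, 1, 14, 0)
board = [
    StatusItem("Exchanger bundle pull", "Pat", datetime(2024, 5, 1, 13, 0)),
    StatusItem("Scaffold at column", "Lee", datetime(2024, 5, 1, 6, 30)),
    StatusItem("Blind list sign-off", "Sam", datetime(2024, 5, 1, 9, 0)),
]

for item in stale_items(board, now):
    age_hours = (now - item.updated).total_seconds() / 3600
    print(f"STALE: {item.task} (owner {item.owner}, {age_hours:.1f} h old)")
```

The freshness limit would normally differ by type of information; a single value is used here only to keep the sketch short.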

Improve Turnaround Outage Duration: Scope Change Management

Uncontrolled scope change is unplanned work, and it is a failure of planning, reliability engineering, or management. Scope Change Management countermeasures include:

  • Scope freeze, scope change cutoff
  • Single point of authority to add/drop/change scope
  • Risk-based decision making on ‘found’ or ‘discovered’ work
  • Cost and duration offsets
  • Post shutdown root cause analysis and corrective action

So, when scope change happens we celebrate because this is an opportunity to learn and improve.

For part shortages, we pre-stage and confirm parts before they are needed.

For ‘found work’, we freeze the scope.

Overruns and delays can happen, so the turnaround plan has a single delay buffer.

 

Improve Turnaround Outage Duration: Milestone Reporting

That’s a “GT” on the milestone, as in 10 miles north of Grahamstown, South Africa.  Milestones were originally stone markers used by Roman road builders: a series of numbered markers providing reference points along the road.  Milestones can be used to reassure travelers that the proper path is being followed, and to indicate either distance traveled or the remaining distance to a destination.

Funny story – on Nantucket Island, located off the coast of Cape Cod in Massachusetts, USA, the road from town to Siasconset is named the Milestone Road.  They have a Pi Milestone at 3.14 miles from town.

In a plant maintenance turnaround, keeping track of outage events and issues is a vital way for the organization to learn where it is and where it is going.  It’s almost impossible to reconstruct the battle after the fact; memories fade, rationalizations creep in.  If you aren’t paying attention to the road markers and something happens, how do you know where you are?

When all you measure is when the shutdown started and when it ended, all you know is the total lost production time.  When key steps along the way are planned and recorded, you can learn where the time goes, and then take steps to attack the biggest deviations.
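
As a rough illustration of plan vs. actual milestone tracking, the sketch below records planned and actual times for each milestone and ranks the steps between milestones by overrun. The milestone names and hours are made up for the example.

```python
# Hypothetical milestones: (name, planned hours from start, actual hours from start)
milestones = [
    ("Feed out", 0, 0),
    ("Unit cleared and blinded", 12, 18),
    ("Vessel entry permit issued", 20, 27),
    ("Repairs complete", 92, 101),
    ("Blinds pulled", 100, 110),
    ("On-spec product", 120, 134),
]

def step_deviations(milestones):
    """Return (step, planned, actual, overrun) per step, biggest overrun first."""
    rows = []
    for (prev, p0, a0), (name, p1, a1) in zip(milestones, milestones[1:]):
        planned, actual = p1 - p0, a1 - a0
        rows.append((f"{prev} -> {name}", planned, actual, actual - planned))
    return sorted(rows, key=lambda row: row[3], reverse=True)

for step, planned, actual, overrun in step_deviations(milestones):
    print(f"{step}: planned {planned} h, actual {actual} h, overrun {overrun:+d} h")
```

With only start and end times recorded, this example would show a 14-hour overrun and nothing more; with the milestones, it shows that the largest single share was lost in clearing and blinding the unit.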

Improve Turnaround Outage Duration: Non Stop Critical Path

Maintenance outages, turnarounds, or plant shutdowns are complex and very costly, and can involve hundreds or even thousands of temporary workers.  In our work we have been successful applying non stop critical path analysis to help reduce large facility outage duration.  The immediate benefits are increased process availability and corresponding revenue and profit.  Typical outages are planned and scheduled months in advance and last anywhere from a week to a month or more.  Outage duration tactics include streamlining processes such as shutdown and startup, improving decision making and communications methods, and controlling scope.  Another tactic is externalizing tasks, that is, doing work before or after the process goes down, such as pre-staging tools and materials and preparing the work site.  We’ve also been working to better understand the nature of planning and executing Critical Path work.

We believe that a shorter critical path means an overall shorter outage.  To get a shorter critical path you often have to add resources, which increases turnaround costs.  What we want to better understand are the conditions under which a shorter outage actually costs less.  Typical contractor schedules are two ten-hour shifts per day.  Productivity is believed to drop off significantly after 10 hours due to additional breaks and fatigue.  We have applied longer shifts to the critical path and have seen overall duration reduction.  We have also had mixed results with the concept of Non Stop Critical Path, also referred to as “tool to tool”.  Some types of work are machine rate based, for example pipe welding or sand blasting.  For this type of work the value is in the “wrench or tool time”.  The concept is that the tool never stops.  The welder stays on the tool until someone comes, taps the worker on the shoulder, takes the welding rod, and keeps on welding.  While work is under way, spotters make sure the value-adding worker never has to go searching for weld rod; an assistant keeps the worker supplied.  The tool never takes a lunch break or goes on treasure hunts; the tool works as close to 24 hours a day as possible.  To do this we have spotters fill in for the workers when they take their breaks, and we stagger crew start times.

Shutdowns can have a range of % critical path work.  Sometimes the critical path is almost the entire outage work scope; in other cases the critical path work can be as little as 5% of the overall outage effort.  We believe that when the critical path is narrow, or a small portion of the overall work, then applying additional labor to the critical path can greatly reduce the overall cost and duration of the outage.

Staffing Non Stop Critical Path of course takes more resources and is a higher operating expense than the normal two ten-hour shifts.  Non Stop Critical Path not only provides an additional 4 hours of shift time compared to two tens, it also adds back two lunches, 6 breaks, and 2 sets of getting to and from the work face, for some 7 or more hours of additional tool time per day.
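
The arithmetic behind the “7 or more hours” can be sketched as follows. The counts (two lunches, six breaks, two sets of travel) are from the text above; the durations in minutes are assumptions chosen for illustration, and the non stop case is idealized as a tool that runs the full 24 hours.

```python
# Assumed durations in minutes; only the counts come from the text.
LUNCH_MIN = 30
BREAK_MIN = 15
TRAVEL_SET_MIN = 30  # one trip to the work face and back

def tool_minutes_two_tens():
    """Tool time per day for two ten-hour shifts, net of lunches, breaks, and travel."""
    shift_minutes = 2 * 10 * 60
    lost = 2 * LUNCH_MIN + 6 * BREAK_MIN + 2 * TRAVEL_SET_MIN
    return shift_minutes - lost

def tool_minutes_non_stop():
    """Idealized non stop case: relief workers cover every break, so the tool runs all day."""
    return 24 * 60

gain = tool_minutes_non_stop() - tool_minutes_two_tens()
print(f"Two tens: {tool_minutes_two_tens() / 60:.1f} h of tool time per day")
print(f"Non stop: {tool_minutes_non_stop() / 60:.1f} h of tool time per day")
print(f"Gain:     {gain / 60:.1f} h per day")
```

Under these assumed durations the gain is 7.5 hours per day; shorter or longer breaks and travel times move the figure accordingly.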

The combination of shift patterns, overtime, mix of critical path work, equipment rentals, etc. makes it difficult to apply a general rule for all turnarounds as to whether extending shift length or adding labor to the critical path increases or decreases overall outage expenses.  So a model is needed to answer the question: under what conditions does applying extra labor to get a shorter outage actually cost less?
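
A very simple version of such a model might look like the sketch below. It is not our actual model; it assumes the outage lasts exactly as long as the critical path, that non-critical labor is a fixed total, and that every outage day carries a fixed overhead (rentals, supervision) plus a lost production margin. All names and dollar figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    tool_hours_per_day: float       # productive critical path hours per day
    crit_labor_cost_per_day: float  # critical path crew cost per day, incl. premiums

def outage_cost(s, crit_tool_hours, other_labor_cost, daily_overhead, daily_lost_margin):
    """Return (duration_days, total_cost) for one staffing scenario."""
    days = crit_tool_hours / s.tool_hours_per_day
    daily = s.crit_labor_cost_per_day + daily_overhead + daily_lost_margin
    return days, other_labor_cost + days * daily

two_tens = Scenario("Two tens", 16.5, 20_000)
non_stop = Scenario("Non stop", 24.0, 36_000)

# Compare with and without counting the lost production margin.
for lost_margin in (0, 100_000):
    for s in (two_tens, non_stop):
        days, cost = outage_cost(s, 396, 2_000_000, 15_000, lost_margin)
        print(f"margin {lost_margin:>7,}/day  {s.name}: {days:.1f} days, ${cost:,.0f}")
```

In this made-up case the non stop crew costs slightly more when only direct costs are counted, and considerably less once each day of lost production is valued. Where the break-even falls depends entirely on the inputs, which is why a general rule is hard to state.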

Improve Turnaround Outage Duration: Parallel Planning

 

Outage readiness is better when all the right people give and receive input at the earliest effective date; bust the silos, jump the walls, and use parallel planning …

  • Everyone shares the goal of being ready to go fast (and safely) when the outage begins, so we’ll benefit from bringing the right people together to ensure we have the right inputs and everyone is working from the same plan
  • Close to the shutdown date many key decisions are made and people from a lot of groups are preparing – they all need to be in sync
  • Hear one message and one answer to key questions, and those with input have a channel for sharing

 

What’s different with parallel planning?

  • Draft schedule is available earlier and is more honest, because developing it in parallel means less rework
  • Earlier involvement of key contractors learning about plans and schedules and giving input on what will work
  • Greater turnaround leader and scheduler interaction
  • Earlier involvement and input from all support groups
  • Wider understanding of preparation and externalization efforts
  • More focus on any scope changes that occur inside of the gate reviews due to greater exposure to the plan and schedule

Improve Turnaround Outage Duration: Shutdown & Startup

All too often the battle for managing the duration and cost of a planned plant maintenance shutdown is lost in the stages of shutting down or starting up the operation.  While a lot of planning effort usually goes into understanding and coordinating the maintenance work, it is the unknowns that pop up as the process is coming off line, or perhaps more often the surprises that occur when starting up, that can throw the best made plans out the window.

Checklists and SOPs are always good things to have.

Here are a few more things to have on your Startup to-do list …
  1. As Built plant configuration
  2. Post maintenance inspection of equipment condition
  3. Verify and validate maintenance work completion
  4. Visual and physical checks for leak and pressure
  5. Instrument performance and controller checks
  6. Safety and relief valve checks
  7. Lockout-tagout de-blinding activities
  8. Utility (steam, air, power, fuel, refrigerant, solvent) availability checks
  9. Commissioning of units as per Standard Operating Procedure
  10. No load testing of pumps, motors, compressors, turbines
  11. Start-up – cold circulation
  12. Warming up

But even with all of these precautions things can still go wrong, and when they do the clock keeps running.  So what to do to reduce risks?  How about planning for the unexpected?

  • What-if and Failure Modes & Effects Analysis
  • Checklists and planned countermeasures
  • Dry runs & simulations
  • Postmortem and After Action Review – to capture learnings and drive continuous improvement

Improve Turnaround Outage Duration: Constraint Busting

Scope and Gantt scrubs, reliability engineering equipment improvement projects, and new technologies are steps we can take to extend the life of equipment or to remove obstacles to minimizing outage duration.  Some examples of constraint busting include:

  • Deferring work based on observation, inspection, and risk assessment
  • Installing new man doors can save time by opening up more work faces (parallel work) or making installation easier for utility or handling access (air lines, electrical cables, cranes for lifting)
  • A great engineering project example is the practice of swapping equipment rather than repairing in place
  • Working non-stop on the critical path, although not easy to do, can also have a big impact on outage duration and costs

 

The methodology we apply is Goldratt’s classic Theory of Constraints:

  1. Identify the Constraint – find the critical path and focus on it
  2. Decide how to Exploit the constraint – optimize resources
  3. Subordinate everything else – critical path gets top priority
  4. Elevate the constraint – open new work faces, overlap shifts
  5. Repeat – start over
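
Step 1, finding the critical path, amounts to finding the longest chain of dependent tasks in the schedule. The sketch below shows the idea on a small, made-up task network; the task names and durations are hypothetical, and a real schedule would come from the planning tool. It assumes the network has no circular dependencies.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (duration in hours, predecessors)
tasks = {
    "shutdown": (12, []),
    "blind":    (8,  ["shutdown"]),
    "scaffold": (6,  ["shutdown"]),
    "inspect":  (10, ["blind", "scaffold"]),
    "weld":     (40, ["inspect"]),
    "clean":    (16, ["blind"]),
    "startup":  (20, ["weld", "clean"]),
}

def critical_path(tasks):
    """Return (total_hours, [task names]) for the longest chain through the network."""
    @lru_cache(maxsize=None)
    def finish(name):
        # Earliest finish of a task and the chain of tasks that determines it
        duration, preds = tasks[name]
        if not preds:
            return duration, (name,)
        start, path = max(finish(p) for p in preds)
        return start + duration, path + (name,)

    total, path = max(finish(name) for name in tasks)
    return total, list(path)

total, path = critical_path(tasks)
print(f"Critical path: {' -> '.join(path)} ({total} h)")
```

Tasks that are not on the returned path (here the scaffold and cleaning work) have slack, so they are the ones to subordinate in step 3.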

Improve Turnaround Outage Duration: Externalize

Externalize: when looking at all of the tasks that occur leading up to, during, and after a planned maintenance shutdown it can be helpful to categorize each task as internal or external.  Internal tasks are those that can only be performed when the process is stopped, while External tasks can be done either before shutdown or after starting back up.

Here are examples of tasks often found happening during an outage that, in part or in whole, could have been done while the process was running:

  1. Repairs to work done during the outage
  2. Reinspecting someone’s work
  3. Waiting for parts, tools, work instructions
  4. Waiting for a ride to the work face
  5. Waiting for inspector, permit writer
  6. Building scaffolds
  7. Moving materials or tools to the point of use
  8. Treasure hunts, scavenger hunts
  9. Preparing reports, making presentations
  10. Rerunning ‘the schedule’

Being able to see tasks as internal or external is the first step toward reducing the planned shutdown duration.
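
A first pass at this categorization can be as simple as tagging each task and adding up the hours that do not need the process to be down. The task list and hours below are invented for the example.

```python
# Hypothetical outage tasks: (task, hours, needs_process_stopped)
outage_tasks = [
    ("Replace pump seal", 6.0, True),
    ("Build scaffold at reactor", 8.0, False),
    ("Stage gaskets and bolts at work face", 2.0, False),
    ("Internal vessel inspection", 5.0, True),
    ("Prepare work permits", 1.5, False),
]

def split_internal_external(task_list):
    """Split tasks into internal (process must be stopped) and external (it need not be)."""
    internal = [t for t in task_list if t[2]]
    external = [t for t in task_list if not t[2]]
    return internal, external

internal, external = split_internal_external(outage_tasks)
hours_to_move = sum(hours for _, hours, _ in external)

print(f"Internal tasks: {[name for name, _, _ in internal]}")
print(f"External hours that could move outside the outage window: {hours_to_move}")
```

Not every external hour shortens the outage; only those that sit on or feed the critical path do, so this list is a starting point for the schedule scrub rather than a savings estimate.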

Plant Shutdown Readiness

Shutdown readiness … You’ve heard the sayings “What gets measured gets better” and “Inspect what you expect”.  Well, here’s a management control tool you can use to help drive the right behaviors leading up to and during your plant shutdown.  As a lean thinking leader, take one of the cards, grab one of your people, and go for a walk. Then do something with what you find.

Turnaround Readiness Observation Cards


Lean Plant Shutdown Strategies

Whether you call it a plant shutdown, outage, or turnaround, getting the right work done, safely, and in the shortest time can be tricky.  Here are a few Lean Plant Shutdown Strategies we’ve taken that have helped make dramatic improvements in reducing planned and unplanned downtime:

Externalize – do nothing during the shutdown that can possibly be done while the process is running

  • SMED 101 – separate internal from external
  • Staging supplies – no ‘treasure hunts’
  • Prepare tool fixtures
  • Prepare work areas
  • Dry run, dress rehearsal, walk through, simulations
  • Checklists

 

Constraint Busting – find the constraint(s) and exploit/subordinate/elevate

  • Work scope scrubs – select work based on probability of failure and risk impact
  • Schedule scrubs – eliminate, combine, rearrange, simplify

 

Shutdown & Start Up – making the plant ready for maintenance work

  • Checklists
  • Dry run & simulations
  • Labor plans
  • Safety permits – lockout/tagout efficiently
  • Parallel teams, chase the rabbit

 

Parallel Planning – bust organizational silos with concurrent cross functional teams

  • Work scope and schedule iterations and scrubs
  • Risk-based work selection
  • Contractor work reviews
  • Dry run dress rehearsals

 

Non Stop Critical Path – understand the trade offs when applying additional labor

  • Separate man and machine – machine based operations never stop (welding, blasting), man based operations suffer fatigue (demolition, fabricating)
  • Overlapping shifts
  • Relief crews
  • Runners, spotters, observers

 

Milestone Reporting – what gets measured gets better

  • Categorize activities
  • Sequence prerequisites
  • Plan vs. actual
  • Deviation, root cause, and countermeasures

 

Scope Change Management – unplanned work is a failure of planning or reliability engineering

  • Scope freeze, scope change cutoff
  • Single point of authority to add/drop/change scope
  • Risk-based decision making on ‘found’ or ‘discovered’ work
  • Cost and duration offsets
  • Post shutdown root cause analysis and corrective action

 

Command Center – transparency

  • The Plan is available for all, and easy to understand
  • Status of all work is easy and quick – exceptions, deviations stand out
  • Missing or stale information is obvious, as is who is responsible
  • Information ownership, source, and ‘freshness’ is easy to see
  • Deviations have countermeasures clearly displayed
  • Information updated before/after not during meetings
  • When the critical path inevitably changes, a new plan is in place in minutes, not hours

 

Management Controls – inspect what you expect if you want to sustain

  • Preparation reviews
  • Gemba walks and paired observations
  • Site inspections
  • Checklist reviews
  • Performance metrics and countermeasures

 

After Action Review – no shutdown is flawless; learn and improve

  • What was the plan and what actually happened?
  • What went well?  What went wrong?
  • Separate common from special cause
  • Find solutions for common cause, buffer risks for special cause

Lean Shutdown Management

Shutdown, outage, turnaround, or whatever you call it can vary dramatically in effort, duration, and cost, for example:

  • Months off line and millions of dollars in contract labor for a turnaround in an oil refinery
  • Days or weeks for a chemical plant shutdown
  • Hours for recurring changeovers in many process industries

NASCAR is a good example of what can be accomplished through planning, scheduling, and execution.  Major contributors to pit stop or shutdown performance include communication between production and maintenance and continuously working on improving the basics of planning and scheduling, execution, and root cause problem solving.  In the 1950s, a good pit stop lasted 4 minutes.  If nothing had been done to improve these events in the years since (because everyone thought a four-minute pit stop was good), we would still be watching them.  Interestingly, a NASCAR driver is in constant contact with the pit crew.  The driver doesn’t suddenly show up in the pit and complain about a problem with a right front tire, only to have the crew answer: “Let us go to the store and check on a spare tire.”  Unfortunately, this happens daily in most plants.  In NASCAR competition, there’s a strong motivation to win races; in our plants and facilities, there might be completely different factors driving motivation.

In addition to driving Planning and Scheduling to precision and excellence, NASCAR pit crews are continuously working on improving the basics.  This includes, among other things:

  • Analyzing problems and successes
  • Training 20 hours per week for 20 seconds of work on Sundays
  • Doing work right before doing it fast

Regardless of the length of a plant shutdown, the same principles apply in making these events more effective or leaner.

  • First and foremost, problem-free operation should be possible between scheduled shutdowns.  Mean time between production losses including quality, time, and production rate should be as long as possible.
  • Shutdowns should be performed with the right quality on all jobs, as quickly as possible.

The combination of how many shutdowns you have and how long they are affects both your production volume and your ability to deliver product on time.  It is a given that the shutdown must be scheduled (when and who executes what) and that all the jobs must be planned (what, how, all tools, spare parts and materials, lockout/tagout, etc.) before the shutdown begins.  In addition, all shutdowns should have a set time for freezing the schedule.  After the freezing point, no new jobs will be accepted without harsh criticism and a corrective action plan.  Consider post-freeze work requests to be an outage planning and reliability engineering process defect, and act accordingly.