FMEA ranks wastewater asset failures by severity, occurrence, and detection. The output is a prioritised maintenance plan grounded in risk, not in habit. A plant with 4,000 distinct asset items cannot maintain everything at the same intensity; FMEA is how you decide what gets the attention.
What FMEA actually is
Failure Mode and Effects Analysis is a structured walk through every way a piece of equipment can fail, what happens when it does, and what is currently in place to catch it. It originated in US military reliability procedures in the late 1940s, was taken up by aerospace programmes in the 1960s, was formalised by the US automotive industry in the 1980s, and crossed into water utilities through the AWWA M77 manual and the EU's structured asset management codes.
The deliverable is a worksheet with one row per failure mode and a numerical priority score for each. The score, not the row, is the point: it tells the maintenance planner where to spend the next available hour.
FMEA is not the same as root cause analysis. RCA looks backward at one failure that already happened. FMEA looks forward at every failure that could happen. The two reinforce each other — a serious RCA finding usually triggers an FMEA update, and a high-RPN row in the FMEA tells you which RCAs are coming next.
The three scores
Each failure mode is scored on three 1-to-10 scales, and the three numbers are multiplied to give the Risk Priority Number (RPN). Maximum RPN is 1,000; in practice, most rows land between 24 and 200.
- Severity (S) — what happens if the failure occurs. 1 = nuisance, 10 = catastrophic regulatory or safety event. A bearing failure on a non-critical mixer is S=3; a clarifier drive seizing during peak flow is S=9.
- Occurrence (O) — how often this mode is expected. 1 = once in 30 years, 10 = monthly. Driven by manufacturer MTBF where you have it, by operator memory where you do not.
- Detection (D) — how likely current monitoring is to catch the failure before it bites. The scale is inverted: a low score means strong detection. 1 = vibration sensor with auto-trip alarm, 10 = no monitoring, failure found only once the floor is wet.
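The multiplication described above can be sketched in a few lines. This is a minimal illustration, not a standard implementation; the function name `rpn` and the choice to reject out-of-range scores with an exception are assumptions made for the example.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the Risk Priority Number (S x O x D) for one failure mode."""
    scores = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, score in scores:
        # Each score must sit on the 1-to-10 scale described in the text.
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


print(rpn(10, 10, 10))  # 1000, the maximum possible score
print(rpn(2, 3, 4))     # 24
```

Rejecting invalid scores early keeps a typo such as S=0 or D=11 from silently distorting the ranking later.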
The trick is internal consistency. Two facilitators from the same plant should land within ±10% on the same row. That only happens if you write down what each score level means for your plant before you start scoring — a one-page rubric pinned to the wall is enough.
💡 Anchor the rubric in regulatory consequences
For Severity at a wastewater plant, the cleanest 1-to-10 anchor is permit impact: 1–3 = no regulatory consequence, 4–6 = exceedance reportable as a single event, 7–9 = sustained non-compliance, 10 = bypass or release with public-health implications.
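One way to keep that rubric in front of the scoring team is to encode it as a lookup. The sketch below uses the four permit-impact bands from the tip above; the function name `severity_band` is hypothetical, and a real plant would substitute its own wording.

```python
def severity_band(severity: int) -> str:
    """Map a 1-to-10 Severity score to the permit-impact band it belongs to."""
    if not 1 <= severity <= 10:
        raise ValueError(f"severity must be from 1 to 10, got {severity!r}")
    if severity <= 3:
        return "no regulatory consequence"
    if severity <= 6:
        return "exceedance reportable as a single event"
    if severity <= 9:
        return "sustained non-compliance"
    return "bypass or release with public-health implications"


for s in (3, 5, 9, 10):
    print(s, "->", severity_band(s))
```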
Worked example: lift station pump
Consider a 75 kW dry-pit centrifugal pump at a medium-size sewage lift station. Three plausible failure modes:
- Bearing failure — gradual onset, vibration-detectable. S=8 (loss of pumping capacity, possible CSO trigger), O=4 (every 4–7 years on this pump class), D=3 (continuous vibration monitoring catches it). RPN = 96.
- Mechanical seal failure — sudden, catastrophic. S=9, O=3, D=7 (no continuous monitoring on this asset). RPN = 189.
- Impeller rag-up — sudden but recoverable. S=5 (reduced flow rate, alarmable), O=7 (modern wipes, frequent rags), D=2 (current draw signature easy to alarm). RPN = 70.
The ranking is unmistakable: seal failure is the priority, not the bearing. The current PM regime probably greases the bearing every 3,000 hours and inspects the seal annually. The FMEA tells you to flip that: add seal monitoring, and accept that the bearing is already well controlled.
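The arithmetic behind the three rows can be checked with a short script. The scores are the ones given in the worked example; the tuple layout and variable names are just an illustrative choice.

```python
# (failure mode, S, O, D) as scored in the lift station example
modes = [
    ("Bearing failure", 8, 4, 3),
    ("Mechanical seal failure", 9, 3, 7),
    ("Impeller rag-up", 5, 7, 2),
]

# Compute RPN = S * O * D for each row, then sort highest risk first.
ranked = sorted(
    ((name, s * o * d) for name, s, o, d in modes),
    key=lambda row: row[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{score:>4}  {name}")
```

Sorted this way, the seal failure (189) sits above the bearing (96) and the rag-up (70), matching the order discussed above.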
Building the FMEA worksheet
A working FMEA needs five disciplines:
- Tight scope — one process area or one asset class per session. The whole plant in one workshop is the most common failure mode of the FMEA itself.
- The right room — the lead operator, the lead mechanical fitter, the planner, and one engineer. Four people, half a day per session, eight or nine sessions for a typical works.
- A facilitator who is not the plant manager — score inflation is the constant risk; a manager facilitating their own assets cannot help nudging the numbers up to justify the budget ask.
- Standard column set — function, failure mode, effect, current controls, S, O, D, RPN, recommended action, owner, due date, residual S/O/D after action.
- A live worksheet, not a one-off study — RPN drops when the action lands. The worksheet earns its keep over years of revisits.
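The standard column set listed above maps naturally onto a record type. The following is a sketch only: the class name `FmeaRow`, the field names, and every value in the example row other than the S/O/D scores from the worked example (function text, owner, due date, residual scores) are placeholders invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FmeaRow:
    function: str
    failure_mode: str
    effect: str
    current_controls: str
    s: int
    o: int
    d: int
    recommended_action: str = ""
    owner: str = ""
    due_date: Optional[date] = None
    # Residual scores are filled in once the recommended action has landed.
    residual_s: Optional[int] = None
    residual_o: Optional[int] = None
    residual_d: Optional[int] = None

    @property
    def rpn(self) -> int:
        return self.s * self.o * self.d

    @property
    def residual_rpn(self) -> Optional[int]:
        residual = (self.residual_s, self.residual_o, self.residual_d)
        if any(score is None for score in residual):
            return None  # action not yet completed and re-scored
        return self.residual_s * self.residual_o * self.residual_d


# Hypothetical row: only S=9, O=3, D=7 come from the worked example.
row = FmeaRow(
    function="Pump sewage from wet well",
    failure_mode="Mechanical seal failure",
    effect="Sudden loss of pumping capacity",
    current_controls="Annual seal inspection",
    s=9, o=3, d=7,
    recommended_action="Add seal leak monitoring",
    owner="Maintenance planner",
    due_date=date(2030, 1, 31),
    residual_s=9, residual_o=3, residual_d=3,  # assumed post-action scores
)
print(row.rpn, row.residual_rpn)  # 189 81
```

Keeping the residual scores next to the originals is what lets the worksheet show the RPN dropping when an action is completed.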
Turning RPN into a maintenance plan
Order the worksheet by RPN, descending. The familiar Pareto pattern almost always holds: the top 20% of rows account for 80% of the available risk reduction. The maintenance plan that follows from the FMEA is not "do everything more"; it is "concentrate where the score earns it."
Each high-RPN row converts into one or more actions. Typical patterns:
- RPN above 200 — engineer a design change, add continuous monitoring, or shorten the inspection interval until it falls below 100.
- RPN 100–200 — add a condition-based trigger or tighten the existing PM. The output here is exactly the input that predictive maintenance needs to set thresholds.
- RPN below 100 — keep the current PM, leave the row in the worksheet, revisit in 12 months.
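The three bands above can be expressed as a simple lookup. This sketch assumes the boundaries read literally: "above 200" excludes 200 and "below 100" excludes 100, so both boundary values fall in the middle band. The function name `action_band` and the returned wording are illustrative.

```python
def action_band(rpn: int) -> str:
    """Return the typical action pattern for a given RPN."""
    if not 1 <= rpn <= 1000:
        raise ValueError(f"RPN must be from 1 to 1000, got {rpn!r}")
    if rpn > 200:
        return "design change, continuous monitoring, or shorter inspection interval"
    if rpn >= 100:
        return "add condition-based trigger or tighten existing PM"
    return "keep current PM, revisit in 12 months"


# The three lift station rows from the worked example
for name, score in (("Seal", 189), ("Bearing", 96), ("Rag-up", 70)):
    print(f"{name:<8} {score:>4}  {action_band(score)}")
```

On the lift station example this puts the seal failure in the middle band and leaves the bearing and rag-up rows on their current PM.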
FMEA in your CMMS
The FMEA worksheet should live inside the CMMS, not in a spreadsheet on a planner's desktop. At minimum, the asset record should carry the failure mode, the current S/O/D scores, and the RPN. Every PM and condition-based rule should reference the failure mode it controls. When the asset is decommissioned, the FMEA row archives with it.
This is one of the questions worth asking during procurement — see the vendor question list. A CMMS that cannot store an RPN field, or that hides the failure-mode taxonomy two screens deep, will quietly push your team back to spreadsheets within the first year.
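The requirement that every PM and condition-based rule references the failure mode it controls can be audited mechanically. The sketch below does not model any particular CMMS product: the identifiers (`FM-001`, `LS-PUMP-01`), the dictionary layout, and the sample rules are all made up so that the check has something to find.

```python
# Hypothetical failure-mode register keyed by failure-mode ID
failure_modes = {
    "FM-001": {"asset": "LS-PUMP-01", "mode": "Bearing failure", "rpn": 96},
    "FM-002": {"asset": "LS-PUMP-01", "mode": "Mechanical seal failure", "rpn": 189},
    "FM-003": {"asset": "LS-PUMP-01", "mode": "Impeller rag-up", "rpn": 70},
}

# Hypothetical PM and condition-based rules; the last one has no link
pm_rules = [
    {"rule": "Grease bearing every 3,000 h", "failure_mode": "FM-001"},
    {"rule": "Annual seal inspection", "failure_mode": "FM-002"},
    {"rule": "Weekly housekeeping round", "failure_mode": None},
]


def unlinked_rules(rules, modes):
    """Rules that do not reference a known failure mode."""
    return [r["rule"] for r in rules if r.get("failure_mode") not in modes]


def uncontrolled_modes(rules, modes):
    """Failure modes that no rule claims to control."""
    controlled = {r.get("failure_mode") for r in rules}
    return [fm_id for fm_id in modes if fm_id not in controlled]


print(unlinked_rules(pm_rules, failure_modes))      # ['Weekly housekeeping round']
print(uncontrolled_modes(pm_rules, failure_modes))  # ['FM-003']
```

Both lists are worth reviewing: an unlinked rule may be maintenance done out of habit, and an uncontrolled failure mode is a row with no action behind it.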
Common pitfalls
The four ways an FMEA programme goes sideways are predictable and avoidable:
- Gold-plating — the team scores 600 failure modes for the headworks alone and runs out of energy before the digesters. Pick the top 10 critical assets first; expand later.
- Score inflation — every row gets S=8 because "anything in this plant is serious." A flat distribution is useless. Force the team to leave room at the top of the scale for the genuinely catastrophic rows.
- One-off study — a consultancy delivers a perfect FMEA in a binder, the binder goes in a drawer, nothing changes. The FMEA is only as live as its last revisit; quarterly is the rhythm that works.
- No RCA feedback loop — every real failure should update the worksheet: raise the O score, then lower the D score if existing monitoring caught the failure or raise it if the failure was missed. Without that loop, the FMEA stops reflecting reality after about 18 months.
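The RCA feedback loop can be sketched as a small update rule. Because D = 1 means strong detection on this scale, the sketch lowers D when existing monitoring caught the failure and raises it when the failure was missed. The one-point step size, the clamping to the 1-to-10 range, and the function name `apply_rca_finding` are assumptions; a real team would re-score against its rubric rather than apply a fixed step.

```python
from typing import Tuple


def apply_rca_finding(
    o: int, d: int, caught_by_monitoring: bool, step: int = 1
) -> Tuple[int, int]:
    """Return updated (O, D) scores after a real failure has been analysed."""

    def clamp(value: int) -> int:
        # Keep scores on the 1-to-10 scale.
        return max(1, min(10, value))

    new_o = clamp(o + step)  # the failure happened, so occurrence goes up
    if caught_by_monitoring:
        new_d = clamp(d - step)  # detection worked better than scored
    else:
        new_d = clamp(d + step)  # detection was weaker than scored
    return new_o, new_d


# Seal failure from the worked example (O=3, D=7), missed by monitoring
print(apply_rca_finding(3, 7, caught_by_monitoring=False))  # (4, 8)
# Bearing failure (O=4, D=3), caught by vibration monitoring
print(apply_rca_finding(4, 3, caught_by_monitoring=True))   # (5, 2)
```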
⚠ Reality check
An FMEA workshop that produces 200 rows and zero actions is a pleasant team-building exercise, not a maintenance programme. Every FMEA session should close with at least three concrete work orders raised in the CMMS before the room empties.