EPA ECHO, EU UWWTD reporting, regional water authority datasets: what each one publishes, where to download it, and how to combine them for plant-level analysis.
Multiple national and supranational datasets publish wastewater plant data for free. Knowing where to look saves weeks of FOIA requests — and gives you a baseline that is comparable across jurisdictions.
The US EPA's Enforcement and Compliance History Online (ECHO) is the deepest open dataset on the operating side. Every NPDES-permitted facility is in there: facility records, permit limits, discharge monitoring report (DMR) data, inspection history, enforcement actions, and the full text of consent decrees.
Coverage runs to roughly 16,500 major and 40,000+ minor wastewater dischargers, including plants down to a few hundred PE. Data is updated monthly. The query interface offers a maps front-end, a parameter-search front-end, and a bulk download path. The Loading Tool (a sub-product) lets you pull DMR data normalised across facilities and parameters.
Quirks worth knowing: DMR data is reported as it was originally entered by the facility, with all the inconsistency that implies. Limit exceedance flagging is automated and occasionally wrong. NPDES code reuse across permit cycles can confuse longitudinal queries unless you account for it.
Every five years EU Member States submit plant-level data to the European Environment Agency under Article 16 of the Urban Waste Water Treatment Directive. The current public dataset covers more than 23,000 agglomerations and their associated treatment plants, with population equivalent, treatment level, capacity, compliance status, and discharge characteristics.
The CSV downloads sit on the EEA's data hub. The 2024 directive revision substantially tightens reporting cadence, moving from quinquennial to annual for many fields by 2027, but the underlying schema carries over. The dataset is the single best source for comparable European plant-level information.
The catch: reporting lag is significant. The latest published full reporting cycle is typically two to three years behind real time. Compliance flags reflect the year reported, not necessarily the current state.
UK data is fragmented across regulators but all of it is open. The Environment Agency (England) publishes:
SEPA (Scotland), Natural Resources Wales, and Northern Ireland Environment Agency publish equivalents on slightly different cadences and schemas. UKWIR publishes industry research and benchmarking. None of them quite agree on field naming, which is a recurring frustration.
Below the national level sits a long tail of regulator and authority datasets. A non-exhaustive list of the most useful:
Coverage and data quality vary by jurisdiction. Some publish monthly, some annually, some only when asked.
The hard part of any cross-dataset analysis is reconciling identifiers. The keys that survive across datasets:
Unit reconciliation is the second pitfall. Influent flow may be reported in m³/day, ML/d, MGD (US), or population-equivalent depending on dataset. Total nitrogen may be reported as N or as the parent species. BOD may be 5-day or ultimate. Build a unit-conversion layer before you do anything else.
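A unit-conversion layer can start very small. The sketch below is one possible shape, not a reference implementation: the function names and unit labels (`"m3/d"`, `"ML/d"`, `"MGD"`) are illustrative assumptions, and real datasets will spell their units differently. Population-equivalent and BOD5-versus-ultimate conversions are deliberately left out because both depend on site-specific assumptions (per-capita flow, BOD rate constant) that no fixed factor captures.

```python
# Flow factors to m³/day. Both are exact by definition of the units.
M3_PER_DAY = {
    "m3/d": 1.0,
    "ML/d": 1_000.0,       # 1 megalitre = 1,000 m³
    "MGD": 3_785.411784,   # 1 US gallon = 3.785411784 L, so 1e6 gal = 3,785.411784 m³
}

# Mass fraction of nitrogen in common reported species (standard atomic weights).
N_FRACTION = {
    "N": 1.0,
    "NH4": 14.007 / 18.039,   # ammonium
    "NO3": 14.007 / 62.004,   # nitrate
}


def flow_to_m3_per_day(value: float, unit: str) -> float:
    """Convert a flow to m³/day; refuse units that need extra assumptions."""
    if unit not in M3_PER_DAY:
        raise ValueError(f"no direct conversion for flow unit {unit!r}")
    return value * M3_PER_DAY[unit]


def as_nitrogen(value_mg_l: float, species: str) -> float:
    """Express a concentration reported as a parent species in mg/L as N."""
    if species not in N_FRACTION:
        raise ValueError(f"unknown nitrogen species {species!r}")
    return value_mg_l * N_FRACTION[species]


if __name__ == "__main__":
    print(flow_to_m3_per_day(2.5, "MGD"))  # US plant reporting in MGD
    print(as_nitrogen(10.0, "NO3"))        # nitrate reported as NO3, not as N
```

Raising on an unknown unit, rather than passing the value through, keeps a mislabelled field from silently entering the normalised table.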
The UTC vs local time trap: DMR submission timestamps are local, ECHO ingestion timestamps are UTC, and EU reporting uses local national time. Daily aggregates built from poorly reconciled timestamps can drift by up to one day at the boundary, which materially distorts any storm-event analysis.
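The day-boundary drift is easy to reproduce. The following is a minimal sketch using Python's standard `zoneinfo` module; the helper names, the sample timestamp, and the choice of `America/Chicago` as the facility's zone are all hypothetical. It assumes the system has an IANA time zone database available (on some platforms that means installing the `tzdata` package).

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(local_naive: datetime, tz_name: str) -> datetime:
    """Attach the facility's zone to a naive local timestamp, then convert to UTC."""
    return local_naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


def local_day(utc_ts: datetime, tz_name: str) -> date:
    """Calendar day of a UTC timestamp as seen in the facility's own zone."""
    return utc_ts.astimezone(ZoneInfo(tz_name)).date()


if __name__ == "__main__":
    # Hypothetical sample taken at 22:30 local time during a summer storm.
    sample_local = datetime(2023, 7, 14, 22, 30)
    sample_utc = to_utc(sample_local, "America/Chicago")

    print(sample_utc.date())                           # day if you bucket by UTC
    print(local_day(sample_utc, "America/Chicago"))    # day if you bucket by local time
```

Bucketing by the UTC date puts this late-evening sample on the following calendar day. Storing everything in UTC and deriving the reporting day from the facility's zone at aggregation time is one way to keep the two views consistent.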
A few things to keep in mind whenever you cite these datasets:
The UtilityRadar wastewater directory consolidates most of the above into a single normalised view. Records are joined on the lat/lon and national-ID crosswalk discussed above, with treatment level harmonised to the four-bucket scheme (Primary, Secondary, Advanced, Not Reported), capacity normalised to m³/day, and population served reconciled where the underlying datasets disagree.
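For readers building their own crosswalk, a lat/lon join usually comes down to a nearest-neighbour match within a distance tolerance. The sketch below illustrates that general technique with the haversine formula; it is not a description of how the directory itself performs the join. The record layout (`lat`, `lon` keys) and the 500 m default tolerance are assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearest_within(plant: dict, candidates: list, max_m: float = 500.0):
    """Return the closest candidate no farther than max_m metres, or None."""
    best, best_d = None, max_m
    for cand in candidates:
        d = haversine_m(plant["lat"], plant["lon"], cand["lat"], cand["lon"])
        if d <= best_d:
            best, best_d = cand, d
    return best


if __name__ == "__main__":
    # Hypothetical records from two datasets describing the same site.
    plant = {"id": "A-1", "lat": 52.0000, "lon": 4.0000}
    others = [
        {"id": "B-7", "lat": 52.0010, "lon": 4.0000},  # roughly 111 m north
        {"id": "B-9", "lat": 52.0100, "lon": 4.0000},  # roughly 1.1 km north
    ]
    print(nearest_within(plant, others))
```

A linear scan is fine for a few thousand plants. A spatial index becomes worthwhile at larger scale, and a national-ID match, where one exists, should take precedence over proximity.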
The directory does not replace the source datasets. It surfaces the comparable subset — the fields that survive the join — and links back to the original record for full detail. For deeper sub-views see the primary plants, secondary plants, and advanced plants filters.
For plant-side context that frames how to read the operational fields in any of these datasets, see the related guides on capacity utilization, sludge management, and climate resilience.