
Open data in wastewater: what is available and how to use it

EPA ECHO, EU UWWTD reporting, regional water authority datasets - what each one publishes, where to download, and how to combine them for plant-level analysis.

UtilityRadar Team · May 9, 2026 · 7 min read

Multiple national and supranational datasets publish wastewater plant data for free. Knowing where to look saves weeks of FOIA requests — and gives you a baseline that is comparable across jurisdictions.

EPA ECHO (US)

The US EPA's Enforcement and Compliance History Online (ECHO) is the deepest open dataset on the operating side. Every NPDES-permitted facility is in there: facility records, permit limits, discharge monitoring report (DMR) data, inspection history, enforcement actions, and the full text of consent decrees.

Coverage runs to roughly 16,500 major and 40,000+ minor wastewater dischargers, including plants down to a few hundred PE. Data is updated monthly. The query interface offers a maps front-end, a parameter-search front-end, and a bulk download path. The Loading Tool (a sub-product) lets you pull DMR data normalised across facilities and parameters.

Quirks worth knowing: DMR data is reported as it was originally entered by the facility, with all the inconsistency that implies. Limit exceedance flagging is automated and occasionally wrong. NPDES code reuse across permit cycles can confuse longitudinal queries unless you account for it.
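Because the automated exceedance flag is occasionally wrong, it can be worth recomputing it from the reported value and the permit limit and reviewing the rows that disagree. The sketch below assumes a simplified row shape; the field names and the "max"/"min" limit encoding are hypothetical, not ECHO's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DmrRow:
    """One DMR measurement. Field names are illustrative, not ECHO's schema."""
    permit_id: str
    parameter: str
    value: Optional[float]
    limit: Optional[float]
    limit_type: str            # assumed encoding: "max" or "min"
    reported_exceedance: bool  # the flag as published


def recompute_exceedance(row: DmrRow) -> Optional[bool]:
    """Return True/False when value and limit allow a check, else None."""
    if row.value is None or row.limit is None:
        return None
    if row.limit_type == "max":
        return row.value > row.limit
    if row.limit_type == "min":
        return row.value < row.limit
    return None  # unknown limit type: do not guess


def disagreements(rows: List[DmrRow]) -> List[DmrRow]:
    """Rows whose published flag differs from the recomputed one."""
    out = []
    for row in rows:
        flag = recompute_exceedance(row)
        if flag is not None and flag != row.reported_exceedance:
            out.append(row)
    return out
```

Rows returned by `disagreements` are candidates for manual review, not confirmed errors: the published flag may reflect permit conditions that this simplified check does not model.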

EU UWWTD reporting

Every five years EU Member States submit plant-level data to the European Environment Agency under Article 16 of the Urban Waste Water Treatment Directive. The current public dataset covers more than 23,000 agglomerations and their associated treatment plants, with population equivalent, treatment level, capacity, compliance status, and discharge characteristics.

The CSV downloads sit on the EEA's data hub. The 2024 directive revision substantially tightens reporting cadence, moving from quinquennial to annual for many fields by 2027, but the underlying schema carries over. The data is the single best source for comparable European plant-level information.

The catch: reporting lag is significant. The latest published full reporting cycle is typically two to three years behind real time. Compliance flags reflect the year reported, not necessarily the current state.

UK Discharge Consents

UK data is fragmented across regulators but all of it is open. The Environment Agency (England) publishes:

  • Water company performance reports (annual) covering compliance, pollution incidents, treatment works performance.
  • Event Duration Monitoring open data on storm overflow spills (annual, now near-comprehensive).
  • Permit register with consent conditions for individual discharges.

SEPA (Scotland), Natural Resources Wales, and Northern Ireland Environment Agency publish equivalents on slightly different cadences and schemas. UKWIR publishes industry research and benchmarking. None of them quite agree on field naming, which is a recurring frustration.

Regional water authority datasets

Below the national level sits a long tail of regulator and authority datasets. A non-exhaustive list of the most useful:

  • California Integrated Water Quality System (CIWQS) — California-specific NPDES data, often more current than the federal mirror.
  • Germany — LfU (Bavaria), LANUV (NRW) and equivalents in other Länder publish detailed plant-level data, including biological monitoring, often with English summaries.
  • Australia — state EPAs (NSW EPA, EPA Victoria, DES Queensland) publish licence registers and compliance reports for wastewater dischargers.
  • Canada — federal NPRI for releases and the provincial regulator portals (Ontario, Quebec) for operational data.
  • Japan — MLIT publishes the National Sewerage Database with plant-level statistics for thousands of works.

Coverage and data quality vary by jurisdiction. Some publish monthly, some annually, some only when asked.

Combining datasets

The hard part of any cross-dataset analysis is reconciling identifiers. The keys that survive across datasets:

  • Latitude/longitude — almost universal, but reported precision varies from 6 decimal places (sub-metre) to 2 (kilometre-scale). Match within a tolerance, not by string equality.
  • NPDES permit number — US, persistent across the permit cycle. The closest thing to a stable identifier in US datasets.
  • E-PRTR facility ID — EU, persistent across the European Pollutant Release and Transfer Register reporting cycle.
  • National ID schemes — UK CAR, German Stoff/Anlagen IDs, Japanese plant codes. Useful within country, useless across.
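Matching coordinates within a tolerance can be sketched as follows. This is a minimal illustration, assuming each record carries the number of decimal places its coordinates were reported to; the function names and the tuple layout are hypothetical. The tolerance uses the rough rule that one degree of latitude is about 111 km.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius
METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tolerance_m(decimals):
    """Rough positional uncertainty for coordinates given to `decimals` places."""
    return METRES_PER_DEGREE * 10 ** (-decimals)


def nearest_match(lat, lon, decimals, candidates):
    """Return the id of the closest candidate within tolerance, or None.

    `candidates` is an iterable of (id, lat, lon, decimals) tuples.
    The tolerance is set by the coarser of the two coordinates compared.
    """
    best_id, best_d = None, float("inf")
    for cand_id, clat, clon, cdec in candidates:
        limit = tolerance_m(min(decimals, cdec))
        d = haversine_m(lat, lon, clat, clon)
        if d <= limit and d < best_d:
            best_id, best_d = cand_id, d
    return best_id
```

In dense urban areas a kilometre-scale tolerance can cover more than one plant, so a coordinate match is best confirmed against a second attribute such as name or capacity before it is written to a crosswalk.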

Unit reconciliation is the second pitfall. Influent flow may be reported in m³/day, ML/d, MGD (US), or population-equivalent depending on dataset. Total nitrogen may be reported as N or as the parent species. BOD may be 5-day or ultimate. Build a unit-conversion layer before you do anything else.
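A unit-conversion layer can start very small. The sketch below normalises flow to m³/day and nitrogen species to "as N". The unit labels and function names are assumptions for illustration; real datasets spell units in many more ways. Population equivalent is deliberately rejected, because it is a load measure and cannot be turned into a flow without a per-capita assumption.

```python
# Multipliers to m³/day. Unit spellings are illustrative.
M3_PER_DAY = {
    "m3/d": 1.0,
    "ML/d": 1_000.0,      # 1 megalitre = 1,000 m³
    "MGD": 3_785.411784,  # 1 US gallon = 3.785411784 L
}

# Mass fraction of nitrogen in each species (N = 14.007, O = 15.999, H = 1.008).
N_FRACTION = {
    "N": 1.0,
    "NO3": 14.007 / 62.004,
    "NO2": 14.007 / 46.005,
    "NH4": 14.007 / 18.039,
}


def flow_to_m3_per_day(value, unit):
    """Convert a flow to m³/day, failing loudly on unknown units."""
    if unit not in M3_PER_DAY:
        raise ValueError(f"no flow conversion defined for unit {unit!r}")
    return value * M3_PER_DAY[unit]


def as_nitrogen(value_mg_l, species):
    """Convert a concentration reported as a parent species to mg/L as N."""
    if species not in N_FRACTION:
        raise ValueError(f"no nitrogen fraction defined for {species!r}")
    return value_mg_l * N_FRACTION[species]
```

For example, `flow_to_m3_per_day(1, "MGD")` gives 3785.411784, and 10 mg/L of nitrate reported as NO3 is about 2.26 mg/L as N. Raising on unknown units is a design choice: a silent pass-through is how mixed-unit columns end up in the same average.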

The third pitfall is the UTC versus local time trap. DMR submission timestamps are local; ECHO ingestion timestamps are UTC; EU reporting uses local national time. Daily aggregates built from poorly reconciled timestamps can drift by up to one day at the boundary, which materially distorts any storm-event analysis.
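One way to avoid the boundary drift is to attach the correct zone to every local timestamp and convert to a single reference before bucketing by day. The sketch below uses UTC as that reference, which is an assumption; local plant time works equally well as long as it is applied consistently. The function names and the input tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc(naive_local, tz_name):
    """Interpret a naive timestamp in the given IANA zone and convert to UTC."""
    return naive_local.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)


def daily_totals_utc(readings):
    """Sum values per UTC calendar day.

    `readings` is an iterable of (naive local datetime, IANA zone name, value).
    """
    totals = defaultdict(float)
    for ts, tz_name, value in readings:
        totals[to_utc(ts, tz_name).date()] += value
    return dict(totals)


# A 22:30 reading in Chicago on 5 March falls on 6 March in UTC.
late_reading = to_utc(datetime(2024, 3, 5, 22, 30), "America/Chicago")
```

The example reading shows the drift directly: bucketed naively it belongs to 5 March, bucketed in UTC it belongs to 6 March, so a spill that straddles midnight lands on different days depending on which dataset it came from.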

💡 Build the join key once
Anyone serious about cross-dataset wastewater analysis builds a master crosswalk of (lat/lon, NPDES code, E-PRTR ID, internal key) early and updates it on a defined cadence. The marginal cost of the next analysis collapses once that table exists.
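A crosswalk of this kind can be sketched as a small in-memory index keyed on every external identifier. The class and field names below are hypothetical and the example IDs are placeholders; a production version would live in a database table with an update history.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CrosswalkEntry:
    internal_key: str
    lat: float
    lon: float
    npdes_id: Optional[str] = None
    eprtr_id: Optional[str] = None


class Crosswalk:
    """Maps any known external identifier to one internal record."""

    def __init__(self) -> None:
        self._by_internal: Dict[str, CrosswalkEntry] = {}
        self._by_external: Dict[Tuple[str, str], str] = {}

    def add(self, entry: CrosswalkEntry) -> None:
        ids = [(scheme, value)
               for scheme, value in (("npdes", entry.npdes_id),
                                     ("eprtr", entry.eprtr_id))
               if value]
        # Check for conflicts before changing any state.
        for key in ids:
            owner = self._by_external.get(key)
            if owner is not None and owner != entry.internal_key:
                raise ValueError(f"{key} already mapped to {owner}")
        self._by_internal[entry.internal_key] = entry
        for key in ids:
            self._by_external[key] = entry.internal_key

    def resolve(self, scheme: str, ext_id: str) -> Optional[CrosswalkEntry]:
        """Look up a record by external scheme and identifier."""
        internal = self._by_external.get((scheme, ext_id))
        return self._by_internal.get(internal) if internal else None
```

Refusing to map one external ID to two internal keys is deliberate: a conflict usually signals either a duplicate record or an identifier that was reused, and both deserve a manual look.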

Caveats

A few things to keep in mind whenever you cite these datasets:

  • Reporting lag — even fast-cadence datasets like EPA ECHO carry a 3–6 month lag for DMR data. UWWTD data lags two to three years, as noted in the EU section. EU national datasets fall in between.
  • Missing fields — capacity utilization, treatment level, and population served are missing for a substantial fraction of records, especially smaller plants. The "Not Reported" handling described in the treatment levels guide applies broadly.
  • Mismatched parameter definitions — total phosphorus measured by ICP differs from total phosphorus measured by colorimetric digestion in low-concentration tails. The reported field is the same name; the underlying number is not the same number.
  • Reporting threshold differences — the EU UWWTD threshold for plant-level reporting is 2,000 PE; the US NPDES "major" designation applies at 1 MGD design flow or where a pretreatment programme is in place; UK reporting varies by regulator. Cross-jurisdictional counts of "wastewater plants" are not directly comparable without normalisation.
  • Schema version drift — every regulator changes field names every few years. Build version-aware loaders or the analysis silently breaks the next reporting cycle.
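A version-aware loader for the schema drift described in the last caveat can be as simple as a per-version mapping from source columns to canonical field names, with detection based on the header row. The version labels and column names below are invented for illustration and do not correspond to any regulator's actual schema.

```python
import csv
import io

# canonical field -> source column, per schema version (hypothetical names)
FIELD_MAPS = {
    "v1": {"plant_id": "PlantCode", "capacity_pe": "Capacity"},
    "v2": {"plant_id": "plant_code", "capacity_pe": "capacity_pe"},
}


def detect_version(header):
    """Return the first schema version whose columns all appear in the header."""
    cols = set(header)
    for version, mapping in FIELD_MAPS.items():
        if set(mapping.values()) <= cols:
            return version
    raise ValueError(f"unrecognised schema; columns: {sorted(cols)}")


def load(text):
    """Parse CSV text into rows keyed by canonical field names."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = FIELD_MAPS[detect_version(reader.fieldnames or [])]
    return [{canon: row[src] for canon, src in mapping.items()}
            for row in reader]
```

The important behaviour is the `ValueError`: when a regulator renames a field, the loader stops instead of producing rows with silently empty columns.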

How UtilityRadar uses these

The UtilityRadar wastewater directory consolidates most of the above into a single normalised view. Records are joined on the lat/lon and national-ID crosswalk discussed above, with treatment level harmonised to the four-bucket scheme (Primary, Secondary, Advanced, Not Reported), capacity normalised to m³/day, and population served reconciled where the underlying datasets disagree.
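As an illustration of the four-bucket harmonisation, a lookup of the following shape would do the job. This is a sketch only: the source labels on the left are assumed examples, and the mapping is not UtilityRadar's actual rule set.

```python
BUCKETS = ("Primary", "Secondary", "Advanced", "Not Reported")

# Source label (lower-cased) -> bucket. Labels are assumed examples.
_LABEL_TO_BUCKET = {
    "primary": "Primary",
    "secondary": "Secondary",
    "tertiary": "Advanced",
    "more stringent": "Advanced",
    "advanced": "Advanced",
}


def harmonise_treatment_level(raw):
    """Map a raw treatment-level label to one of the four buckets.

    Missing or unrecognised labels fall into "Not Reported".
    """
    if raw is None:
        return "Not Reported"
    return _LABEL_TO_BUCKET.get(raw.strip().lower(), "Not Reported")
```

Sending unrecognised labels to "Not Reported" keeps the output closed over four values, at the cost of hiding new labels; logging the unmapped strings alongside is a sensible companion step.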

The directory does not replace the source datasets. It surfaces the comparable subset — the fields that survive the join — and links back to the original record for full detail. For deeper sub-views see the primary plants, secondary plants, and advanced plants filters.

For plant-side context that frames how to read the operational fields in any of these datasets, see the related guides on capacity utilization, sludge management, and climate resilience.
