Data

Open data in wastewater: what is available and how to use it

A curated map of open wastewater datasets that matter. US ECHO, EU UWWTD, UK Environment Agency, and third party aggregators, with quality and cadence notes.

UtilityRadar Team

May 9, 2026

Multiple national and supranational datasets publish wastewater plant data for free. Knowing where to look saves weeks of FOIA requests and gives you a baseline that is comparable across jurisdictions.

This guide is a curated map of the open wastewater datasets that actually matter, with commentary on quality, coverage, refresh cadence, and how to combine them. If you are a researcher, a consulting engineer, a journalist, a compliance officer benchmarking peers, or a utility planner, everything below is public and free to use.

Why open data on wastewater is worth knowing

Every developed jurisdiction now publishes some form of wastewater plant registry. Coverage, format, and refresh cadence vary wildly, but the direction of travel is toward more openness, higher granularity, and near real time updates. For anyone whose work depends on knowing where plants are, how much they treat, how well they perform, or how they compare to peers, open data is the fastest path.

The open data landscape has three layers: government mandated public data, government voluntary open data initiatives, and third party aggregations. Each has different quality and coverage characteristics.

United States: what is available

Dataset	Coverage	Refresh	Use
EPA ECHO	All permitted dischargers (industrial + municipal)	Weekly	Compliance status, effluent violations
NPDES permit finder	NPDES permits	Monthly	Permit conditions, limits, monitoring plans
EPA CWNS	Clean Watersheds Needs Survey plants	Every 4 years	Capacity, service population, treatment level
USGS NHD	National Hydrography Dataset	Continuous	Receiving water, watershed context
State agency portals	State specific permits and reports	Varies	DMR data, enforcement actions

ECHO is the flagship. It pulls compliance data from state and federal sources, publishes weekly, and offers an API. The NPDES permit finder holds the permits themselves. CWNS is the four year strategic snapshot with capacity, service population, and treatment level data. USGS NHD provides the hydrographic context for downstream analysis.

European Union: what is available

Dataset	Coverage	Refresh
UWWTD	All UWWTD reported plants across EU member states	Every 2 years
WISE Freshwater	Water Framework Directive reporting	Every 6 years (planning cycle)
National environment agencies	Member state specific	Varies (annual to real time)

The Urban Waste Water Treatment Directive (UWWTD) database is the EU wide baseline. Every plant serving more than 2,000 population equivalents (PE) reports load, treatment level, and compliance status. Coverage is very high; refresh is slower than US ECHO. WISE Freshwater adds Water Framework Directive context for receiving waters.
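For readers unfamiliar with the unit: the directive defines one population equivalent as a biodegradable organic load with a five day biochemical oxygen demand (BOD5) of 60 g of oxygen per day. A minimal sketch of the conversion follows. The function names and the example load are illustrative and are not part of any UWWTD tooling; the threshold is the figure quoted above.

```python
# UWWTD definition: 1 PE corresponds to 60 g of BOD5 per day.
BOD5_GRAMS_PER_PE_PER_DAY = 60.0
# Reporting threshold as quoted in this guide (assumption: strict "more than").
REPORTING_THRESHOLD_PE = 2_000


def population_equivalent(bod5_kg_per_day: float) -> float:
    """Convert an incoming BOD5 load in kg/day to population equivalents."""
    return bod5_kg_per_day * 1000.0 / BOD5_GRAMS_PER_PE_PER_DAY


def is_above_threshold(bod5_kg_per_day: float) -> bool:
    """True when the load exceeds the reporting threshold used in this guide."""
    return population_equivalent(bod5_kg_per_day) > REPORTING_THRESHOLD_PE


if __name__ == "__main__":
    # A hypothetical plant receiving 150 kg of BOD5 per day.
    print(population_equivalent(150.0))
    print(is_above_threshold(150.0))
```

Because PE is a load measure rather than a head count, a plant serving a small town with a food processing factory can report a PE well above its resident population.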

United Kingdom: what is available

Dataset	Coverage	Refresh
Environment Agency Storm Overflow performance	England CSO event and duration monitoring	Annual with quarterly updates
Consumer Council for Water	Company level compliance and complaints	Annual
Ofwat performance data	Company level financial and operational	Annual
DEFRA discharge permits	Environmental permit register	Continuous

The UK is a leader in storm overflow transparency. Event duration monitors are installed on virtually every CSO discharge point, and the results are published in annual company reports and in the Environment Agency dataset. Ofwat publishes company level performance data useful for benchmarking.

Other regions

Canada, Australia, New Zealand, and Japan publish varying levels of wastewater data. Coverage is generally better at the national or state level than at the utility level. In developing regions, open datasets are patchy but improving. The World Bank open data portal aggregates national level water and sanitation indicators globally.

Third party aggregations

Several projects aggregate open data across jurisdictions for comparability. Notable examples include the UtilityRadar directory itself (the resource you are reading), OpenStreetMap infrastructure tags, and academic research databases published by water research institutions.

These aggregations add value by cross referencing multiple sources, standardising terminology (a plant described as "primary and secondary treatment" in one dataset should map cleanly to "activated sludge secondary" in another), and geocoding to consistent coordinate systems.
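Terminology standardisation of this kind usually comes down to a lookup table. The sketch below is one possible shape, assuming a hand maintained mapping; the labels and canonical categories are hypothetical and would need to be built from each dataset's own metadata.

```python
# Hypothetical mapping from raw dataset labels to one canonical vocabulary.
# Extend it from each source's metadata; these entries are illustrative only.
CANONICAL_LEVELS = {
    "primary": "primary",
    "secondary": "secondary",
    "primary and secondary treatment": "secondary",
    "activated sludge secondary": "secondary",
    "tertiary": "tertiary",
}


def normalise_level(raw_label: str) -> str:
    """Map a raw treatment label to a canonical level, or 'unknown'."""
    # Lower case and collapse whitespace so trivial variants match.
    key = " ".join(raw_label.lower().split())
    return CANONICAL_LEVELS.get(key, "unknown")


if __name__ == "__main__":
    for label in ["Primary and Secondary Treatment", "Activated sludge secondary", "lagoon"]:
        print(label, "->", normalise_level(label))
```

Returning "unknown" rather than guessing keeps unmapped labels visible, so they can be reviewed instead of silently miscounted.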

Data quality: what to watch for

Common trap. Aggregated datasets often carry stale data. A plant registry that was accurate in 2018 may not reflect the 2024 upgrade to tertiary treatment. Always check the source refresh date before drawing conclusions.

Refresh cadence. Weekly datasets are near current; four year datasets are historical. Choose accordingly.
Definitional consistency. "Design capacity" means slightly different things in US CWNS, EU UWWTD, and UK Ofwat data. Read the metadata.
Coverage gaps. Small plants (below 2,000 PE in EU, below 1 MGD in US) may or may not be included. Check.
Reporting lag. Real time data is rare. Most datasets lag by 3 to 12 months.
Enforcement bias. Compliance data reports what was measured, not what happened. Absence of a violation is not proof of clean operation.
Geographic accuracy. Some datasets place plants at the utility service centre rather than the plant location. Look for a separate lat lon field.
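The refresh date check can be automated before any analysis runs. The following is a rough sketch, assuming you record each source's last refresh date and stated cadence yourself; the cadence table and the slack factor of 1.5 are illustrative choices, not values published by any agency.

```python
from datetime import date

# Assumed cadence lengths in days, loosely based on the refresh columns above.
CADENCE_DAYS = {
    "weekly": 7,
    "monthly": 31,
    "annual": 366,
    "biennial": 731,
    "four_year": 1461,
}


def is_stale(last_refresh: date, cadence: str, today: date, slack: float = 1.5) -> bool:
    """Flag a source whose last refresh is older than slack times its cadence."""
    age_days = (today - last_refresh).days
    return age_days > CADENCE_DAYS[cadence] * slack


if __name__ == "__main__":
    # A four year registry last refreshed in mid 2018, checked in late 2024.
    print(is_stale(date(2018, 6, 1), "four_year", date(2024, 9, 1)))
```

A stale flag does not mean the data is wrong, only that it has missed at least one expected refresh and deserves a closer look.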

Combining datasets

The real analytical power comes from combining datasets. Some worked examples:

Question	Combination needed
Which US plants above 10 MGD have received a formal enforcement action in the past 3 years?	ECHO enforcement + CWNS capacity
What percentage of EU wastewater flow receives at least secondary treatment?	UWWTD reporting
Which UK plants have discharged over 500 hours of storm overflow in the past year?	Environment Agency storm overflow performance
What is the flow weighted average nitrogen concentration by watershed?	ECHO effluent monitoring + NHD watershed boundaries
Which utilities serve populations over 500,000 without tertiary treatment?	CWNS capacity + service population + treatment level
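The first question in the table can be sketched as a join on permit identifier. The records and field names below are hypothetical stand-ins for local extracts and do not reflect the real ECHO or CWNS column names.

```python
from datetime import date

# Hypothetical local extracts; field names are illustrative only.
capacity_rows = [
    {"permit_id": "XX0000001", "design_flow_mgd": 25.0},
    {"permit_id": "XX0000002", "design_flow_mgd": 4.0},
    {"permit_id": "XX0000003", "design_flow_mgd": 12.5},
]
enforcement_rows = [
    {"permit_id": "XX0000001", "action_type": "formal", "action_date": date(2023, 3, 14)},
    {"permit_id": "XX0000002", "action_type": "formal", "action_date": date(2024, 1, 9)},
    {"permit_id": "XX0000003", "action_type": "informal", "action_date": date(2023, 7, 2)},
    {"permit_id": "XX0000003", "action_type": "formal", "action_date": date(2019, 5, 20)},
]


def years_before(d: date, years: int) -> date:
    """Return the same calendar day a number of years earlier."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # 29 February has no match in a non leap year; fall back to the 28th.
        return d.replace(year=d.year - years, day=28)


def large_plants_with_formal_action(capacity, enforcement, as_of, min_mgd=10.0, years=3):
    """Permit IDs above min_mgd with a formal action in the look back window."""
    cutoff = years_before(as_of, years)
    large = {row["permit_id"] for row in capacity if row["design_flow_mgd"] > min_mgd}
    hits = {
        row["permit_id"]
        for row in enforcement
        if row["action_type"] == "formal"
        and cutoff <= row["action_date"] <= as_of
        and row["permit_id"] in large
    }
    return sorted(hits)


if __name__ == "__main__":
    print(large_plants_with_formal_action(capacity_rows, enforcement_rows, date(2024, 6, 1)))
```

In practice the hard part is the join key: the two sources need to share a permit identifier, or you need a crosswalk between their identifiers.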

Programmatic access

Most major open datasets offer REST APIs. ECHO offers a well documented REST API that allows bursts of requests within stated limits. The UWWTD provides bulk download and query interfaces. Third party aggregators frequently provide their own API with a unified schema.

Rate limits on the free tier typically allow 1,000 to 10,000 requests per day. Bulk academic access can usually be arranged with a specific research proposal.
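A daily quota translates into a minimum spacing between requests. The sketch below shows one conservative way to stay inside a quota by spacing calls evenly; the Throttle class is a hypothetical helper, not part of any provider's client library, and it deliberately makes no assumption about any real endpoint.

```python
import time

SECONDS_PER_DAY = 86_400


def min_interval_seconds(daily_quota: int) -> float:
    """Even spacing, in seconds, that keeps usage within a daily quota."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    return SECONDS_PER_DAY / daily_quota


class Throttle:
    """Hypothetical helper that spaces calls evenly across a day."""

    def __init__(self, daily_quota, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(daily_quota)
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


if __name__ == "__main__":
    print(min_interval_seconds(1_000))   # spacing at the low end of the range
    print(min_interval_seconds(10_000))  # spacing at the high end of the range
```

Call wait() immediately before each request. Even spacing is stricter than most providers require, since many allow short bursts, but it is a safe default when the exact burst rules are unknown.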

Licensing

Most government open datasets are public domain or use permissive open licences (CC0, CC BY). Some third party aggregations use share alike licences that require attribution. Read the licence before commercial use.

Where open data is going

Near real time compliance monitoring is expanding. UK storm overflow monitoring at 15 minute resolution is now the norm. US ECHO is moving toward faster refresh. EU member states are pilot testing real time water quality dashboards. The next decade will see event level data available near live, transforming what is possible for research, journalism, and community accountability.

Tooling for open data analysis

Analytical tooling for open wastewater data ranges from spreadsheets to purpose built platforms. For occasional analysis, Excel or Google Sheets handle most public datasets, especially the smaller ones (compliance summaries, permit lists). For serious analytical work, Python or R with pandas or dplyr provides the flexibility to combine, transform, and visualise datasets. Geographic analysis benefits from QGIS (free) or ArcGIS (commercial) with plant point layers and receiving water polygons. For dashboards and public reporting, Tableau, Power BI, and open source alternatives like Metabase work well against local data extracts.

Reading dataset metadata

Every serious open dataset carries metadata describing its coverage, definitions, refresh schedule, and known limitations. Most analytical mistakes start with ignoring it. Before analysis, answer these questions from the metadata: What is the collection date versus the publication date? What does each field mean in operational terms? What units are used (some datasets mix mg/L and kg/day without clear labels)? What geographic reference system defines the coordinates? What enforcement actions or reporting rules affect the observed values?
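The mg/L versus kg/day confusion is worth a concrete illustration, because the two are linked by flow: a concentration becomes a load only once multiplied by the volume discharged. A minimal sketch, with illustrative function names:

```python
# One million US gallons in cubic metres (the US gallon is defined as 3.785411784 L).
M3_PER_MILLION_US_GALLONS = 3785.411784


def mgd_to_m3_per_day(flow_mgd: float) -> float:
    """Convert a flow in million US gallons per day to cubic metres per day."""
    return flow_mgd * M3_PER_MILLION_US_GALLONS


def load_kg_per_day(conc_mg_per_l: float, flow_m3_per_day: float) -> float:
    """Convert a concentration and a flow into a mass load.

    mg/L is numerically equal to g/m3, so concentration times flow gives
    grams per day; dividing by 1000 gives kilograms per day.
    """
    return conc_mg_per_l * flow_m3_per_day / 1000.0


if __name__ == "__main__":
    # 10 mg/L discharged at 1,000 m3/day.
    print(load_kg_per_day(10.0, 1000.0))
    # The same concentration at 1 MGD.
    print(load_kg_per_day(10.0, mgd_to_m3_per_day(1.0)))
```

Two plants reporting the same 10 mg/L can differ by orders of magnitude in load, which is why a column of unlabelled numbers should never be compared across datasets before the unit is confirmed.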

Cross verification with primary sources

For any finding that will be published, cited, or used for a policy decision, cross verify with a primary source. If ECHO shows a plant has three effluent violations, the underlying DMR reports are the primary evidence. If UWWTD shows a plant achieved secondary treatment level, the national environment agency report is the primary evidence. Aggregators are for discovery and analysis; primary sources are for verification. Errors in aggregators (delayed refresh, coding mistakes, definitional shifts) are common enough that cross verification is not optional for consequential findings.

Practical use cases

Utility benchmarking against peers on operational metrics.
Regulatory research on enforcement trends.
Water quality research at watershed scale.
Investment due diligence for water sector investors.
Community accountability tracking for local plants.
Consulting engineering baselines for design work.
Journalism on water pollution and compliance.
Emergency response and risk mapping.

Key insight. The most useful analytical results usually come from combining two or three datasets, not from any single one. A permit register plus enforcement data plus watershed boundaries produces insight that none of them provides alone.
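As one illustration of a combined result, a flow weighted average concentration by watershed weights each plant's concentration by its flow, so a large plant counts for more than a small one. The sketch assumes the effluent records have already been matched to watersheds; the records and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical joined records: one row per plant, already assigned to a watershed.
records = [
    {"watershed": "W1", "flow_mgd": 10.0, "total_n_mg_l": 8.0},
    {"watershed": "W1", "flow_mgd": 30.0, "total_n_mg_l": 4.0},
    {"watershed": "W2", "flow_mgd": 5.0, "total_n_mg_l": 12.0},
]


def flow_weighted_mean_by_watershed(rows):
    """Return sum(flow * concentration) / sum(flow) for each watershed."""
    weighted_sum = defaultdict(float)
    total_flow = defaultdict(float)
    for row in rows:
        weighted_sum[row["watershed"]] += row["flow_mgd"] * row["total_n_mg_l"]
        total_flow[row["watershed"]] += row["flow_mgd"]
    # Skip watersheds with zero recorded flow to avoid dividing by zero.
    return {w: weighted_sum[w] / total_flow[w] for w in total_flow if total_flow[w] > 0}


if __name__ == "__main__":
    print(flow_weighted_mean_by_watershed(records))
```

For watershed W1 the flow weighted mean is 5.0 mg/L, while a simple average of the two plants would give 6.0 mg/L. The difference grows as plant sizes diverge.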

Citing open data properly

Citations matter for reproducibility and for credit to the data producers. A good citation includes the dataset name, publisher, version or access date, and URL. Most government datasets support this straightforwardly; some aggregators are less clear about versioning. When publishing analysis, cite the underlying primary source, the aggregator if you used one, and the date you extracted the data. This lets a reader or a future analyst reproduce the finding even after datasets refresh and change. Academic publishing conventions on data citation are now well established through the FORCE11 data citation principles and similar frameworks.
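Capturing those citation elements at extraction time is easier than reconstructing them later. The sketch below is one possible record structure; the class, the output layout, and the example values are hypothetical and do not follow any particular style guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical record of what is needed to cite an extract."""

    name: str
    publisher: str
    url: str
    accessed: date
    version: Optional[str] = None

    def format(self) -> str:
        """Render a one line citation including the access date."""
        version_part = f", version {self.version}" if self.version else ""
        return (
            f"{self.publisher}. {self.name}{version_part}. "
            f"{self.url} (accessed {self.accessed.isoformat()})."
        )


if __name__ == "__main__":
    citation = DatasetCitation(
        name="Example Plant Registry",      # placeholder dataset name
        publisher="Example Agency",         # placeholder publisher
        url="https://example.org/registry",  # placeholder URL
        accessed=date(2026, 5, 9),
    )
    print(citation.format())
```

Storing one such record alongside every local extract means the access date is never a guess when the analysis is written up months later.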

Collaboration with utilities

Some analytical questions benefit from utility collaboration on the underlying data. Utilities often hold higher resolution data than they publish (15 minute rather than daily monitoring, individual sample results rather than monthly averages). Academic researchers, consulting engineers, and journalists can often obtain higher resolution data by contacting the utility directly, especially when the analytical question aligns with the utility's own interest. Explicit permission to publish should be secured before using non public data, and confidentiality boundaries should be respected.

Privacy and sensitivity

Wastewater data is generally not privacy sensitive at the plant level. Individual customer data (billing, consumption) is separate and privacy protected. Some critical infrastructure protection regimes restrict very detailed plant layout data, and the US Public Health Security and Bioterrorism Preparedness and Response Act of 2002 covers drinking water infrastructure risk data. Wastewater plant location, capacity, and effluent quality are almost always public.

Practical data quality heuristics

Experienced open data users apply a small set of heuristics that catch most quality issues. First, any parameter value more than 5 standard deviations from the plant's historical median is probably a data error, not a real reading. Second, if a plant reports a monthly average built on fewer than 4 daily values, treat it as a small sample estimate. Third, if the geographic coordinate places a plant at a utility service centre address, do not use it for watershed analysis. Fourth, if a compliance rate is reported at exactly 100 percent for years in a row, the figure is probably a summary, not a live measurement.
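Three of these heuristics reduce to simple checks. The sketch below is one way to express them, using the sample standard deviation of the plant's history; the function names are illustrative, and the three year minimum for "years in a row" is an assumption. The third heuristic needs address data to compare against, so it is not sketched here.

```python
import statistics


def is_probable_error(value, history, n_sigma=5.0):
    """First heuristic: value far from the historical median, in standard deviations."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    spread = statistics.stdev(history)
    centre = statistics.median(history)
    if spread == 0:
        # A perfectly flat history: any different value is an outlier.
        return value != centre
    return abs(value - centre) > n_sigma * spread


def is_small_sample(daily_value_count, minimum=4):
    """Second heuristic: monthly average built on fewer than `minimum` daily values."""
    return daily_value_count < minimum


def is_suspicious_perfect_record(annual_rates_percent, min_years=3):
    """Fourth heuristic: exactly 100 percent for several consecutive years.

    min_years=3 is an assumed reading of "years in a row".
    """
    return len(annual_rates_percent) >= min_years and all(
        rate == 100.0 for rate in annual_rates_percent
    )


if __name__ == "__main__":
    history = [10.0, 11.0, 9.0, 10.5, 9.5]  # hypothetical monthly values in mg/L
    print(is_probable_error(100.0, history))
    print(is_small_sample(3))
    print(is_suspicious_perfect_record([100.0, 100.0, 100.0]))
```

These checks flag records for review rather than deleting them: a genuine upset event can also sit many standard deviations from the median.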

Frequently asked questions

Is ECHO free to use commercially?

Yes. ECHO data is US federal public domain.

How current is UWWTD data?

The mandated reporting cadence is every 2 years. Data can lag 3 to 12 months after the reporting year.

Can I get real time discharge data for a specific plant?

In some jurisdictions, yes. UK storm overflow monitoring is close to real time. Most jurisdictions publish monthly at best.

What about drinking water datasets?

Different regime. Look at EPA SDWIS in the US, Drinking Water Inspectorate in the UK, and equivalent programmes elsewhere.

How do I identify a specific plant across datasets?

By NPDES permit number in the US, by UWWTD ID in the EU, and by permit reference in the UK. Cross referencing is easier when the plant has a clear geographic reference.

Can I use these datasets for academic research?

Yes. Most academic use is straightforward under the licences.

Are treatment capacity numbers reliable?

Design capacity numbers are reliable but do not always reflect current usable capacity. See our companion article on capacity utilization for the distinction.

What about industrial pretreatment data?

US ECHO includes industrial dischargers with individual permits. Aggregated pretreatment data at the utility level is patchier.

Do the datasets include planned or under construction plants?

Rarely. Datasets typically include operational plants only. Planned expansion appears in CWNS or in permit issuance records.

How do I find datasets for a specific country?

Start with the national environment agency, then try the state or regional water authority. The World Bank has a country level indicator layer for comparability.

Summary

The open wastewater data landscape is richer than it looks and improving fast. US ECHO and the EU UWWTD are the flagships. UK and other national datasets add depth in specific dimensions. Third party aggregations add cross jurisdiction comparability. The real analytical value comes from combining datasets, understanding their quality and cadence, and asking questions that no single dataset can answer alone. For anyone whose work depends on wastewater data, open sources should be the first stop.

