An important aspect of understanding and addressing climate change is data governance: the cost-effective management of climate data assets through a structured system of policies, roles, processes, and controls that ensures availability, usability, and integrity. In the U.S., climate-related data is collected by the Environmental Protection Agency (EPA) and by some state environmental regulatory agencies. Much of the data is publicly accessible, but the collection is a hodgepodge of URLs, file formats, and custom-built applications that require different tools or specialized knowledge before the data is useful to citizens, investors, researchers, and governments. Each government agency, state or federal, maintains its own data standard, and a single agency often maintains more than one standard for different datasets containing similar data.
A data standard is the blueprint or logical structure that defines how data is organized. Generally, one standard is not “better” than another, but when regulators collect data under different standards, each dataset is structured differently, as if written in a different “language”. Climate impact, however, is not limited to the impact we have on our own state, or even our own country. We need to speak the same language to understand what is happening elsewhere that will affect us.
Environmental regulators could improve the accessibility and usability of climate datasets by establishing a semantic data model, expressed as a taxonomy or ontology agreed upon by state and federal regulators, under which reported data could be easily catalogued and shared. By collecting data in, or converting it on submission into, a structured, standardized format that follows a single semantic data model (schema), regulators could continue to maintain their own datasets and data collection processes, while all regulatory datasets would be interoperable because they would be identically structured.
Interoperability means that regulators can share data and tools for querying, extraction, and analysis (thus reducing the cost of building and maintaining applications) and can perform more robust analysis. When datasets are interoperable, analyzing information from thousands of entities requires the same effort and cost as analyzing information from one entity.
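To make the idea concrete, the following is a minimal sketch of conversion on submission into a shared schema, followed by a single query that works across both sources. Everything in it is an assumption made for illustration: the `EmissionRecord` fields, the two column mappings, the sample rows, and the jurisdiction codes are hypothetical and do not reflect the actual formats of the EPA or of any state regulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmissionRecord:
    """Hypothetical shared schema; field names are illustrative only."""
    facility_id: str
    jurisdiction: str
    year: int
    pollutant: str
    tonnes: float  # metric tons


# Hypothetical mappings from each agency's local column names to shared names.
FEDERAL_MAP = {"FACILITY_ID": "facility_id", "REPORTING_YEAR": "year",
               "GAS": "pollutant", "EMISSIONS_MT": "tonnes"}
STATE_MAP = {"site": "facility_id", "yr": "year",
             "substance": "pollutant", "short_tons": "tonnes"}

SHORT_TON_TO_TONNE = 0.90718474  # 1 short ton = 907.18474 kg


def to_shared(raw, mapping, jurisdiction, unit_factor=1.0):
    """Convert one locally formatted row into the shared schema."""
    renamed = {shared: raw[local] for local, shared in mapping.items()}
    return EmissionRecord(
        facility_id=str(renamed["facility_id"]),
        jurisdiction=jurisdiction,
        year=int(renamed["year"]),
        pollutant=str(renamed["pollutant"]).upper(),
        tonnes=float(renamed["tonnes"]) * unit_factor,
    )


def total_tonnes(records, pollutant):
    """One query that works unchanged on records from any source."""
    return sum(r.tonnes for r in records if r.pollutant == pollutant)


# Made-up sample rows in two different local formats.
federal_row = {"FACILITY_ID": "1001", "REPORTING_YEAR": "2020",
               "GAS": "CO2", "EMISSIONS_MT": "500.0"}
state_row = {"site": "A-17", "yr": 2020,
             "substance": "co2", "short_tons": 100.0}

records = [
    to_shared(federal_row, FEDERAL_MAP, "US"),
    to_shared(state_row, STATE_MAP, "XX", SHORT_TON_TO_TONNE),
]
print(total_tonnes(records, "CO2"))
```

In this sketch each regulator keeps its own collection format and supplies only a mapping, while the query function is written once and shared. A real semantic data model would also need to define units, identifiers, and controlled vocabularies far more carefully than this example does.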
This paper describes some of the datasets currently available from the EPA and certain state environmental regulators, and explains how collecting and storing the data using a semantic data model could benefit regulators, reporting entities, and data users.

