March 19, 2025 | Article

AI and Methane Emissions: Building on a Data Standard Foundation

AI’s Potential in Methane Management Starts with Standardized Data

By Dr. Steve Liang, CTO and Founder, SensorUp; The Rogers Internet of Things Research Chair; Professor at the University of Calgary; Convener of the OGC Emission Event Modeling Language Standard Working Group

 

AI and data science are frequently highlighted as game-changers for methane emissions reduction. You’ve likely seen headlines promising to “unlock the potential of your methane data” or blogs touting AI-driven insights. But there’s a fundamental issue often overlooked in these discussions: AI requires a robust data standard foundation. Without it, the promise of AI remains unfulfilled—or worse, its application can become misleading or even dangerous.

 

The Current State of Methane Data

Methane data today is disorganized and siloed, plagued by:

  • Inconsistent Definitions: Different organizations use varying terminologies and methodologies for data collection.

  • Data Silos: Valuable datasets are locked in proprietary systems, limiting access and integration.

  • Lack of Interoperability: Even when accessible, datasets often cannot be combined due to incompatible formats.

These challenges make advanced analysis—and, by extension, AI—difficult, if not impossible. Applying AI to such fragmented data leads to faulty outputs, wasted resources, and diminished trust in data-driven approaches. We cannot build a methane emissions foundation model without a standard methane data foundation.

 

A Lesson from the Internet of Water

Are there proven best practices from other fields that we can learn from? The Internet of Water offers valuable insights. This initiative improves the accessibility and usability of water data by adopting international sensor web standards. These standards enable seamless integration of data streams from diverse sources, such as the US Geological Survey, France’s BRGM, and the UK’s British Geological Survey.

Let me show you a demo and explain why. I opened my trusted ChatGPT and entered a simple prompt asking it to visualize water observations from these standardized sources.

In return, I received a fully functional code snippet. I copied it into CodePen.io, and voilà! Without writing a single line of code, I had a visualization displaying water observations from multiple sources in one unified view.
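To give a flavor of what such a generated snippet does, here is a minimal Python re-sketch of the same idea (the demo itself ran as browser JavaScript in CodePen, so treat this as an illustration rather than the original output). The endpoint URLs are placeholders, not the agencies’ actual services; the entity names and query parameters ($top, $orderby, $expand) come from the OGC SensorThings API standard.

```python
# A minimal sketch (placeholder endpoints, illustrative only) of pulling recent
# water observations from several SensorThings API services into one view.
import requests

ENDPOINTS = {
    "USGS (placeholder)": "https://example-usgs.org/SensorThings/v1.1",
    "BRGM (placeholder)": "https://example-brgm.fr/SensorThings/v1.1",
    "BGS (placeholder)": "https://example-bgs.uk/SensorThings/v1.1",
}


def latest_observations(base_url: str, limit: int = 5) -> list[dict]:
    """Fetch a few recent Observations, each expanded with its Datastream metadata."""
    response = requests.get(
        f"{base_url}/Observations",
        params={
            "$top": limit,
            "$orderby": "phenomenonTime desc",
            "$expand": "Datastream($select=name,unitOfMeasurement)",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("value", [])


# Because every endpoint speaks the same standard, one loop unifies all sources.
for source, url in ENDPOINTS.items():
    try:
        for obs in latest_observations(url):
            datastream = obs.get("Datastream", {})
            unit = datastream.get("unitOfMeasurement", {}).get("symbol", "")
            print(f"{source}: {datastream.get('name')} = "
                  f"{obs.get('result')} {unit} at {obs.get('phenomenonTime')}")
    except requests.RequestException as exc:
        print(f"{source}: request failed ({exc})")
```

The key point is that the per-source code is identical: once every provider publishes through the same standard, “multiple sources in one unified view” becomes a loop rather than an integration project.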

Here’s what the demo reveals about the intersection of standards and AI: 

  1. LLMs Understand the Underlying Ontologies: ChatGPT inherently understands the relationships within the underlying data model (ISO 19156 / OGC Observations and Measurements, now Observations, Measurements and Samples, or OMS), including Datastreams, Locations, and Observations, without prior explanation.
  2. LLMs Understand the Underlying API: Without referring to API documentation or using retrieval-augmented generation (RAG), the LLM generates advanced API calls based on the OGC/ITU-T SensorThings API standard, including features like $expand for complex queries, demonstrating its ability to perform zero-shot learning (a sketch of such a query appears after this list).
  3. Metadata Integration: LLMs retrieve metadata, such as observed properties, locations, and sensor names, and present them through intuitive maps and charts.
  4. Embedded Standards: The foundational standards (ISO, OGC, ITU-T) used by the Internet of Water are embedded within the LLM’s training, empowering AI to access and utilize the data effectively.
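To make point 2 concrete, here is a hypothetical sketch of the kind of request an LLM can compose zero-shot against any SensorThings-conformant endpoint. The base URL is a placeholder; the entity names (Datastreams, Thing, Locations, ObservedProperty, Observations) and the nested $expand grammar are defined by the standard itself.

```python
# A hypothetical single request of the kind the LLM wrote zero-shot. One nested
# $expand walks the standard's entity graph (Datastream -> Thing -> Locations,
# plus ObservedProperty and recent Observations) in a single call.
# The base URL is a placeholder, not a real endpoint.
from urllib.parse import quote

BASE_URL = "https://example-water.org/SensorThings/v1.1"

expand = (
    "Thing($expand=Locations($select=name,location)),"
    "ObservedProperty($select=name),"
    "Observations($top=3;$orderby=phenomenonTime desc)"
)

# Percent-encode spaces while keeping the query grammar readable.
url = f"{BASE_URL}/Datastreams?$top=2&$expand=" + quote(expand, safe="(),;$=")
print(url)
```

Because both the request and the response follow a published grammar, the model needs neither API documentation nor a retrieval step to produce, or to interpret, this kind of call.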

The Internet of Water’s success demonstrates that adopting international standards lays a robust foundation for AI to access, interpret, and act on data. This alignment accelerates data usability and drives innovation.

However, even the Internet of Water community acknowledges they are still working to make their data fully FAIR: Findable, Accessible, Interoperable, and Reusable. By comparison, methane emissions data is in its infancy. For methane data to reach its potential, it must follow a similar path, starting with the adoption of foundational sensor web data standards.

 

The Methane Data Problem

Methane emissions reduction is fundamentally a data management challenge, as it relies on integrating a wide range of sensor data to enable early detection, prompt repairs, and efficient containment of the gas within the pipeline.

Before we can train sophisticated models or leverage advanced tools like LLMs, we need to address:

  1. Common Understanding: Define key concepts such as emissions, sources, measurements, and uncertainties.
  2. Standardized Formats: Create machine-readable formats for consistent encoding and exchange of data.
  3. Accessible Tools: Build libraries and platforms that enable easy access and analysis of standardized data.

What Is Needed?

  1. Ontology Development: A common ontology is essential for defining methane-specific concepts and relationships. This includes but is not limited to:
    • Emission events, sources, causes, and features of interest (e.g., facility, site, equipment, component).
    • Methane-related observations and observed properties.
    • Units of measurement and associated uncertainties.
    • Methane measurement procedures (including sensing hardware, algorithms, methodologies, and deployment protocols).
  2. Data Models and Encoding Standards: Once key concepts are defined, they must be translated into machine-readable formats, such as JSON, Protocol Buffers, or Parquet, which can be validated for consistency and accuracy. For example, JSON Schemas can be developed to enable seamless data integration across platforms and ensure interoperability (see the sketch after this list).
  3. Accessible Tools: Libraries for data cleaning, visualization, and analysis must be developed so that standardized methane data remains interoperable with the diverse tools and platforms practitioners already use.
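As a purely illustrative sketch, and emphatically not the forthcoming EmissionML model, the snippet below shows what a machine-readable encoding of a single emission event could look like as a JSON Schema, validated here with the third-party Python jsonschema library. Every field name is a hypothetical placeholder chosen to mirror the ontology concepts listed above.

```python
# Illustrative only: a hypothetical JSON Schema for one emission event record,
# plus a validation step. The real model will be defined by the EmissionML group.
import jsonschema  # pip install jsonschema

EMISSION_EVENT_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "EmissionEvent (illustrative placeholder, not EmissionML)",
    "type": "object",
    "required": ["eventId", "featureOfInterest", "observedProperty", "result"],
    "properties": {
        "eventId": {"type": "string"},
        "featureOfInterest": {  # e.g. facility, site, equipment, or component
            "type": "object",
            "required": ["type", "id"],
            "properties": {
                "type": {"enum": ["facility", "site", "equipment", "component"]},
                "id": {"type": "string"},
            },
        },
        "observedProperty": {"type": "string"},  # e.g. "methane emission rate"
        "result": {
            "type": "object",
            "required": ["value", "unit"],
            "properties": {
                "value": {"type": "number", "minimum": 0},
                "unit": {"type": "string"},  # e.g. "kg/h"
                "uncertainty": {"type": "number", "minimum": 0},
            },
        },
        "procedure": {"type": "string"},  # sensing hardware / methodology
        "phenomenonTime": {"type": "string", "format": "date-time"},
    },
}

record = {
    "eventId": "evt-0001",
    "featureOfInterest": {"type": "component", "id": "valve-17"},
    "observedProperty": "methane emission rate",
    "result": {"value": 4.2, "unit": "kg/h", "uncertainty": 0.8},
    "procedure": "OGI camera survey (hypothetical)",
    "phenomenonTime": "2025-03-01T14:05:00Z",
}

jsonschema.validate(record, EMISSION_EVENT_SCHEMA)  # raises ValidationError if malformed
print("record conforms to the illustrative schema")
```

A schema like this (or its Protocol Buffers or Parquet equivalent) could be published alongside the ontology so that every producer and consumer validates records the same way before exchanging them.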

Risks of Not Building a Strong Data Foundation for Methane

AI thrives on high-quality, consistent, and interoperable data. Applying AI to methane emissions without a solid data foundation will not yield meaningful insights. Instead, it risks:

  • Generating inaccurate predictions due to poor data quality.
  • Fragmenting efforts by reinforcing incompatible data silos.
  • Undermining trust in data-driven approaches.

The success of AI in methane emissions depends on establishing and adhering to FAIR principles. Interoperability and standardization are prerequisites for any data-driven innovation, not optional enhancements.

 

Call to Action – Let’s Work Together

The methane data ecosystem needs to prioritize:

  • Developing a common ontology.
  • Establishing open data standards.
  • Creating accessible software libraries.

AI has tremendous potential, but only when built on a foundation of standardized, high-quality data. By focusing on these fundamentals, we can unlock the true power of methane data — and make meaningful progress toward emissions reduction.

Standardizing methane data is essential for unlocking AI’s full potential — there’s no need to reinvent the wheel by developing proprietary formats and making the problem even worse. So, what’s being done to address these challenges?

The Open Geospatial Consortium (OGC) is forming the Emission Event Modeling Language Working Group (EmissionML) to bridge the interoperability gap between emissions data, sensor observations, and their geospatial sources. Expected to launch in March 2025, this initiative will define standards addressing methane data management challenges, ensuring seamless data exchange across platforms while making emissions data more accessible for AI and LLMs.

If you’re interested in contributing, please contact Dr. Steve Liang, the EmissionML working group convener, to get involved. Let’s build this foundation together — because solving methane emissions requires open collaboration and robust data standards.