Open Data Cube: Managing and Analyzing Satellite Time Series at Scale

Kazushi Motomura | November 11, 2025 | 5 min read

Quick Answer: The Open Data Cube (ODC) is an open-source framework for managing and analyzing gridded Earth observation data. It indexes satellite imagery (Landsat, Sentinel-2, etc.) in a spatiotemporal database, enabling efficient queries such as 'give me all NDVI values for this polygon from 2015 to 2024.' The data itself stays in files (COGs on disk or in cloud storage); ODC indexes only the metadata for fast lookup. Built on Python, xarray, and PostgreSQL, ODC underpins national and regional programs including Digital Earth Australia (operated by Geoscience Australia), the Swiss Data Cube, and the Africa Regional Data Cube. Key advantage over Google Earth Engine (GEE): you own the infrastructure and data, with full algorithmic flexibility. Key disadvantage: it requires IT infrastructure setup and maintenance.

National governments need satellite monitoring systems they control — systems where the data, the algorithms, and the infrastructure are under their authority, not dependent on a foreign company's goodwill. This is the governance rationale behind the Open Data Cube.

While Google Earth Engine democratized planetary-scale analysis for researchers, it doesn't solve the operational needs of government agencies that require guaranteed availability, full data sovereignty, and the ability to customize every aspect of their processing pipeline. The Open Data Cube provides an open-source alternative that agencies can deploy on their own infrastructure.

What the Open Data Cube Does

The Problem It Solves

Satellite time series analysis requires answering questions like:

  • What was the NDVI at this location on every cloud-free date since 2015?
  • Show me all Sentinel-2 observations for this watershed in July 2024
  • Compute the median surface reflectance for this region for each quarter

Without a management framework, answering these questions requires manually searching file catalogs, handling different file naming conventions, managing projections and resolution mismatches, and writing custom data loading code.

ODC provides a structured solution: index the data once, query it efficiently forever.

Architecture

Data storage: Satellite imagery is stored as files, typically Cloud Optimized GeoTIFFs (COGs) on local disk, NFS, or cloud object storage. ODC doesn't move or copy the data; it records where the files are and what they contain.

Metadata database: PostgreSQL database containing:

  • Product definitions (what is Sentinel-2 Level-2A? Which bands? What resolution?)
  • Dataset records (this specific Sentinel-2 scene covers this spatial extent, at this time, with these file paths)
  • Spatial/temporal indexes for fast queries
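The idea of "index the metadata, leave the pixels in place" can be illustrated with a small, purely conceptual sketch. The DatasetRecord class, the find_datasets helper, and the example paths below are hypothetical names invented for this illustration; they are not ODC's actual schema or API, and a real deployment answers this kind of query inside PostgreSQL rather than in a Python loop.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical, simplified stand-in for one indexed dataset."""
    product: str
    bbox: tuple      # (lon_min, lat_min, lon_max, lat_max)
    acquired: date
    path: str        # where the file lives; the pixels are never copied


def bbox_intersects(a, b):
    """True if two (lon_min, lat_min, lon_max, lat_max) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(index, product, bbox, start, end):
    """Return records of `product` that overlap `bbox` within [start, end]."""
    return [
        rec for rec in index
        if rec.product == product
        and start <= rec.acquired <= end
        and bbox_intersects(rec.bbox, bbox)
    ]


# Made-up records for illustration only
index = [
    DatasetRecord("sentinel2_l2a", (10.0, 45.0, 11.0, 46.0), date(2024, 7, 3),
                  "s3://example-bucket/a.tif"),
    DatasetRecord("sentinel2_l2a", (10.0, 45.0, 11.0, 46.0), date(2023, 7, 3),
                  "s3://example-bucket/b.tif"),
    DatasetRecord("sentinel2_l2a", (20.0, 45.0, 21.0, 46.0), date(2024, 7, 9),
                  "s3://example-bucket/c.tif"),
]

hits = find_datasets(index, "sentinel2_l2a", (10.5, 45.5, 10.6, 45.6),
                     date(2024, 7, 1), date(2024, 7, 31))
print([rec.path for rec in hits])
```

Only the first record matches: the second falls outside the time range and the third outside the bounding box. The query touches metadata only, so no image file is opened until the matching paths are actually read.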

Python API: The datacube Python library provides high-level functions:

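import datacube

# Placeholder bounding box in longitude/latitude (EPSG:4326)
lon_min, lon_max = 149.0, 149.2
lat_min, lat_max = -35.4, -35.2

dc = datacube.Datacube()  # uses the index configured for this environment
data = dc.load(
    product='sentinel2_l2a',  # product name as indexed in this datacube
    x=(lon_min, lon_max),
    y=(lat_min, lat_max),
    time=('2020-01-01', '2024-12-31'),
    measurements=['red', 'green', 'blue', 'nir'],
    # add output_crs=... and resolution=... if the product defines no default grid
)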

This returns an xarray Dataset: a collection of labeled multi-dimensional arrays, one per requested measurement, that integrates directly with the Python scientific computing ecosystem (NumPy, pandas, scikit-learn, matplotlib).
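To show what working with such a result looks like, here is a minimal sketch that computes NDVI and extracts a per-pixel time series. It builds a small synthetic Dataset as a stand-in for a real dc.load() result, so the values, grid size, and coordinates are all made up; real surface reflectance is often stored as scaled integers with a nodata value and would need masking and scaling first.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a dc.load() result: one (time, y, x) array per band.
times = pd.date_range("2024-01-01", periods=4, freq="MS")
shape = (len(times), 3, 3)
rng = np.random.default_rng(0)
data = xr.Dataset(
    {
        "red": (("time", "y", "x"), rng.uniform(0.05, 0.15, shape)),
        "nir": (("time", "y", "x"), rng.uniform(0.30, 0.50, shape)),
    },
    coords={"time": times, "y": [2.0, 1.0, 0.0], "x": [0.0, 1.0, 2.0]},
)

# NDVI = (NIR - red) / (NIR + red), computed for every pixel and date at once
ndvi = ((data["nir"] - data["red"]) / (data["nir"] + data["red"])).rename("ndvi")

# Time series at the pixel nearest to a point, as a pandas Series
series = ndvi.sel(x=1.0, y=1.0, method="nearest").to_series()
print(series)
```

Because the arrays carry their coordinates, the arithmetic aligns on time, y, and x without any manual index bookkeeping, and the result drops straight into pandas for plotting or export.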

What You Get

The dc.load() call handles:

  • Finding all datasets that intersect the spatial and temporal query
  • Reading only the required spatial extent from each file
  • Reprojecting to a common CRS if needed
  • Resampling to a common pixel grid
  • Stacking into a time-ordered cube: one (time × y × x) array per band, held together in a single xarray Dataset

This data loading and harmonization is the tedious part of satellite time series analysis. ODC automates it.
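Once the data is loaded, a question such as "the median reflectance for each quarter" reduces to a few xarray calls. The sketch below uses a synthetic single-band array in place of real loaded data, and it is plain xarray rather than any ODC-specific routine.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for one loaded band: 12 monthly observations on a 2x2 grid.
# Every pixel holds the month index (0..11) so the result is easy to check.
times = pd.date_range("2024-01-01", periods=12, freq="MS")
values = np.arange(12, dtype="float64").reshape(12, 1, 1) * np.ones((1, 2, 2))
red = xr.DataArray(values, dims=("time", "y", "x"),
                   coords={"time": times}, name="red")

# Group observations by calendar quarter and take the per-pixel median.
# NaN values (for example, pixels masked out as cloud) are skipped by default.
quarterly = red.resample(time="QS").median(dim="time")
print(quarterly.isel(y=0, x=0).values)
```

The output has one time step per quarter, each labeled with the quarter's start date, and keeps the original y and x dimensions.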

Deployments

Digital Earth Australia (DEA)

The flagship ODC deployment:

  • Operated by Geoscience Australia
  • Indexes the complete Landsat and Sentinel-2 archive over Australia
  • Produces national-scale products: water observations, fractional cover, coastline monitoring
  • Publicly accessible via DEA Sandbox (JupyterHub) and Open Data platform

Digital Earth Africa

Continental-scale deployment for Africa:

  • Sentinel-1, Sentinel-2, Landsat data for all of Africa
  • Analysis-ready data updated regularly
  • Products: water extent, cropland mapping, land cover
  • Funded by international development organizations

Swiss Data Cube

National deployment for Switzerland:

  • Complete Landsat and Sentinel-2 archive over Switzerland
  • Used for environmental monitoring, glacier tracking, snow cover analysis

Other Deployments

Vietnam, Colombia, Mexico, Taiwan, and other countries have deployed or are deploying national-scale ODC instances for various monitoring applications.

ODC vs. GEE

Aspect                | Open Data Cube         | Google Earth Engine
Data ownership        | You control everything | Google hosts and controls
Algorithm flexibility | Full (any Python code) | Constrained by GEE API
Infrastructure        | You manage (or cloud)  | Google manages
Cost                  | Infrastructure costs   | Free (research) / Paid (commercial)
Setup effort          | Significant            | Minimal
Global data catalog   | You build it           | Pre-built
Scalability           | Depends on your infra  | Google-scale
Sustainability        | Self-controlled        | Depends on Google

Key Advantages

Sovereignty: Government agencies control their own data and processing. No dependency on external providers.

Flexibility: Any Python library, any algorithm, any machine learning framework. No API restrictions.

Reproducibility: You control the data versions, the code, and the environment. Results are reproducible years later.

Integration: ODC outputs integrate with standard Python tools (xarray, pandas, scikit-learn) and GIS tools (QGIS, GDAL).

Limitations

Setup complexity: Installing and configuring ODC, populating the index, and managing the data ingestion pipeline requires significant technical effort.

Data management: You're responsible for acquiring, storing, and updating the satellite data. This is a substantial ongoing operational cost.

Scale limitations: Without cloud infrastructure, processing is limited by your hardware. Cloud deployment solves this but adds complexity and cost.

Community size: Smaller user community than GEE means fewer tutorials, examples, and Stack Overflow answers.

The Modern ODC Ecosystem

ODC has evolved significantly:

odc-stac: Direct integration with STAC catalogs — load data from any STAC-compliant archive without local indexing.

odc-geo: Geospatial utilities for working with ODC data.

Datacube Explorer: Web interface for browsing indexed datasets.

odc-stats: Framework for computing temporal statistics (medians, percentiles, geomedians) at continental scale.

The evolution toward STAC integration is particularly significant — it means ODC can work with data in any STAC catalog (Microsoft Planetary Computer, Element 84 Earth Search, AWS Open Data) without needing to download and locally index everything.
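A minimal sketch of this STAC-based workflow, assuming the pystac-client and odc-stac packages are installed and network access is available. The load_from_stac function name is hypothetical, and the endpoint, collection, and band names are assumptions based on Element 84's public Earth Search catalog; other catalogs use different names.

```python
def load_from_stac(bbox, time_range):
    """Search a public STAC API and lazily load matching scenes with odc-stac.

    Imports are deferred so the optional dependencies (pystac-client,
    odc-stac) are only needed when the function is actually called.
    """
    import pystac_client
    import odc.stac

    # Assumed endpoint and collection: Element 84 Earth Search, Sentinel-2 L2A
    catalog = pystac_client.Client.open(
        "https://earth-search.aws.element84.com/v1"
    )
    search = catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=bbox,             # (lon_min, lat_min, lon_max, lat_max)
        datetime=time_range,   # e.g. "2024-07-01/2024-07-31"
    )
    items = search.item_collection()

    # chunks={} requests Dask-backed lazy arrays; pixels are read on compute
    return odc.stac.load(items, bands=["red", "nir"], bbox=bbox, chunks={})
```

Calling load_from_stac with a small bounding box and a one-month range would return an xarray Dataset much like the one from dc.load(), but without a local PostgreSQL index: the STAC API plays the role of the metadata database.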

When to Choose ODC

Choose ODC when: You need data sovereignty, full algorithmic flexibility, operational system reliability, and are willing to invest in infrastructure.

Choose GEE when: You need rapid exploration, global-scale analysis, minimal infrastructure, and can accept the constraints of GEE's programming model and dependency.

Choose cloud-native (STAC + xarray + Dask) when: You want flexibility without the full ODC framework — direct access to cloud-hosted data with standard Python tools.

The Open Data Cube represents a principled approach to Earth observation data management: open-source, community-governed, and designed for the operational needs of government agencies that must maintain long-term, sovereign monitoring capabilities. It's not the easiest path — but for organizations that need independence and full control, it's the most sustainable one.

Kazushi Motomura

Remote sensing specialist with 10+ years in satellite data processing. Founder of Off-Nadir Lab. Master's in Satellite Oceanography (Kyushu University).