
Why Every Earth Observation Team Builds the Same Pipeline From Scratch

STAC queries, cloud masking, band arithmetic, deployment scripts. The same 200 lines, written by thousands of teams, independently, every year.

PUBLISHED: MAR 2026
CATEGORY: EO PIPELINES
READ TIME: 12 MIN
AUTHOR: AXIS SPATIAL
  • EUR 8.2 billion invested in Copernicus. Petabytes of free satellite data. Fewer than 50,000 of the 500,000+ organisations that need EO insights can build the pipelines to access them.
  • 80% of EO engineering time goes to pipeline plumbing - STAC queries, cloud masking, band arithmetic, scheduling, deployment - not to the science itself. Every team writes these components independently.
  • The existing tool market has real gaps: GEE requires code and locks you to their platform; openEO requires code and locks you to CDSE backends; Sentinel Hub is per-request and locked to their API. No tool covers the full path from natural language input to deployed pipeline.
  • The 450,000 organisations that cannot build EO pipelines are not going to hire Python developers. They need a tool that converts domain knowledge into running, scheduled code they own.
  • Standard EO workflows - NDVI, land cover, change detection, zonal statistics - are technically solved problems. The gap is productisation, not research.

Quick Answer

80% of earth observation work is pipeline plumbing, not science. Every team writes the same STAC queries, cloud masking, band arithmetic, and deployment scripts. No existing tool handles the full cycle: natural language input to deployed pipeline. Axis Spatial is building exactly that - describe your analysis, get a deployed pipeline you own.

Ask any EO engineer how they spent last month and the answer is usually the same: writing boilerplate. The analysis itself took two days. Getting data, masking clouds, reprojecting, scheduling the job, deploying it somewhere it would actually run - that took three weeks.

This is not a story about a bad engineer. It is a story about a fragmented toolchain with no connecting layer. And it is happening at every organisation that touches satellite data, independently, simultaneously.

The EO Last-Mile Problem

The core tension

Copernicus provides the data for free. The bottleneck is the 30-50 lines of Python required to access each layer of it - and the fact that those lines need to be assembled, tested, deployed, and maintained by someone with a very specific skill set.

Europe invested EUR 8.2 billion in Copernicus. The programme generates petabytes of freely available Sentinel-2 imagery, land cover classifications, atmospheric data, and emergency mapping products. By any measure, the data access problem is solved.

The problem that remains is the translation layer. Turning satellite imagery into answers - an NDVI time series for a farming cooperative, a flood extent map for a municipal authority, a deforestation alert for a conservation NGO - requires expertise in Python, STAC API clients, rasterio, cloud deployment, and geospatial data standards. That expertise is scarce and expensive.

THE SCALE OF THE PROBLEM

500,000+
organisations that need EO insights
<50,000
that can build EO pipelines today
80%
of EO engineering time spent on plumbing, not science

The 450,000-organisation gap between "needs EO" and "can build EO" is not closing fast. Training programmes produce analysts slowly. Consultancies charge EUR 500,000 to EUR 800,000 for projects with high failure rates. The market has accepted this as a structural cost of EO adoption rather than as a product opportunity.

Anatomy of an EO Pipeline

Every EO analytical pipeline shares the same seven-stage structure. The domain varies - agriculture, forestry, urban planning, insurance - but the engineering components are identical. Here is what every team builds independently.

01

Data Discovery via STAC API

Query the Copernicus Data Space Ecosystem STAC catalogue for imagery matching a bounding box, date range, and cloud cover threshold. Every team writes a pystac_client search call with the same parameters.
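That search call can be sketched in a few lines. This is an illustrative sketch, not a production client: the endpoint URL and collection id below reflect the public CDSE STAC catalogue as commonly documented, but treat both as assumptions to verify against the current CDSE documentation.

```python
def build_search_params(bbox, start, end, max_cloud=20):
    """Assemble the STAC search kwargs every team writes: bounding box,
    date range, and a cloud cover threshold via the query extension."""
    return {
        "collections": ["sentinel-2-l2a"],   # assumed CDSE collection id
        "bbox": bbox,                        # [min_lon, min_lat, max_lon, max_lat]
        "datetime": f"{start}/{end}",
        "query": {"eo:cloud_cover": {"lt": max_cloud}},
    }

def search_cdse(params):
    """Run the search against the CDSE catalogue (requires network access)."""
    from pystac_client import Client  # pip install pystac-client
    catalog = Client.open("https://stac.dataspace.copernicus.eu/v1")
    return list(catalog.search(**params).items())
```

A typical invocation would be `search_cdse(build_search_params([7.0, 47.2, 7.3, 47.4], "2025-06-01", "2025-06-30"))` - roughly the same handful of parameters, written independently by every team.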

02

Download or Stream from CDSE

Authenticate, handle rate limits, decide between full download and COG streaming, manage partial reads for large tiles. S3-compatible API access, but with Copernicus-specific authentication.

03

Preprocess

Apply cloud mask using the Scene Classification Layer (SCL band, values 4, 5, 6 for clear pixels). Reproject to the target CRS. Mosaic multiple tiles. Apply scale factors. Every team writes this.
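The SCL masking logic itself is small - which is exactly the point about duplicated effort. A minimal numpy sketch, keeping the clear-pixel classes named above (4 vegetation, 5 not-vegetated, 6 water) and setting everything else to NaN:

```python
import numpy as np

# SCL classes treated as clear pixels, per the Sentinel-2 L2A classification.
CLEAR_SCL_CLASSES = (4, 5, 6)

def apply_scl_mask(band: np.ndarray, scl: np.ndarray) -> np.ndarray:
    """Return the band as float with cloudy/invalid pixels replaced by NaN."""
    clear = np.isin(scl, CLEAR_SCL_CLASSES)
    return np.where(clear, band.astype("float32"), np.nan)
```

Some teams also keep classes 7 (unclassified) or 11 (snow) depending on the use case; the threshold choice is domain knowledge, which is why it belongs in a spec rather than buried in boilerplate.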

04

Analysis

The actual science - NDVI calculation, land cover classification, change detection between dates, zonal statistics over administrative boundaries. This is where the domain expertise lives.
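For the simplest of these, NDVI, the science step is one formula - (B8 - B4) / (B8 + B4) on Sentinel-2 reflectance - plus the edge-case handling that distinguishes a script from a pipeline. A minimal sketch:

```python
import numpy as np

def ndvi(b8: np.ndarray, b4: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), with zero-sum pixels
    mapped to NaN instead of raising a divide-by-zero warning."""
    b8 = b8.astype("float32")
    b4 = b4.astype("float32")
    denom = b8 + b4
    safe = np.where(denom == 0, 1.0, denom)  # placeholder divisor for masked cells
    return np.where(denom == 0, np.nan, (b8 - b4) / safe)
```

Two days of analysis work often looks like this; the surrounding three weeks is everything else in this list.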

05

Output to Cloud-Native Formats

Write results as Cloud-Optimised GeoTIFF or GeoParquet. Add STAC item metadata. Register in the output catalogue. Compress. Configure overviews.
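A COG is, at its core, a tiled and compressed GeoTIFF with internal overviews. The sketch below shows one way to produce that with rasterio; the 512-pixel tiles, deflate compression, and overview levels are common defaults, not a mandated standard.

```python
def cog_profile(width: int, height: int, crs: str, transform) -> dict:
    """A rasterio profile for a COG-style output: tiled, compressed GeoTIFF."""
    return {
        "driver": "GTiff",
        "dtype": "float32",
        "count": 1,
        "width": width,
        "height": height,
        "crs": crs,
        "transform": transform,
        "tiled": True,
        "blockxsize": 512,
        "blockysize": 512,
        "compress": "deflate",
    }

def write_cog(path, array, profile):
    """Write the array, then reopen to build internal overviews (needs rasterio)."""
    import rasterio  # pip install rasterio
    from rasterio.enums import Resampling
    with rasterio.open(path, "w", **profile) as dst:
        dst.write(array, 1)
    with rasterio.open(path, "r+") as dst:
        dst.build_overviews([2, 4, 8, 16], Resampling.average)
```

Tools like rio-cogeo validate the result; the STAC item metadata and catalogue registration mentioned above sit on top of this step.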

06

Deploy as Scheduled Job

Package as a Docker container or Databricks notebook. Configure cron schedule. Handle dependency management. Set up alerting for failures. Wire to the organisation's cloud account.

07

Monitoring

Log run durations, output file sizes, coverage gaps. Alert on cloud cover above threshold. Track data freshness. This step is usually skipped on the first version and regretted.
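Even the data-freshness check is only a few lines once someone writes it. A minimal sketch - the 35-day window is an illustrative threshold for a monthly cadence, not a recommendation:

```python
from datetime import datetime, timezone

def is_stale(last_success: datetime, max_age_days: int = 35) -> bool:
    """True if the most recent successful run is older than the allowed window,
    i.e. the pipeline has silently stopped producing fresh output."""
    age = datetime.now(timezone.utc) - last_success
    return age.days > max_age_days
```

Wired to an alert channel, this is the difference between noticing a dead pipeline in a day and noticing it when a stakeholder asks where the March output went.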

Steps 01, 02, 03, 05, 06, and 07 are identical across nearly every EO use case. Step 04 - the analysis - is where the actual domain differentiation lives. Yet teams spend 80% of their time on the identical steps and 20% on the step that actually differentiates their output.

The Available Tools - and What Is Missing

Several tools address parts of this problem. None address all of it. Here is an honest assessment, including the downsides.

TOOL | STRENGTH | LIMITATION
Google Earth Engine | Petabyte-scale compute, massive data archive, well-documented | Code-only input; locked to the GEE platform; cannot deploy pipelines to your own infrastructure; opaque commercial pricing
openEO / CDSE | Open standard, interoperable backends, strong Copernicus integration | Still requires code (process graphs); locked to CDSE-compliant backends; limited deployment flexibility
Sentinel Hub | Fast API, good cloud masking, reliable uptime | Per-request pricing at scale; locked to the Sentinel Hub API; no deployment to your own infrastructure
UP42 | Marketplace model, diverse data providers, easy access | Providers must be selected manually; locked to the UP42 platform; not designed for scheduled operational pipelines
FME | Visual workflow builder, wide format support, no-code for simple cases | Not EO-native; no STAC support; expensive licences; cloud deployment requires FME Server

The Gap

No tool takes natural language input, builds the pipeline logic, and deploys it to the user's infrastructure. Every tool in this table requires either code, platform lock-in, or both. The missing layer is the translation between "I want monthly NDVI for this region, cloud-masked, from Sentinel-2" and a running, scheduled job that the user owns and can modify.

What "Pipeline as a Service" Looks Like

The product concept is straightforward to describe and non-trivial to build. A user describes their analysis in plain language. The system translates that description into a working pipeline, deploys it to the user's infrastructure, and hands over the code.

PIPELINE GENERATION FLOW

INPUT

"Calculate NDVI for my farm from Sentinel-2, monthly, mask clouds"

PARSE

Extract: spatial extent (farm boundary), data source (Sentinel-2 L2A), temporal cadence (monthly), preprocessing (SCL cloud mask, values 4/5/6), analysis (NDVI = (B8-B4)/(B8+B4)), output format (COG)

BUILD

Generate STAC query, SCL masking logic, band arithmetic, COG output writer, monthly cron schedule, Docker packaging, Databricks/AWS/GCP deployment configuration

DEPLOY

Push to user's existing cloud infrastructure - Databricks workspace, AWS Glue, GCP Dataproc, or local Docker. No new accounts or infrastructure required.

HAND OFF

User receives the generated Python code, Terraform configuration, CI/CD pipeline, and runbook. They own it. They can modify it. It runs without the vendor.
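One way to picture the PARSE stage above is as a structured spec sitting between the sentence and the generated code. The field names here are purely illustrative - a hypothetical shape, not Axis Spatial's actual schema:

```python
def parse_request_example() -> dict:
    """Hypothetical structured spec extracted from the NDVI request above.
    Every field name in this dict is illustrative, not a real product schema."""
    return {
        "aoi": "farm_boundary.geojson",                       # spatial extent
        "source": {"collection": "sentinel-2-l2a", "level": "L2A"},
        "cadence": "monthly",
        "preprocess": {"cloud_mask": {"scl_keep": [4, 5, 6]}},
        "analysis": {"index": "NDVI", "formula": "(B8 - B4) / (B8 + B4)"},
        "output": {"format": "COG"},
    }
```

The value of such an intermediate spec is that it is reviewable by the domain expert before any code is generated: the agronomist can check the cadence and mask classes without reading Python.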

The critical design decision is the ownership model. Users should own the output code, not be dependent on a vendor API to run it. A pipeline that stops working when the vendor's service is down is not an operational pipeline - it is a liability. The product generates code that runs independently.

This is also the answer to the "why not just use GEE?" question. GEE is where you explore data. It is not where you run operational pipelines that need to integrate with your existing data warehouse, or that need to run in a regulated environment with data sovereignty requirements.

The Expertise Gap Is the Market

The 450,000 organisations that cannot build EO pipelines are not going to solve this by hiring. The Python engineers who can write production STAC clients, cloud masking routines, and COG output pipelines are in short supply and expensive. They are also not primarily interested in writing the same data ingestion boilerplate for the hundredth time.

The real bottleneck

The domain experts at these 450,000 organisations - the agronomist who knows exactly which NDVI threshold indicates crop stress, the urban planner who understands which land cover classes to monitor, the flood risk analyst who knows the inundation model - have the knowledge. They lack the pipeline engineering skills. The product should bridge that gap, not require them to acquire those skills first.

This is not primarily a technology problem. Rasterio, xarray, pystac_client, and GDAL already do everything needed. The problem is that assembling these tools into a working, deployed pipeline requires skills that most domain experts do not have and most organisations cannot afford to hire for every use case.

A municipal authority wants to monitor vegetation health in public parks. They have a GIS officer with QGIS experience. They do not have a Python developer with STAC experience. The gap between what they want and what they can build is not one skill - it is five: STAC API access, rioxarray processing, cloud deployment, scheduling, and monitoring. Each is learnable, but the combination takes months to acquire and years to use with confidence.

Municipal authorities

Monthly greenspace and impervious surface monitoring for urban heat island analysis

Farming cooperatives

Weekly NDVI and soil moisture indices across member parcels for irrigation scheduling

Conservation NGOs

Quarterly deforestation alerts and land cover change detection in protected areas

Insurance companies

Post-event flood and fire extent mapping within 48 hours for claims triage

Infrastructure operators

Subsidence monitoring over pipeline corridors using Sentinel-1 InSAR time series

Research institutions

Annual land cover classification updates without renegotiating cloud compute contracts

When This Does Not Work

Not every EO workflow is a candidate for automated pipeline generation. Being specific about the limits is more useful than claiming otherwise.

Novel ML architectures

If the analysis requires a custom deep learning model - a bespoke crop type classifier trained on local ground truth, a custom building damage assessment model - automated pipeline generation cannot help with the core analytical step. The pipeline plumbing still applies, but the analysis itself requires research engineering.

Real-time SAR streaming

High-cadence Sentinel-1 SAR processing for near-real-time applications - ship detection, flood monitoring at hourly resolution - has latency and throughput requirements beyond scheduled batch processing. The architecture is fundamentally different.

Global-scale analyses

If you are computing something across the entire Sentinel-2 archive at global scale, compute cost dominates development cost. The engineering effort is a rounding error compared to the cloud bill. Tools like GEE or Pangeo are better fits here.

Standard EO workflows

NDVI time series, land cover classification, change detection, zonal statistics, cloud masking, mosaicking, format conversion. These are solved problems with known implementations. The only question is how fast you can get to a deployed, running version.

The standard workflows in the last category represent the majority of operational EO use cases. Research teams work on the first three categories; the remaining 450,000 organisations primarily need the fourth.

What We Are Building

Axis Spatial is building the missing translation layer between domain knowledge and deployed EO pipelines.

Our current product - Axis Agents - handles the GIS migration side: legacy ArcPy scripts, FME workbenches, and QGIS projects migrated to cloud-native pipelines in one workflow. It is live at axisspatial.com/agents and in active use.

The EO pipeline builder is the next layer: describe a new analysis from scratch, without a legacy workflow to migrate. Natural language in, deployed pipeline out. The user owns the generated code and can run it on their Databricks workspace, AWS account, or local infrastructure without dependency on our platform.

CURRENT STATUS

Axis Agents (GIS migration) - live at app.axisspatial.com

Sentinel-2 migration patterns built into the backend (STAC, SCL masking, band arithmetic)

Databricks Technology Partner onboarding in progress

EO pipeline builder - in development. Natural language to deployed Sentinel-2 pipeline.

If you are building EO workflows and spending most of your time on the plumbing rather than the analysis, we would like to hear from you. We are working with a small number of organisations as design partners during development.
