Quick Answer
80% of earth observation work is pipeline plumbing, not science. Every team writes the same STAC queries, cloud masking, band arithmetic, and deployment scripts. No existing tool handles the full cycle: natural language input to deployed pipeline. Axis Spatial is building exactly that - describe your analysis, get a deployed pipeline you own.
Ask any EO engineer what they spent last month on and the answer is usually the same: writing boilerplate. The analysis itself took two days. Getting data, masking clouds, reprojecting, scheduling the job, deploying it somewhere it would actually run - that took three weeks.
This is not a story about a bad engineer. It is a story about a fragmented toolchain with no connecting layer. And it is happening at every organisation that touches satellite data, independently, simultaneously.
The EO Last-Mile Problem
The core tension
Copernicus provides the data for free. The bottleneck is the 30-50 lines of Python required to access each layer of it - and the fact that those lines need to be assembled, tested, deployed, and maintained by someone with a very specific skill set.
Europe invested EUR 8.2 billion in Copernicus. The programme generates petabytes of freely available Sentinel-2 imagery, land cover classifications, atmospheric data, and emergency mapping products. By any measure, the data access problem is solved.
The problem that remains is the translation layer. Turning satellite imagery into answers - an NDVI time series for a farming cooperative, a flood extent map for a municipal authority, a deforestation alert for a conservation NGO - requires expertise in Python, STAC API clients, rasterio, cloud deployment, and geospatial data standards. That expertise is scarce and expensive.
THE SCALE OF THE PROBLEM
The 450,000-organisation gap between "needs EO" and "can build EO" is not closing fast. Training programmes produce analysts slowly. Consultancies charge EUR 500,000 to EUR 800,000 for projects with high failure rates. The market has accepted this as a structural cost of EO adoption rather than as a product opportunity.
Anatomy of an EO Pipeline
Every EO analytical pipeline shares the same seven-stage structure. The domain varies - agriculture, forestry, urban planning, insurance - but the engineering components are identical. Here is what every team builds independently.
Data Discovery via STAC API
Query the Copernicus Data Space Ecosystem STAC catalogue for imagery matching a bounding box, date range, and cloud cover threshold. Every team writes a pystac_client search call with the same parameters.
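The shape of that search call is the same everywhere. A minimal sketch of the parameter assembly, assuming pystac_client for the actual request - note that the CDSE endpoint URL and the collection id vary between deployments, so both are assumptions to check against the current CDSE documentation:

```python
from datetime import date

def build_search_params(bbox, start, end, max_cloud=20):
    """Assemble the STAC search parameters every pipeline needs:
    collection, bounding box, ISO date range, and a cloud cover filter."""
    return {
        "collections": ["sentinel-2-l2a"],  # collection id is an assumption; CDSE may name it differently
        "bbox": bbox,                       # [min_lon, min_lat, max_lon, max_lat]
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
        "query": {"eo:cloud_cover": {"lt": max_cloud}},
    }

params = build_search_params([5.9, 50.7, 6.1, 50.9], date(2024, 6, 1), date(2024, 6, 30))

# With pystac_client installed, the same dict drives the search
# (endpoint URL below is an assumption):
# from pystac_client import Client
# catalog = Client.open("https://catalogue.dataspace.copernicus.eu/stac")
# items = list(catalog.search(**params).items())
```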
Download or Stream from CDSE
Authenticate, handle rate limits, decide between full download and COG streaming, manage partial reads for large tiles. Access is via an S3-compatible API, but with Copernicus-specific authentication.
Preprocess
Apply cloud mask using the Scene Classification Layer (SCL band, values 4, 5, 6 for clear pixels). Reproject to the target CRS. Mosaic multiple tiles. Apply scale factors. Every team writes this.
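The masking step the text describes is a few lines of numpy once the SCL band is loaded. A sketch, assuming arrays already read from disk; the 1e-4 scale factor is the standard L2A reflectance scaling (recent processing baselines also add an offset, which this sketch omits):

```python
import numpy as np

CLEAR_SCL_CLASSES = (4, 5, 6)  # vegetation, not-vegetated, water

def apply_scl_mask(band, scl, scale_factor=1e-4):
    """Mask a Sentinel-2 band to clear pixels and apply the L2A scale factor.
    Non-clear pixels become NaN so downstream statistics can ignore them."""
    band = band.astype("float32") * scale_factor
    clear = np.isin(scl, CLEAR_SCL_CLASSES)
    return np.where(clear, band, np.nan)
```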
Analysis
The actual science - NDVI calculation, land cover classification, change detection between dates, zonal statistics over administrative boundaries. This is where the domain expertise lives.
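For the most common of these analyses, the core computation is small. An NDVI sketch with a guard against division by zero (bare or saturated pixels where both bands are zero):

```python
import numpy as np

def ndvi(b8, b4):
    """NDVI = (NIR - Red) / (NIR + Red) for Sentinel-2 bands B8 and B4.
    Pixels where both bands are zero get NaN instead of a divide error."""
    b8 = b8.astype("float32")
    b4 = b4.astype("float32")
    denom = b8 + b4
    safe = np.where(denom != 0, denom, 1)  # dummy divisor where denom is zero
    return np.where(denom != 0, (b8 - b4) / safe, np.nan)
```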
Output to Cloud-Native Formats
Write results as Cloud-Optimised GeoTIFF or GeoParquet. Add STAC item metadata. Register in the output catalogue. Compress. Configure overviews.
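The creation options for that output are boilerplate too. A sketch of a GTiff profile with the internal tiling and compression a COG needs - key names follow GDAL/rasterio GTiff creation options, and the write call below is illustrative, not a complete COG writer (a validating tool like rio-cogeo handles overviews and layout properly):

```python
def cog_profile(dtype="float32", blocksize=512, compress="deflate"):
    """GTiff creation options for a Cloud-Optimised GeoTIFF:
    internal tiling plus compression, ready to pass to rasterio.open."""
    return {
        "driver": "GTiff",
        "dtype": dtype,
        "tiled": True,
        "blockxsize": blocksize,
        "blockysize": blocksize,
        "compress": compress,
    }

# With rasterio installed (output path and geo-metadata are placeholders):
# import rasterio
# with rasterio.open("ndvi.tif", "w", height=h, width=w, count=1,
#                    crs=crs, transform=transform, **cog_profile()) as dst:
#     dst.write(ndvi_array, 1)
#     dst.build_overviews([2, 4, 8, 16])
```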
Deploy as Scheduled Job
Package as a Docker container or Databricks notebook. Configure cron schedule. Handle dependency management. Set up alerting for failures. Wire to the organisation's cloud account.
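Whatever the target platform, the deploy step reduces to a job description: an image, a schedule, an alert route. A sketch of that description - every field name here is illustrative, not any vendor's schema, and the translation into a Databricks job, an EventBridge rule, or a crontab entry is the platform-specific part:

```python
def job_spec(name, image, schedule, alert_email):
    """Minimal scheduled-job description a deploy step could translate
    into a platform-specific config. Schedule is a five-field cron
    expression: minute hour day-of-month month day-of-week."""
    assert len(schedule.split()) == 5, "expected a five-field cron expression"
    return {
        "name": name,
        "image": image,
        "schedule": schedule,  # e.g. "0 3 1 * *" = 03:00 on the 1st, monthly
        "on_failure": {"email": alert_email},
    }
```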
Monitoring
Log run durations, output file sizes, coverage gaps. Alert on cloud cover above threshold. Track data freshness. This step is usually skipped on the first version and regretted.
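The two checks above that hurt most when skipped - staleness and cloud cover - fit in a few lines. A sketch with assumed thresholds (35 days allows for a monthly cadence plus slack; 80% cloud cover as the trust cutoff):

```python
from datetime import date

def freshness_alerts(last_success, today, max_age_days=35,
                     cloud_cover=None, cloud_threshold=80):
    """Return alert messages if the newest output is stale or the
    last processed scene was too cloudy to trust."""
    alerts = []
    if (today - last_success).days > max_age_days:
        alerts.append(f"stale: no successful run in over {max_age_days} days")
    if cloud_cover is not None and cloud_cover > cloud_threshold:
        alerts.append(f"cloudy: last scene {cloud_cover}% cloud cover")
    return alerts
```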
Six of the seven stages - everything except the analysis - are identical across nearly every EO use case. The analysis stage is where the actual domain differentiation lives. Yet teams spend 80% of their time on the identical stages and 20% on the one that actually differentiates their output.
The Available Tools - and What Is Missing
Several tools address parts of this problem. None address all of it. Here is an honest assessment, including the downsides.
| TOOL | STRENGTH | LIMITATION |
|---|---|---|
| Google Earth Engine | Petabyte-scale compute, massive data archive, well-documented | Code-only input, locked to GEE platform, cannot deploy pipelines to your own infrastructure, commercial pricing opaque |
| openEO / CDSE | Open standard, interoperable backends, strong Copernicus integration | Still requires code (process graphs), locked to CDSE-compliant backends, limited deployment flexibility |
| Sentinel Hub | Fast API, good cloud masking, reliable uptime | Per-request pricing at scale, locked to Sentinel Hub API, no deployment to your own infrastructure |
| UP42 | Marketplace model, diverse data providers, easy access | Manual provider selection, locked to UP42 platform, not designed for scheduled operational pipelines |
| FME | Visual workflow builder, wide format support, no-code for simple cases | Not EO-native, no STAC support, expensive licences, cloud deployment requires FME Server |
The Gap
No tool takes natural language input, builds the pipeline logic, and deploys it to the user's infrastructure. Every tool in this table requires either code, platform lock-in, or both. The missing layer is the translation between "I want monthly NDVI for this region, cloud-masked, from Sentinel-2" and a running, scheduled job that the user owns and can modify.
What "Pipeline as a Service" Looks Like
The product concept is straightforward to describe and non-trivial to build. A user describes their analysis in plain language. The system translates that description into a working pipeline, deploys it to the user's infrastructure, and hands over the code.
PIPELINE GENERATION FLOW
"Calculate NDVI for my farm from Sentinel-2, monthly, mask clouds"
Extract: spatial extent (farm boundary), data source (Sentinel-2 L2A), temporal cadence (monthly), preprocessing (SCL cloud mask, values 4/5/6), analysis (NDVI = (B8-B4)/(B8+B4)), output format (COG)
Generate STAC query, SCL masking logic, band arithmetic, COG output writer, monthly cron schedule, Docker packaging, Databricks/AWS/GCP deployment configuration
Push to user's existing cloud infrastructure - Databricks workspace, AWS Glue, GCP Dataproc, or local Docker. No new accounts or infrastructure required.
User receives the generated Python code, Terraform configuration, CI/CD pipeline, and runbook. They own it. They can modify it. It runs without the vendor.
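The pivot point in this flow is the extraction step: everything downstream is generation from a structured spec. A hypothetical sketch of what that spec could look like for the NDVI request above - the field names and the boundary-file path are illustrative, not a real schema:

```python
# Hypothetical structured spec the extraction step could emit for
# "Calculate NDVI for my farm from Sentinel-2, monthly, mask clouds".
pipeline_spec = {
    "extent": {"type": "geojson", "source": "farm_boundary.geojson"},  # placeholder path
    "collection": "sentinel-2-l2a",
    "cadence": "monthly",
    "preprocessing": [{"op": "scl_mask", "clear_classes": [4, 5, 6]}],
    "analysis": {"op": "ndvi", "bands": ["B08", "B04"]},
    "output": {"format": "cog"},
    "deploy": {"target": "docker", "schedule": "cron"},
}
```

Everything the generator emits - the STAC query, the masking logic, the cron schedule - is a deterministic function of a spec like this, which is what makes the generated code inspectable and ownable.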
The critical design decision is the ownership model. Users should own the output code, not be dependent on a vendor API to run it. A pipeline that stops working when the vendor's service is down is not an operational pipeline - it is a liability. The product generates code that runs independently.
This is also the answer to the "why not just use GEE?" question. GEE is where you explore data. It is not where you run operational pipelines that need to integrate with your existing data warehouse, or that need to run in a regulated environment with data sovereignty requirements.
The Expertise Gap Is the Market
The 450,000 organisations that cannot build EO pipelines are not going to solve this by hiring. The Python engineers who can write production STAC clients, cloud masking routines, and COG output pipelines are in short supply and expensive. They are also not primarily interested in writing the same data ingestion boilerplate for the hundredth time.
The real bottleneck
The domain experts at these 450,000 organisations - the agronomist who knows exactly which NDVI threshold indicates crop stress, the urban planner who understands which land cover classes to monitor, the flood risk analyst who knows the inundation model - have the knowledge. They lack the pipeline engineering skills. The product should bridge that gap, not require them to acquire those skills first.
This is not primarily a technology problem. Rasterio, xarray, pystac_client, and GDAL already do everything needed. The problem is that assembling these tools into a working, deployed pipeline requires skills that most domain experts do not have and most organisations cannot afford to hire for every use case.
A municipal authority wants to monitor vegetation health in public parks. They have a GIS officer with QGIS experience. They do not have a Python developer with STAC experience. The gap between what they want and what they can build is not one skill - it is five: STAC API access, rioxarray processing, cloud deployment, scheduling, and monitoring. Each is learnable, but the combination takes months to acquire and years to use with confidence.
Municipal authorities
Monthly greenspace and impervious surface monitoring for urban heat island analysis
Farming cooperatives
Weekly NDVI and soil moisture indices across member parcels for irrigation scheduling
Conservation NGOs
Quarterly deforestation alerts and land cover change detection in protected areas
Insurance companies
Post-event flood and fire extent mapping within 48 hours for claims triage
Infrastructure operators
Subsidence monitoring over pipeline corridors using Sentinel-1 InSAR time series
Research institutions
Annual land cover classification updates without renegotiating cloud compute contracts
When This Does Not Work
Not every EO workflow is a candidate for automated pipeline generation. Being specific about the limits is more useful than claiming otherwise.
Novel ML architectures
If the analysis requires a custom deep learning model - a bespoke crop type classifier trained on local ground truth, a custom building damage assessment model - automated pipeline generation cannot help with the core analytical step. The pipeline plumbing still applies, but the analysis itself requires research engineering.
Real-time SAR streaming
High-cadence Sentinel-1 SAR processing for near-real-time applications - ship detection, flood monitoring at hourly resolution - has latency and throughput requirements beyond scheduled batch processing. The architecture is fundamentally different.
Global-scale analyses
If you are computing something across the entire Sentinel-2 archive at global scale, compute cost dominates development cost. The engineering effort is a rounding error compared to the cloud bill. Tools like GEE or Pangeo are better fits here.
Standard EO workflows
NDVI time series, land cover classification, change detection, zonal statistics, cloud masking, mosaicking, format conversion. These are solved problems with known implementations. The only question is how fast you can get to a deployed, running version.
The standard workflows in the last category represent the majority of operational EO use cases. Research teams work on the first three categories; the remaining 450,000 organisations primarily need the fourth.
What We Are Building
Axis Spatial is building the missing translation layer between domain knowledge and deployed EO pipelines.
Our current product - Axis Agents - handles the GIS migration side: legacy ArcPy scripts, FME workbenches, and QGIS projects migrated to cloud-native pipelines in one workflow. It is live at axisspatial.com/agents and in active use.
The EO pipeline builder is the next layer: describe a new analysis from scratch, without a legacy workflow to migrate. Natural language in, deployed pipeline out. The user owns the generated code and can run it on their Databricks workspace, AWS account, or local infrastructure without dependency on our platform.
CURRENT STATUS
Axis Agents (GIS migration) - live at app.axisspatial.com
Sentinel-2 migration patterns built into the backend (STAC, SCL masking, band arithmetic)
Databricks Technology Partner onboarding in progress
EO pipeline builder - in development. Natural language to deployed Sentinel-2 pipeline.
If you are building EO workflows and spending most of your time on the plumbing rather than the analysis, we would like to hear from you. We are working with a small number of organisations as design partners during development.