- Lambda cold start with geospatial libs (GDAL, rasterio): 8-12 seconds. Warm invocations: 200-400ms
- S3 + COG: read any region of a 50GB raster in 50-200ms via HTTP range requests. Storage cost: $0.023/GB/month
- Athena spatial SQL: query GeoParquet on S3 without a database. $5 per TB scanned, zero infrastructure
- Step Functions + Lambda: orchestrate multi-step geospatial pipelines for $0.025 per 1,000 state transitions
AWS has 200+ services but no “AWS for Geospatial” product. That is both the challenge and the opportunity. You assemble your own stack from Lambda, S3, Athena, and Step Functions.
Done right, it is cheaper and more flexible than any packaged solution. Done wrong, it is a labyrinth of cold starts, timeout errors, and unexpected bills. This post covers the patterns that work in production and the pitfalls we have hit building geospatial pipelines on AWS for clients processing terabytes of raster and vector data.
Geospatial in Cloud Series
This is Part 2 of our Geospatial in Cloud series. Each post is self-contained. Part 1 covers Databricks. Part 3 covers GCP. Read the one that matches your stack.
AWS Geospatial Landscape
Eight AWS services matter for geospatial. Most production workloads use only four. Here is the full map so you know what exists - and what to ignore.
| SERVICE | GEOSPATIAL USE | WHEN TO USE |
|---|---|---|
| Lambda | Serverless processing (raster tiles, vector transforms) | Event-driven, small-to-medium payloads |
| S3 | Storage (COG, GeoParquet, STAC catalogues) | Always - primary storage layer |
| Athena | Spatial SQL on S3 data (GeoParquet) | Ad-hoc queries, no infrastructure needed |
| Step Functions | Pipeline orchestration | Multi-step workflows |
| ECS/Fargate | Container-based processing | Heavy processing, long-running jobs |
| SageMaker | ML on geospatial data | Satellite imagery classification |
| Location Service | Maps, geocoding, routing | Consumer-facing apps |
| EMR (Spark) | Large-scale processing | 100M+ record datasets |
KEY INSIGHT: KEEP IT SIMPLE
Most geospatial workloads on AWS use only 4 services: S3 + Lambda + Athena + Step Functions. The bottom four in this table are specialist tools - reach for them only when the core four genuinely cannot handle your workload. Starting with ECS or SageMaker before trying Lambda is over-engineering.
Lambda for Geospatial
The cold start problem is the number one issue nobody talks about honestly. GDAL + rasterio + numpy together form a ~150MB Lambda layer. The first invocation after a period of inactivity takes 8-12 seconds to initialise. That is unacceptable for real-time APIs but perfectly fine for batch processing and event-driven pipelines.
LAMBDA COLD START - HONEST NUMBERS
Measured on Lambda with 1769MB memory in us-east-1: cold starts with the GDAL/rasterio layer take 8-12 seconds; warm invocations run in 200-400ms for typical raster tile generation. The trick is deciding whether you need provisioned concurrency ($0.015/GB-hour) to keep functions warm, or can accept cold starts for batch and event-driven processing.
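To put a number on that trade-off, here is a quick back-of-envelope for keeping a single instance warm, using only the $0.015/GB-hour rate quoted above (this counts just the provisioned-concurrency charge, not per-invocation duration costs):

```python
# Monthly cost of keeping one 1769MB Lambda instance warm
# via provisioned concurrency at $0.015/GB-hour.
memory_gb = 1769 / 1024          # Lambda memory is configured in MB
hours_per_month = 730            # average hours in a month
monthly = memory_gb * 0.015 * hours_per_month
print(f"${monthly:.2f}/month per warm instance")  # → $18.92/month per warm instance
```

Under $20/month per warm instance is cheap for a latency-sensitive API, but it multiplies with concurrency - which is why accepting cold starts is the right call for batch pipelines.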
Here is a production Lambda function that extracts a tile from a Cloud Optimised GeoTIFF stored on S3:
```python
# Lambda: extract tile from COG on S3
import rasterio

def handler(event, context):
    s3_path = f"s3://{event['bucket']}/{event['key']}"
    bbox = event['bbox']  # [minx, miny, maxx, maxy]

    with rasterio.open(s3_path) as src:
        # window() takes (left, bottom, right, top) in the dataset's CRS
        window = src.window(*bbox)
        data = src.read(window=window)

    return {
        "statusCode": 200,
        "shape": data.shape,
        "dtype": str(data.dtype),
    }
```
DEPLOYMENT TIP: USE DOCKER, NOT ZIP
The 250MB unzipped size limit kills you with GDAL. Docker-based Lambda images support up to 10GB. Use a multi-stage Docker build with ghcr.io/lambgeo/lambda-gdal as your base image. It includes GDAL, rasterio, and numpy pre-compiled for Lambda.
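A multi-stage build following that tip might look like the sketch below. The base-image tag, `/opt` paths, and handler filename are assumptions - check the lambgeo project's README for the layout it currently publishes:

```dockerfile
# Stage 1: pre-compiled GDAL binaries (tag is an assumption - pin what lambgeo publishes)
FROM ghcr.io/lambgeo/lambda-gdal:3.8 AS gdal

# Stage 2: official AWS Lambda Python base image
FROM public.ecr.aws/lambda/python:3.11

# Copy GDAL libraries and data files from the build stage
COPY --from=gdal /opt/lib /opt/lib
COPY --from=gdal /opt/share /opt/share
COPY --from=gdal /opt/bin /opt/bin

# Point GDAL/PROJ at the copied data files
ENV GDAL_DATA=/opt/share/gdal \
    PROJ_LIB=/opt/share/proj \
    LD_LIBRARY_PATH=/opt/lib

# Build rasterio against the bundled GDAL rather than pulling a
# manylinux wheel with its own (possibly conflicting) GDAL copy
RUN pip install --no-binary rasterio "rasterio~=1.3" numpy

# The handler from this post
COPY handler.py ${LAMBDA_TASK_ROOT}/
CMD ["handler.handler"]
```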
S3 + COG Storage
Why COG matters on S3: HTTP range requests mean Lambda reads only the pixels it needs. A 50GB satellite image can serve a 1km x 1km tile in 50-200ms without downloading the entire file. This is the pattern that makes serverless geospatial economically viable.
Storage cost: $0.023/GB/month on S3 Standard. A 1TB raster catalogue costs $23/month. For archival data accessed less than once a month, S3 Glacier Instant Retrieval drops this to $0.004/GB/month - an 83% reduction.
```python
# Reading COG from S3 with rasterio (HTTP range request)
import rasterio

# Direct S3 access via GDAL's virtual filesystem
with rasterio.open("s3://my-bucket/imagery/scene.tif") as src:
    # Read only a specific window (served via HTTP range requests)
    window = rasterio.windows.from_bounds(
        left=13.3, bottom=52.5, right=13.4, top=52.6,
        transform=src.transform,
    )
    data = src.read(window=window)
    print(f"Read {data.shape} pixels from {src.width}x{src.height} raster")
```
The STAC catalogue pattern ties this together: index your COGs on S3 with a STAC API. Query by date, bounding box, cloud cover - then access individual COGs directly via range requests. No database, no file server, just S3 + metadata.
For a deep dive on COG, GeoParquet, and STAC formats, see our cloud-native geospatial formats guide.
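At its core, the catalogue query is just metadata filtering before any pixels are touched. A minimal sketch of the idea in plain Python - the item dicts mimic STAC item structure but this is an illustration, not a real STAC API client:

```python
def bbox_intersects(a, b):
    """True if two [minx, miny, maxx, maxy] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search_catalog(items, bbox, max_cloud_cover):
    """Filter STAC-style item dicts by bounding box and cloud cover."""
    return [
        item for item in items
        if bbox_intersects(item["bbox"], bbox)
        and item["properties"]["eo:cloud_cover"] <= max_cloud_cover
    ]

# Two illustrative catalogue entries (Berlin and Munich scenes)
items = [
    {"id": "scene-a", "bbox": [13.0, 52.3, 13.6, 52.7],
     "properties": {"eo:cloud_cover": 12}},
    {"id": "scene-b", "bbox": [11.0, 48.0, 11.7, 48.4],
     "properties": {"eo:cloud_cover": 3}},
]

hits = search_catalog(items, bbox=[13.3, 52.5, 13.4, 52.6], max_cloud_cover=20)
print([item["id"] for item in hits])  # → ['scene-a']
```

Only the matching scene's COG is then fetched from S3 via range requests; everything else is skipped on metadata alone.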
Athena Spatial SQL
Athena now supports spatial functions on GeoParquet data stored in S3. No database to manage, no infrastructure to provision. Point Athena at an S3 bucket of GeoParquet files and query.
```sql
-- Spatial query: parcels within a bounding box.
-- "parcels" must be a Glue/Athena table whose location points at the
-- GeoParquet files on S3 - Athena cannot SELECT FROM an s3:// path directly.
SELECT
  p.parcel_id,
  p.area_sqm,
  -- Note: ST_Distance on lon/lat coordinates returns degrees, not metres
  ST_Distance(p.geometry, ST_Point(13.4, 52.5)) AS distance
FROM parcels p
WHERE ST_Contains(
  ST_GeomFromText('POLYGON((13.3 52.5, 13.4 52.5, 13.4 52.6, 13.3 52.6, 13.3 52.5))'),
  p.geometry
)
ORDER BY distance
LIMIT 100
```
Cost: $5 per TB of data scanned. With GeoParquet's columnar format, Athena reads only the columns you reference - typically 10-20% of the total data - so a full-table query on a 1TB dataset scans 100-200GB and costs $0.50-$1.00. Add partition pruning (see the tip below) and per-query scans drop by another order of magnitude, putting 100 queries a month at roughly $5-10 total.
PARTITION YOUR DATA
Athena charges per TB scanned. Partitioning your GeoParquet files by region or date means Athena skips irrelevant files entirely. A query on Berlin parcels should not scan data for Munich. Use Hive-style partitioning (s3://bucket/parcels/country=DE/state=BE/) and your costs drop by 80-90%.
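Generating that key layout at write time is a one-liner. A small sketch - the bucket name and field names are illustrative:

```python
def partition_prefix(country, state, base="s3://bucket/parcels"):
    """Build a Hive-style partition prefix so Athena can prune by country/state."""
    return f"{base}/country={country}/state={state}/"

# Files written under this prefix are skipped entirely by queries
# that filter on a different country or state.
print(partition_prefix("DE", "BE"))  # → s3://bucket/parcels/country=DE/state=BE/
```

A query with `WHERE country = 'DE' AND state = 'BE'` then touches only that prefix; the Munich partitions are never read.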
The limitation: Athena is not a real-time database. Query execution takes 3-15 seconds depending on data volume and complexity. For sub-second spatial queries serving a web application, use PostGIS on RDS. Athena is for ad-hoc analysis and batch reporting.
Step Functions Pipelines
Geospatial workflows are rarely a single operation. Satellite imagery arrives, needs validation, conversion to COG, quality checks, and catalogue updates. Step Functions orchestrates this without you managing any servers.
A real production pipeline - satellite imagery ingestion:
S3 Event: New Image Uploaded
S3 triggers the pipeline when a new GeoTIFF lands in the raw bucket. No polling, no cron jobs. Event-driven from the start.
Lambda 1: Validate and Extract Metadata
Check CRS, resolution, band count, and spatial extent. Reject malformed files before spending compute on processing. Extract metadata for the STAC catalogue.
Lambda 2: Generate COG Tiles
Convert raw GeoTIFF to Cloud Optimised GeoTIFF with internal tiling and overviews. This enables the range request pattern that makes serving tiles from S3 fast.
Lambda 3: Quality Checks
Verify the COG output: correct tile sizes, overview levels, spatial extent matches input. Catch processing errors before they propagate downstream.
Lambda 4: Update STAC Catalogue
Register the processed image in the STAC catalogue with metadata, thumbnail, and access links. The image is now discoverable and queryable.
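In Step Functions' Amazon States Language, the four steps above chain together roughly as follows. State names and function ARNs are placeholders, and production versions would add `Retry`/`Catch` blocks - this is a sketch, not a drop-in template:

```json
{
  "Comment": "Satellite imagery ingestion pipeline (sketch)",
  "StartAt": "ValidateAndExtractMetadata",
  "States": {
    "ValidateAndExtractMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-image",
      "Next": "GenerateCOG"
    },
    "GenerateCOG": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:generate-cog",
      "Next": "QualityChecks"
    },
    "QualityChecks": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:quality-checks",
      "Next": "UpdateSTACCatalogue"
    },
    "UpdateSTACCatalogue": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:update-stac",
      "End": true
    }
  }
}
```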
PIPELINE COST PER IMAGE
Total: ~$0.03 per image (four Lambda invocations plus S3 reads/writes and Step Functions state transitions). At 1,000 images per month, that is ~$30/month for a fully automated ingestion pipeline with validation, conversion, quality assurance, and cataloguing.
Real Benchmarks
These numbers come from production workloads running in us-east-1. Lambda configured with 1769MB memory (1 full vCPU). S3 Standard storage class. We include the cold start numbers because pretending they do not exist helps nobody.
AWS GEOSPATIAL - PRODUCTION BENCHMARKS
Cold start with the GDAL layer: 8-12s. Warm tile generation: 200-400ms. COG window read from S3: 50-200ms. Athena spatial query: 3-15s.
Cost Analysis
The cost advantage of AWS for geospatial is real - but only if you have the engineering capability to build and maintain the stack. Here is the side-by-side comparison for a mid-sized team processing 10K tiles per day with 1TB of raster storage.
| STACK | MONTHLY TOTAL | LICENSING |
|---|---|---|
| ESRI stack | ~$1,000/mo | + Enterprise licence fee |
| AWS stack | ~$83/mo | No licence fees |
HONEST CAVEAT: ENGINEERING COST IS REAL
These numbers assume you have the engineering capability to build and maintain the AWS stack. ESRI's cost includes the convenience of a managed platform with documentation, support, and a GUI. If your team does not have AWS experience, the engineering cost of building and maintaining this stack may exceed the savings for 1-2 years. The 12x cost difference only materialises when your team can operate the stack independently.
When NOT to Use AWS for Geospatial
We build geospatial pipelines on AWS for clients. We still tell some of them not to. These are the scenarios where AWS is the wrong choice:
1. Your team does not have AWS experience
The learning curve is steep. Lambda layers, IAM roles, VPC networking, S3 event triggers - each has gotchas that take weeks to understand. Budget 2-3 months ramp-up for a team new to AWS before you can build anything production-grade.
2. You need a managed geospatial platform
AWS does not have one. You assemble from primitives. If you want click-to-deploy geospatial with a GUI, vendor support, and documentation, consider ESRI on AWS or Databricks. The DIY approach requires genuine engineering investment.
3. Real-time spatial queries (sub-50ms)
Athena is not a real-time database. Minimum query time is 3 seconds regardless of data volume. For sub-50ms spatial queries serving a live web application, use PostGIS on RDS or Aurora with a spatial index. Lambda + Athena is for batch, not interactive.
4. Large-scale raster analysis
Lambda has a 15-minute execution limit and a 10GB memory ceiling. For heavy raster processing - mosaicking large areas, time-series analysis on years of satellite data, multi-band classification - use ECS/Fargate or SageMaker Processing, or consider Google Earth Engine, which was built precisely for this workload.
5. You are already on Azure or GCP
Multi-cloud adds complexity with minimal benefit. If your organisation's primary cloud is Azure or GCP, use their native geospatial services. The architectural patterns in this post translate to any cloud - the specific services just have different names. Do not split your infrastructure for geospatial alone.
Reference Architecture
This is the production architecture we deploy for clients processing 1TB+ of geospatial data. Four layers built from the core four services, no over-engineering.
1. DATA INGESTION - S3 event triggers invoke Lambda validation as new data lands
2. ANALYSIS - Athena spatial SQL over GeoParquet on S3
3. PIPELINE ORCHESTRATION - Step Functions chaining the processing Lambdas
4. SERVING - COGs read directly from S3 via HTTP range requests
The STAC catalogue sits alongside this: a Lambda-backed API with DynamoDB storage that indexes all processed data. Users query the catalogue by bounding box, date range, or sensor type, then access individual COGs directly from S3 via range requests. Total infrastructure cost for the catalogue: under $10/month.
Frequently Asked Questions
Can AWS handle geospatial workloads?
Yes, but there is no single “AWS Geospatial” service. You combine S3 (storage), Lambda (processing), Athena (spatial SQL), and Step Functions (orchestration). This gives maximum flexibility but requires engineering effort to assemble. Most production geospatial workloads on AWS use only these four services.
What is the cold start time for Lambda with GDAL?
Cold start with a full GDAL/rasterio Lambda layer is 8-12 seconds. Warm invocations run in 200-400ms. For latency-sensitive workloads, use provisioned concurrency ($0.015/GB-hour) to keep functions warm, reducing cold start to under 0.5 seconds.
How much does geospatial processing cost on AWS?
Storage: $0.023/GB/month on S3. Processing: approximately $0.0001 per Lambda invocation for typical tile generation. Spatial queries: $5 per TB scanned via Athena. A typical monthly workload (1TB storage, 10K daily tile requests, weekly ad-hoc queries) costs approximately $50-80/month.
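That estimate is easy to sanity-check from the unit prices above. A rough sketch - the Athena scan volume per query is an assumption (it presumes column and partition pruning):

```python
# Back-of-envelope monthly cost for the workload described above,
# using the unit prices quoted in this post.
storage_gb = 1024                      # 1TB raster catalogue
storage = storage_gb * 0.023           # S3 Standard, $/GB-month

tile_requests = 10_000 * 30            # 10K tile requests/day
processing = tile_requests * 0.0001    # ~$0.0001 per Lambda invocation

queries = 4                            # weekly ad-hoc Athena queries
tb_scanned_per_query = 0.15            # assumption: pruning leaves ~150GB per query
athena = queries * tb_scanned_per_query * 5.0   # $5 per TB scanned

total = storage + processing + athena
print(f"~${total:.0f}/month")          # → ~$57/month, inside the $50-80 range above
```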
AWS gives you maximum control at minimum cost - if you have the engineering capability to assemble and maintain the stack.
$0.03 per image through a full pipeline. $23/month for a terabyte of rasters. Spatial SQL at $5 per TB scanned. The economics are compelling for teams that can operate the infrastructure. For everyone else, a managed platform saves more than it costs.
The pattern is the same regardless of cloud: store data in cloud-native formats, process with serverless compute, query with spatial SQL, orchestrate with managed workflows. AWS just happens to give you the most granular control over each piece.