- BigQuery GIS: native GEOGRAPHY type with S2 cell indexing - spatial SQL on billion-row datasets in 5-15 seconds. Coordinate order is longitude first, latitude second (the number one mistake that trips every team)
- Dataproc with Sedona: distributed spatial processing on managed Spark. Serverless Dataproc for one-off jobs, persistent clusters for heavy raster and vector workloads. GCP's answer to AWS Glue Spark ETL
- Cloud Composer (managed Airflow): orchestrates multi-step GIS pipelines across BigQuery, Dataproc, and Cloud Run. Replaces cron-based scheduling and manual coordination
- The local fallback trap: the most common GCP migration failure is code that should run spatial operations in BigQuery SQL but silently falls back to local pandas/GeoPandas. The pipeline 'works' but is 100-1000x slower
Google has the most powerful geospatial computing platform on the planet - Earth Engine. It also has the largest cloud-native spatial SQL engine - BigQuery GIS. But choosing between them, combining them, or deciding you need neither is where most teams get stuck.
This post breaks down each GCP geospatial service with real query times, actual costs, and real assessments of where they fall short. No vendor fluff. No feature-list repackaging. Just what works in production.
Geospatial in Cloud Series
This is Part 3 of our Geospatial in Cloud series. Each post is self-contained. Part 1 covers Databricks. Part 2 covers AWS. Part 4 covers Snowflake. Read the one that matches your stack.
GCP Geospatial Services Map
GCP's geospatial capabilities are spread across eight services. Understanding which one to use for what is the first decision you need to make.
| SERVICE | GEOSPATIAL USE | WHEN TO USE |
|---|---|---|
| BigQuery GIS | Spatial SQL on massive datasets | Billions of rows, ad-hoc spatial analysis |
| Dataproc | Distributed Spark with Sedona | Heavy spatial processing, large rasters, millions of features |
| Cloud Composer | Managed Apache Airflow | Multi-step pipeline orchestration across all GCP services |
| Earth Engine | Satellite imagery analysis | Multi-temporal raster analysis, change detection |
| Cloud Run | Containerised geospatial APIs | Processing services, tile servers, APIs |
| Cloud Storage (GCS) | COG/GeoParquet storage | Primary data lake (like S3) |
| Vertex AI | ML on geospatial data | Classification, prediction models |
| Dataflow (Beam) | Streaming geospatial processing | Real-time sensor data, IoT |
Most geospatial teams on GCP use four of these: BigQuery GIS for analysis, Dataproc for heavy spatial processing, GCS for storage, and Cloud Run for serving. Cloud Composer ties multi-step pipelines together. Earth Engine is the specialist tool you bring in when satellite imagery is involved.
BigQuery GIS in Detail
BigQuery GIS operates on a native GEOGRAPHY type. No extensions, no plugins, no PostGIS-style setup. Spatial functions work directly on petabyte-scale tables with the same SQL your analysts already write.
A spatial join of 1 billion points against 50K polygons takes 8-12 seconds. Try that on PostGIS. The query is standard SQL with spatial predicates - ST_CONTAINS for point-in-polygon, ST_DISTANCE for proximity, ST_INTERSECTION for overlap - with no index creation required.
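To make the pattern concrete, here is a minimal sketch of a point-in-polygon spatial join handed to the BigQuery client. The table names and the GEOGRAPHY column name `geom` are illustrative assumptions, not from a real project.

```python
# Hedged sketch: a point-in-polygon spatial join in BigQuery SQL.
# `my_project.geo.points` and `my_project.geo.regions` are assumed tables,
# each with a GEOGRAPHY column named `geom`.
SPATIAL_JOIN_SQL = """
SELECT r.region_id, COUNT(*) AS point_count
FROM `my_project.geo.points` AS p
JOIN `my_project.geo.regions` AS r
  ON ST_CONTAINS(r.geom, p.geom)   -- point-in-polygon predicate
GROUP BY r.region_id
"""

# To execute server-side (requires google-cloud-bigquery and credentials):
# from google.cloud import bigquery
# rows = bigquery.Client().query(SPATIAL_JOIN_SQL).result()
```

No index hints, no ANALYZE step: the S2 indexing described below is applied automatically.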
COORDINATE ORDER: THE NUMBER ONE MISTAKE
BigQuery GIS uses longitude first, latitude second for ST_GEOGPOINT. This is the opposite of what most GIS practitioners expect. ST_GEOGPOINT(13.4, 52.5) means longitude 13.4, latitude 52.5 (Berlin). Get this wrong and your points land in the ocean. Every team hits this. Use SAFE.ST_GEOGPOINT() which returns NULL instead of an error for out-of-range coordinates - but be warned: SAFE only catches values outside valid ranges (e.g., longitude 200). A swapped pair like (52.5, 13.4) when you meant (13.4, 52.5) is still within valid ranges and SAFE will happily return the wrong point. Visual spot-checks on a map are the only reliable way to catch axis swaps.
GEOGRAPHY only supports WGS84 (EPSG:4326). If your source data uses a projected coordinate system (UTM, British National Grid, etc.), you must reproject before loading into BigQuery. This is a hard requirement - BigQuery will silently accept projected coordinates but your spatial operations will return wrong results. Buffer distances are in metres (not degrees), which is actually an advantage over PostGIS where you constantly wrestle with degree-to-metre conversions.
Distance-based queries against the full OpenStreetMap dataset (1.5TB, pre-loaded in BigQuery's public datasets) run in 2-3 seconds. You can find all hospitals within 10km of a point, sorted by distance, across the entire planet's OSM data. This is where BigQuery's scale has few direct equivalents - Athena and Redshift Spectrum can query at scale on AWS, but the zero-infrastructure spatial indexing is uniquely BigQuery.
THE LOCAL FALLBACK TRAP
The most common GCP migration failure is silent: code that should run spatial operations in BigQuery SQL falls back to local pandas/GeoPandas processing. The pipeline “works” but is 100-1000x slower. This happens when developers load BigQuery results into a DataFrame and then apply Shapely operations locally instead of keeping all spatial logic in SQL. Always verify that your spatial operations are executing on BigQuery (check for a job_id) - not being silently computed in Python on the client machine.
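One way to enforce that verification is a guard that fails the pipeline when a result has no BigQuery job behind it. This sketch is duck-typed against the `job_id` attribute that `google.cloud.bigquery` query jobs carry, so the guard itself has no GCP dependency.

```python
# Hedged sketch: fail fast when a "BigQuery" spatial step did not actually
# run server-side. A real BigQuery QueryJob carries a job_id; a locally
# computed pandas/GeoPandas result does not.
def assert_ran_in_bigquery(query_job):
    """Return the job_id, or raise if the operation never hit BigQuery."""
    job_id = getattr(query_job, "job_id", None)
    if not job_id:
        raise RuntimeError(
            "Spatial operation has no BigQuery job_id - it was likely "
            "computed locally in pandas/GeoPandas instead of in SQL.")
    return job_id
```

Call it on every query result in CI; a silent local fallback then becomes a loud test failure instead of a 100x slowdown in production.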
KEY INSIGHT: S2 vs H3 INDEXING
BigQuery GIS uses S2 cell indexing internally - spatial operations are not scan-based, they use a spatial index automatically. No EXPLAIN ANALYZE, no manual index creation, no tuning. This is the single biggest difference from PostGIS, where spatial index creation and maintenance is a constant overhead.
S2 is the internal query index you cannot control. For application-level hexagonal aggregations - heatmaps, density analysis, catchment areas - you compute H3 indices yourself via BigQuery UDFs. H3 gives you consistent hexagonal cells at defined resolutions (resolution 9 for urban, 7 for regional - same as Databricks Mosaic). S2 optimises your queries; H3 structures your analytical output.
NAMING TRAP: BIGQUERY vs GCS
BigQuery dataset names only allow underscores (no hyphens). GCS bucket names allow hyphens but not underscores. These are opposite rules. Teams that name everything consistently with hyphens discover this when BigQuery rejects their dataset name. Adopt a naming convention early: underscores for BigQuery resources, hyphens for GCS buckets.
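The convention is easy to centralise so the opposite rules live in one place. A minimal sketch, deriving both names from one base slug:

```python
# Hedged sketch: one base slug, two naming rules.
def bq_dataset_name(base: str) -> str:
    """BigQuery datasets: letters, digits, underscores - no hyphens."""
    return base.replace("-", "_")

def gcs_bucket_name(base: str) -> str:
    """GCS buckets: lowercase letters, digits, hyphens - no underscores."""
    return base.lower().replace("_", "-")
```

With this in place, `geo-prod-eu` becomes the dataset `geo_prod_eu` and the bucket `geo-prod-eu`, and nobody has to remember which service rejects which character.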
For detail on the file formats that power this, see our guide on cloud-native geospatial formats (GeoParquet, COG, STAC). GeoParquet and COG work natively with BigQuery and GCS.
Dataproc for Heavy Processing
BigQuery handles SQL-based spatial analysis. Dataproc handles everything else. When you need Python GIS libraries (rasterio, GDAL, fiona) or distributed Spark processing with Apache Sedona, Dataproc is GCP's answer. It is the equivalent of AWS EMR or Glue Spark ETL - a managed Spark cluster you can provision with geospatial libraries pre-installed.
Two modes: Serverless Dataproc for one-off jobs (no cluster to manage, you submit a PySpark script and it runs), and persistent clusters for interactive exploration or recurring workloads. Serverless has cold start but eliminates cluster management entirely. For persistent clusters, initialisation actions install GIS libraries - geopandas, rasterio, fiona, shapely, apache-sedona - which adds 2-5 minutes to cluster startup.
| USE CASE | MACHINE TYPE | MEMORY | NOTES |
|---|---|---|---|
| Small vector | n2-standard-4 | 16 GB | Basic spatial joins, format conversion |
| Large vector | n2-highmem-8 | 64 GB | Millions of features, complex spatial joins |
| Raster | n2-highmem-16 | 128 GB | Windowed raster processing, large COGs |
| ML + spatial | a2-highgpu-1g | 85 GB + GPU | GPU-accelerated ML inference on spatial data |
Sedona integration: pass Sedona JAR coordinates via the --properties flag when submitting Dataproc jobs. Sedona gives you distributed spatial SQL (ST_Intersects, ST_Buffer, ST_Union_Aggr) across the Spark cluster. Read GeoParquet from GCS, run spatial operations, and write results back to GCS or directly load into BigQuery via the BigQuery connector. This is the pattern for heavy spatial ETL that BigQuery SQL alone cannot handle - dissolves, complex unions, raster zonal statistics.
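A sketch of the two halves of that pattern: the submit-time arguments that pull in the Sedona packages, and a dissolve query that BigQuery SQL alone handles poorly. The package coordinate, script path, and table name are illustrative assumptions.

```python
# Hedged sketch - the Maven coordinate below is illustrative; check the
# Sedona release notes for the artifact matching your Spark/Scala versions.
SEDONA_PACKAGES = "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.5.1"

# A dissolve (merge polygons by attribute) - the kind of heavy geometry
# operation the text above assigns to Dataproc rather than BigQuery.
DISSOLVE_SQL = """
SELECT land_use, ST_Union_Aggr(geometry) AS dissolved
FROM parcels
GROUP BY land_use
"""

def submit_args(script="gs://my-bucket/jobs/dissolve.py"):
    """Argument vector for a serverless Dataproc submit (hypothetical paths)."""
    return ["gcloud", "dataproc", "batches", "submit", "pyspark", script,
            "--region=europe-west1",
            f"--properties=spark.jars.packages={SEDONA_PACKAGES}"]
```

Inside the submitted script, Sedona registers the ST_ functions on the Spark session, so `DISSOLVE_SQL` runs distributed across the cluster rather than on a single node.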
The GCS write limitation is identical to S3 and Databricks Volumes: GCS is object storage with no filesystem semantics. GeoTIFF and GeoPackage writes that require seek operations fail silently. Write to /tmp on the Dataproc node first, then upload the finished file to GCS. This is the same two-stage write pattern that every cloud platform requires for GDAL-based formats.
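The two-stage pattern is worth wrapping once so no job writes seek-dependent formats straight to GCS. A minimal sketch; the writer and uploader are injected, so `write_fn` might call rasterio/GDAL and `upload_fn` might call `bucket.blob(name).upload_from_filename(path)` in real use (both assumptions).

```python
import os
import tempfile

def two_stage_write(write_fn, upload_fn, suffix=".tif"):
    """Write seek-dependent formats (GeoTIFF, GeoPackage) locally, then upload.

    write_fn(path): produces the file - GDAL can seek freely on local disk.
    upload_fn(path): ships the finished file to GCS in one streaming pass.
    """
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, "output" + suffix)
        write_fn(local_path)   # stage 1: local write with full seek support
        upload_fn(local_path)  # stage 2: object storage sees only final bytes
        return os.path.basename(local_path)
```

The temporary directory is cleaned up automatically, so `/tmp` on the Dataproc node never fills with stale intermediates.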
A related trap: os.path.exists() and Path(...).exists() do not work on GCS paths. They silently return False even when the file exists. Use the google.cloud.storage client with blob.exists() instead. This catches teams who port local file-checking patterns to the cloud without adapting the existence checks.
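A minimal sketch of the corrected check. The `client` parameter is duck-typed here to stay dependency-free; in practice it would be a `google.cloud.storage.Client`.

```python
# Hedged sketch: ask GCS whether the object exists instead of trusting
# os.path.exists(), which silently returns False for gs:// paths.
def gcs_exists(client, bucket_name: str, blob_name: str) -> bool:
    """True if the blob exists, via the storage API (network round trip)."""
    return client.bucket(bucket_name).blob(blob_name).exists()
```

Grep a ported codebase for `os.path.exists` and `Path(...).exists()` on anything derived from a `gs://` URI - every hit is a latent bug.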
Cloud Composer (managed Airflow) orchestrates multi-step pipelines that span BigQuery, Dataproc, and Cloud Run. A typical GIS migration pipeline: load raw data into BigQuery (BigQueryInsertJobOperator), run spatial analysis in SQL, then hand off heavy raster processing to Dataproc (DataprocSubmitPySparkJobOperator). Composer manages dependencies, retries, and scheduling - replacing the cron-based scripts and manual coordination that most GIS teams rely on today. Environment creation takes 15-25 minutes, but once running, it handles the orchestration that Step Functions does on AWS.
WHEN TO USE WHAT
BigQuery GIS for SQL-based spatial analysis (joins, distance queries, aggregations). Dataproc + Sedona for heavy spatial ETL that requires Python libraries or distributed processing beyond SQL (dissolves, complex geometry operations, raster processing). Cloud Composer to wire them together in automated pipelines. Do not try to do everything in BigQuery - complex geometry operations belong in Dataproc. Do not try to do everything in Dataproc - simple SQL analysis is 10x faster in BigQuery.
Earth Engine: What Works and What Does Not
Earth Engine is extraordinary for a specific set of problems. It is also wildly overused for problems it was never designed to solve. Here is where it genuinely has no competition, and where it creates more problems than it solves.
Where Earth Engine Is Unmatched
70+ petabytes of satellite imagery - free
Landsat, Sentinel, MODIS, VIIRS - all pre-processed and analysis-ready. Downloading this data yourself would take months and cost thousands in transfer fees alone.
Multi-temporal analysis in minutes
Computing an NDVI time series over 5 years for a 10km x 10km area takes under a minute. Locally, you'd spend days downloading imagery before you could even start processing.
Built-in change detection and classification
Deforestation tracking, urban expansion monitoring, crop classification - algorithms are built in with the data co-located. No data movement.
Where Earth Engine Is Overkill or Wrong
1. Vector analysis
Earth Engine is raster-first. Vector operations are slow and limited compared to BigQuery GIS or PostGIS. Spatial joins on vector datasets that take seconds in BigQuery take minutes in Earth Engine - and the API is far less intuitive for SQL-trained analysts.
2. Custom processing pipelines
Earth Engine's execution model is opaque. You cannot control parallelism, memory allocation, or execution order. For complex multi-step pipelines where you need predictable behaviour, Cloud Run with your own GDAL/rasterio container gives you full control.
3. Integration with non-Google tools
Getting data out of Earth Engine into your data warehouse, dashboard, or downstream pipeline is painful. Exports are asynchronous and can take hours. If your architecture involves Snowflake, Databricks, or any non-Google analytics platform, plan for significant integration overhead.
4. Reproducibility
Earth Engine code runs on Google's servers with undocumented infrastructure. Exact reproduction of results is not guaranteed across time. For regulated industries (insurance, finance) where audit trails matter, this is a real compliance concern.
5. Cost predictability
The free tier is generous for research. But commercial use pricing is opaque and usage-dependent. BigQuery's $5/TB scanned model is transparent and controllable. Earth Engine's commercial pricing makes budgeting difficult.
RECOMMENDATION
Use Earth Engine for what it is best at: satellite imagery analysis at planetary scale. Use BigQuery GIS + Cloud Run for everything else. Do not try to force Earth Engine into a general-purpose geospatial platform - you will spend more time fighting its limitations than building your actual product.
Cloud Run for Processing
Cloud Run is the underrated workhorse of GCP geospatial. Deploy a FastAPI + GDAL container with zero infrastructure management. It auto-scales from 0 to N instances based on request volume, and you pay only for actual request time.
Perfect for tile servers, geocoding APIs, on-demand raster processing, and any geospatial microservice that needs to scale without cluster management.
The pattern is straightforward: a FastAPI application with GeoPandas and rasterio that reads GeoParquet from GCS, performs a spatial operation (intersection, buffer, distance query), and returns GeoJSON. Deployment is a single gcloud command - point it at your source directory, specify the region and memory, and Cloud Run builds, deploys, and serves it with HTTPS, autoscaling, and zero-downtime deployments.
No Kubernetes manifests. No Docker Compose files. No load balancer configuration. For teams that want to deploy geospatial APIs without becoming infrastructure engineers, this is the fastest path. Allocate at least 2GB memory for any container that loads GDAL and rasterio - the cold start on Cloud Run with a full geospatial stack takes 3-5 seconds, faster than Lambda's 8-12 seconds because Cloud Run keeps instances warm for longer.
Public Datasets: GCP's Hidden Weapon
This is the advantage nobody talks about. GCP hosts the largest collection of pre-loaded, pre-indexed, query-ready geospatial datasets on any cloud platform. On AWS or Azure, you need to download, convert, and load them yourself. On GCP, it is one SQL JOIN away.
| DATASET | SIZE | UPDATES | COST |
|---|---|---|---|
| OpenStreetMap (planet) | 1.5TB | Weekly | Free |
| US Census | 100GB | Annual | Free |
| NOAA Weather | 500GB | Daily | Free |
| EPA Facilities | 10GB | Quarterly | Free |
| TIGER (US boundaries) | 50GB | Annual | Free |
The practical impact: enriching your proprietary data with OSM points of interest, census demographics, or weather patterns is a JOIN operation, not an ETL pipeline. No data engineering required. No storage costs for the public data. You pay only for the query compute ($5/TB scanned).
Real Benchmarks
These numbers come from production workloads, not synthetic tests. Each operation was measured with standard GCP pricing in europe-west1.
GCP GEOSPATIAL BENCHMARKS - PRODUCTION WORKLOADS
[Benchmark figure: measured per-operation costs in europe-west1, ranging from $0.0000024 per request (serving) through $0.01 and $0.25 per query (analysis), free on the Earth Engine research tier, up to $0.30 per end-to-end run (ingest + spatial query + export).]
Cost Analysis
A side-by-side comparison for a mid-sized geospatial team (10-30 users, 1TB vector data, regular spatial analysis workloads).
| WORKLOAD | GCP | ESRI ENTERPRISE | POSTGIS (SELF-MANAGED) |
|---|---|---|---|
| Store 1TB vectors | $20/mo (GCS) | ~$200/mo (Enterprise) | $50/mo (RDS) |
| Spatial queries (1TB/mo) | $5 (BigQuery) | Included (licence) | $0 (compute only) |
| Raster analysis | Free (EE research) | $500/mo (Image Server) | DIY (rasterio) |
| API serving | $10/mo (Cloud Run) | $300/mo (GeoEvent) | $50/mo (VM) |
CAVEAT: QUERY-BASED PRICING BITES
BigQuery charges $5 per TB scanned. A single careless SELECT * on a 10TB table costs $50. Always use partitioning, clustering, and column selection to control costs. ESRI's flat-fee model is actually more predictable if you are doing heavy, frequent analysis. GCP is cheaper on average but has a higher variance. Plan accordingly.
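The arithmetic is simple enough to put in a pre-flight check against a dry-run's estimated bytes scanned. A minimal sketch at the on-demand rate quoted above:

```python
# Hedged sketch: back-of-envelope BigQuery scan cost at $5/TB on-demand.
PRICE_PER_TB = 5.0  # USD per TB scanned, as quoted in the text

def scan_cost_usd(bytes_scanned: int) -> float:
    """Estimated on-demand cost; 1 TB = 2**40 bytes here (assumption)."""
    return round(bytes_scanned / 2**40 * PRICE_PER_TB, 2)
```

A careless `SELECT *` over 10 TB prices out at $50, matching the example above; gate expensive jobs by running the query as a dry run first and feeding the estimated bytes into this function.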
When NOT to Use GCP for Geospatial
We run geospatial workloads on GCP daily. We still tell clients not to use it in these scenarios:
1. Your organisation is AWS or Azure first
Multi-cloud complexity rarely justifies the benefits. If your data team, IAM policies, and billing are on AWS, adding GCP for geospatial alone introduces operational overhead that outweighs BigQuery's advantages. Use your primary cloud's spatial capabilities first. Our AWS guide covers the alternatives.
2. You need real-time spatial queries (sub-10ms)
BigQuery has a 1-2 second minimum query time. This is fast for analytics but unacceptable for a spatial API serving a web application. For sub-10ms spatial lookups, use PostGIS with a warm connection pool or Redis with geospatial indexing.
3. You need full control over processing
Earth Engine is a black box. BigQuery abstracts away execution. Cloud Run abstracts away infrastructure. If you need to control every aspect of execution - parallelism, memory allocation, scheduling granularity - consider AWS ECS or bare Kubernetes where you manage the entire stack.
4. You are processing sensitive data
Earth Engine processes data on Google's infrastructure with limited control over data residency. For highly sensitive geospatial data (defence, classified assets, certain financial data), a self-hosted solution with full audit control may be required. BigQuery gives you meaningful residency and access controls; Earth Engine does not.
5. Your geospatial workload is small
If you process fewer than 1M records or query less than 100GB/month, PostGIS on a single server is simpler, cheaper, and faster. BigQuery's strength is scale; below that threshold, the overhead of cloud-native tooling outweighs its convenience.
Reference Architecture
A production GCP geospatial stack for a team running mixed vector and raster workloads. This is the architecture we deploy for clients who are already on GCP.

1. DATA LAKE - GCS (GeoParquet, COG)
2. SPATIAL ETL - Dataproc + Sedona
3. ANALYTICS - BigQuery GIS (Earth Engine for raster imagery)
4. ORCHESTRATION - Cloud Composer
5. SERVING - Cloud Run
The key principle: each service does one thing well. BigQuery for SQL analysis, Dataproc for heavy spatial ETL, Earth Engine for raster imagery, Cloud Run for APIs, Cloud Composer for orchestration, GCS for storage. Resist the temptation to route everything through Earth Engine or build everything as BigQuery stored procedures.
For teams comparing this with other platforms, Part 1 covers Databricks (stronger for lakehouse architectures) and Part 2 covers AWS (more flexible, DIY approach).
Frequently Asked Questions
Can BigQuery handle geospatial data?
Yes. BigQuery has a native GEOGRAPHY type with spatial functions (ST_CONTAINS, ST_DISTANCE, ST_INTERSECTION, and more). It handles spatial queries on billion-row datasets in seconds using automatic S2 cell indexing. No extensions or plugins needed.
Is Google Earth Engine free?
Earth Engine is free for academic and research use. Commercial use requires a paid licence, with pricing that depends on usage volume. For many commercial geospatial workloads, BigQuery GIS combined with Cloud Run is more cost-effective and predictable.
What is the best GCP service for geospatial analysis?
It depends on data type. For vector analysis at scale, BigQuery GIS is the best choice. For satellite imagery and multi-temporal raster analysis, Earth Engine is unmatched. For custom processing APIs, Cloud Run with GeoPandas/rasterio is the most flexible option.
GCP has the most powerful individual geospatial tools of any cloud platform. The challenge is knowing which tool to reach for.
BigQuery GIS for vector analysis at scale. Earth Engine for satellite imagery - and nothing else. Cloud Run for APIs without infrastructure overhead. The teams that get GCP geospatial right are the ones that resist the temptation to use one service for everything.
That is the consistent pattern across this entire series. Match the tool to the problem. Not the other way around.