
Geospatial in Cloud: Snowflake

19 native H3 functions. 60+ spatial SQL functions. Zero cluster management. When Snowflake is the right choice for geospatial - and when it is not.

15 MIN READ · MARCH 2026 · PART 4 OF 4
  • 19 native H3 functions in pure SQL - no library imports, no cluster configuration. H3-accelerated spatial joins on 100M+ records drop from minutes to seconds by converting expensive geometry operations into integer equality joins
  • GEOGRAPHY (WGS 84 sphere) and GEOMETRY (planar, any SRID) types with 60-70 native spatial functions. Covers the 80% case for spatial analytics - but no raster, no 3D, no topology
  • Three spatial indexing approaches: automatic micro-partition pruning, H3 clustering (recommended for large tables), and Search Optimisation Service (Enterprise only, 10x credit rate - estimate costs before enabling)
  • Snowpark Python UDFs support GeoPandas, Shapely, Fiona, and PyPROJ inside Snowflake's sandbox. Use Snowpark-optimised warehouses (256 GB memory) for heavy spatial operations - standard warehouses will OOM on complex geometries
  • Zero-copy data sharing via Snowflake Marketplace is the differentiator no other platform matches. CARTO, Precisely, and Foursquare provide geospatial datasets you can join without moving data
  • Typical moderate geospatial workload costs $150-300/month in Snowflake credits. A weekly spatial join on 50M features costs roughly $16/month on a Large warehouse vs $7,000/year for an ArcGIS Advanced licence

Geospatial in Cloud Series

This is Part 4 of our Geospatial in Cloud series. Each post is self-contained. Part 1 covers Databricks. Part 2 covers AWS. Part 3 covers GCP. Read the one that matches your stack.

Snowflake is the platform that teams pick when they want spatial analytics without infrastructure overhead. No clusters to configure, no libraries to install, no Spark context to debug. You write SELECT ST_DISTANCE(a.geom, b.geom) and it works.

That simplicity is real. But so are the trade-offs. Snowflake has no raster support, no 3D geometry, and its spatial function count is roughly a sixth of PostGIS. If your workflow involves satellite imagery, terrain models, or complex topology operations, this is not your platform.

For everything else - point-in-polygon joins, H3 aggregation, exposure analysis, fleet analytics, location intelligence - Snowflake delivers with less operational friction than any other cloud data platform we have worked with.

Why Snowflake for Geospatial

The pitch is simple: if your team thinks in SQL and your data is already in Snowflake, adding geospatial is close to zero effort. No new infrastructure, no new skills, no new budgets to approve.

Zero cluster management

Scale from XS to 6XL warehouse with one command. Auto-suspend after 60 seconds of idle. No Spark configuration, no driver/executor tuning.

Native H3 in pure SQL

19 built-in H3 functions. Convert a spatial join on 100M records from minutes to seconds by turning geometry operations into integer equality joins.

Zero-copy data sharing

Join your data with CARTO, Precisely, or Foursquare datasets from the Snowflake Marketplace without moving a single byte. No other platform matches this.

SQL your team already knows

No Python required for spatial analytics. Your SQL analysts can write spatial queries on day one. The learning curve is a handful of ST_* functions.

GEOGRAPHY vs GEOMETRY

Snowflake provides two spatial data types. Choosing the wrong one is one of the most common mistakes we see teams make in their first week.

GEOGRAPHY models the earth as a sphere using WGS 84 (SRID 4326). Coordinates are longitude/latitude degrees. Distance calculations return metres on the spherical surface. This is the type most GIS engineers reach for first - it maps directly to the lon/lat paradigm from QGIS and ArcGIS.

GEOMETRY uses a planar Cartesian coordinate system. It supports arbitrary SRIDs - EPSG:27700 for British National Grid, EPSG:32632 for UTM Zone 32N, whatever your data uses. Calculations are Euclidean. Use this when your data lives in a projected coordinate system and you need metre-accurate distances without spherical distortion.

Critical limitations to know

  • Only 2D coordinates supported - no Z (elevation) or M (measure) values
  • You cannot cast between GEOGRAPHY and GEOMETRY - they are separate worlds
  • GEOGRAPHY only supports WGS 84 - no other CRS
  • Column size limit is 64 MB per value (increased from 8 MB in 2025)
  • GEOGRAPHY uses a perfect sphere model, not an ellipsoid - slight accuracy differences vs PostGIS for geodetic work

Rule of thumb: if your data is in lon/lat and you are doing analytics (distance, containment, aggregation), use GEOGRAPHY. If your data is in a national or UTM projection and you need precise planar geometry operations, use GEOMETRY. If you need to convert between projections, use ST_TRANSFORM - but only on GEOMETRY columns.
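A minimal sketch of the difference, using illustrative coordinates (London/Paris for the spherical case, London/Oxford for the planar one):

```sql
-- GEOGRAPHY: lon/lat on the WGS 84 sphere; ST_DISTANCE returns metres
SELECT ST_DISTANCE(
         TO_GEOGRAPHY('POINT(-0.1278 51.5074)'),   -- London
         TO_GEOGRAPHY('POINT(2.3522 48.8566)')     -- Paris
       ) AS metres_on_sphere;

-- GEOMETRY: planar Euclidean; reproject to a metric CRS with ST_TRANSFORM
SELECT ST_DISTANCE(
         ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(-0.1278 51.5074)'), 27700),
         ST_TRANSFORM(TO_GEOMETRY('SRID=4326;POINT(-1.2577 51.7520)'), 27700)
       ) AS metres_planar;   -- EPSG:27700, British National Grid
```

The first query needs no SRID handling at all; the second only works because both points were transformed into the same projected CRS first.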

Spatial SQL Functions

Snowflake provides roughly 60-70 native spatial functions. For context, PostGIS has 400+. The gap is real but often irrelevant: the 60 functions Snowflake has cover the core operations that 80% of geospatial SQL workflows need.

The essentials are all present: ST_DISTANCE, ST_INTERSECTS, ST_CONTAINS, ST_BUFFER, ST_UNION, ST_INTERSECTION, ST_SIMPLIFY, ST_CENTROID, ST_AREA. You can run spatial joins, create buffers, dissolve polygons, calculate distances, and do containment checks.

Two aggregate functions handle the bulk of GIS dissolve operations: ST_COLLECT combines rows into a GeometryCollection, and ST_UNION_AGG dissolves all geometries in a group - the SQL equivalent of ArcGIS Dissolve.
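As a sketch of the dissolve pattern (table and column names here are hypothetical):

```sql
-- SQL equivalent of ArcGIS Dissolve: one merged polygon per region
SELECT region,
       ST_UNION_AGG(geom) AS region_geom
FROM parcels
GROUP BY region;
```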

Functions you will miss (coming from PostGIS)

  • No ST_Voronoi / ST_DelaunayTriangles
  • No ST_ClusterDBSCAN / ST_ClusterKMeans (use Python UDFs)
  • No ST_MakeValid natively (available via Sedona Native App)
  • No linear referencing (ST_LineLocatePoint, ST_LineInterpolatePoint)
  • No topology operations, no routing, no network analysis
  • No raster functions at all

The gap filler is the Apache Sedona Native App (SedonaSnow), available on the Snowflake Marketplace. It roughly doubles the available function count and adds ST_MakeValid, ST_SubDivide, broader CRS support via ST_Transform, and more. Worth evaluating if you hit the native function ceiling.

H3: The Killer Feature

H3 is Uber's hexagonal spatial indexing system, and Snowflake's native integration is genuinely excellent. 19 functions, all first-class SQL citizens. No library imports, no UDF wrappers, no setup.

The core insight is this: traditional spatial joins (ST_INTERSECTS on two large tables) are computationally expensive. Every row on the left must be tested against potentially every row on the right. With H3, you convert both datasets to the same hexagonal resolution, then join on integer equality. An O(n*m) geometry operation becomes a hash join. On 100M+ records, this can take spatial joins from minutes to seconds.

The key functions: H3_LATLNG_TO_CELL(lat, lng, resolution) converts a point to an H3 cell index. H3_COVERAGE(geography, resolution) covers a polygon with the minimal set of H3 cells. H3_CELL_TO_BOUNDARY(cell) converts back to a hexagonal polygon for visualisation. H3_CELL_TO_PARENT and H3_CELL_TO_CHILDREN navigate the hierarchy for multi-resolution analysis.
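A quick illustration of these functions together (the coordinates are an arbitrary London point):

```sql
SELECT
  H3_LATLNG_TO_CELL(51.5074, -0.1278, 9)                        AS cell_res9,
  -- walk up the hierarchy from resolution 9 to resolution 7
  H3_CELL_TO_PARENT(H3_LATLNG_TO_CELL(51.5074, -0.1278, 9), 7)  AS parent_res7,
  -- boundary comes back as a GEOGRAPHY hexagon, ready for mapping
  H3_CELL_TO_BOUNDARY(H3_LATLNG_TO_CELL(51.5074, -0.1278, 9))   AS hex_geom;
```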

H3 RESOLUTION GUIDE

  • Resolution 3 - ~12,400 km2 - country level
  • Resolution 5 - ~253 km2 - regional analysis
  • Resolution 7 - ~5.16 km2 - city / catchment areas
  • Resolution 9 - ~0.105 km2 - neighbourhood / postcode
  • Resolution 11 - ~0.002 km2 - building level
  • Resolution 15 - ~0.9 m2 - sub-metre precision

The typical pattern: add an H3 column to your table, cluster by it, then join on integer equality. For a table of insurance claims joined to flood zones, resolution 9 gives neighbourhood-level precision. The join becomes a GROUP BY on an integer column - exactly what Snowflake's columnar engine is optimised for.
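That pattern, sketched end to end (all table and column names are hypothetical):

```sql
-- 1. Index the point table once
ALTER TABLE claims ADD COLUMN h3_res9 INT;
UPDATE claims SET h3_res9 = H3_LATLNG_TO_CELL(lat, lon, 9);

-- 2. Explode flood-zone polygons into their covering cells
CREATE OR REPLACE TABLE flood_cells AS
SELECT z.zone_id,
       cell.value::INT AS h3_res9
FROM flood_zones z,
     LATERAL FLATTEN(input => H3_COVERAGE(z.geom, 9)) cell;

-- 3. The expensive ST_INTERSECTS join becomes an integer hash join
SELECT f.zone_id,
       COUNT(*) AS claim_count
FROM claims c
JOIN flood_cells f USING (h3_res9)
GROUP BY f.zone_id;
```

Cells straddling a zone boundary introduce some approximation at the hexagon edge; resolution 9 keeps that error at neighbourhood scale.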

Comparison: Databricks has H3 natively since DBR 17+ and via Mosaic on earlier runtimes. BigQuery has no built-in H3 - it relies on S2 as its native spatial index. For teams that want H3 in pure SQL without Spark complexity, Snowflake is the simplest path.

Watch the cell count

Running H3_COVERAGE at resolution 12+ on country-sized polygons generates billions of cells. Start at resolution 7-9 and only increase if your analysis requires it. A medium warehouse can handle resolution 9 on most workloads. Resolution 12+ on large polygons needs an XL or larger.

Loading Geospatial Data

Snowflake natively understands GeoJSON, WKT, WKB, EWKT, and EWKB. You can load these from staged files (internal or external S3/GCS/Azure stages) via COPY INTO with a TO_GEOGRAPHY() or TO_GEOMETRY() cast.

GeoParquet is the recommended format. It stores geometry as WKB in a binary column, compresses well, and is becoming the interchange standard across Snowflake, Databricks, DuckDB, and BigQuery. Iceberg table support for GEOGRAPHY and GEOMETRY types was announced at Summit 2025 - when GA, this makes GeoParquet round-tripping straightforward across platforms.

Shapefiles are not natively supported. This is friction for every team migrating from ArcGIS. The recommended workflow: convert to GeoParquet locally using ogr2ogr or GeoPandas, upload to a Snowflake stage, then COPY INTO with WKB parsing. Alternatively, use FME's native Snowflake Spatial writer if your organisation already has a licence.
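A sketch of that staged-GeoParquet workflow (stage, file, and column names are illustrative, and the geometry column name depends on how the file was written - GeoPandas uses `geometry`):

```sql
-- Locally first: ogr2ogr -f Parquet parcels.parquet parcels.shp

-- Upload to the user stage (run from SnowSQL; parquet is already compressed)
PUT file:///tmp/parcels.parquet @~/staged AUTO_COMPRESS = FALSE;

CREATE OR REPLACE TABLE parcels (id NUMBER, geom GEOGRAPHY);

COPY INTO parcels
FROM (
  SELECT $1:id::NUMBER,
         TO_GEOGRAPHY($1:geometry::BINARY)  -- WKB column parsed into GEOGRAPHY
  FROM @~/staged/parcels.parquet
)
FILE_FORMAT = (TYPE = PARQUET);
```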

DATA LOADING SUPPORT

  • GeoJSON - via JSON file format + TO_GEOGRAPHY()
  • GeoParquet - WKB column in Parquet; recommended format
  • WKT / WKB - from CSV, Parquet, or direct SQL
  • Shapefile - must pre-convert to GeoJSON or GeoParquet
  • GeoPackage - must pre-convert
  • KML / KMZ - must pre-convert
  • GeoTIFF - raster; not supported
  • File Geodatabase - must pre-convert via ogr2ogr or FME

One trap to watch: GeoJSON FeatureCollections need to be flattened. Each Feature becomes a row via LATERAL FLATTEN. The GEOGRAPHY type rejects any SRID other than 4326 - if your GeoJSON has a different CRS, you need GEOMETRY instead.
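The flattening step looks roughly like this (assuming a hypothetical `raw_geojson` table with one VARIANT column `v` holding each FeatureCollection):

```sql
CREATE OR REPLACE TABLE features AS
SELECT
  f.value:properties:name::STRING AS name,   -- pull attributes out of properties
  TO_GEOGRAPHY(f.value:geometry)  AS geom    -- each Feature's geometry object
FROM raw_geojson,
     LATERAL FLATTEN(input => raw_geojson.v:features) f;
```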

Spatial Indexing

There is no CREATE SPATIAL INDEX in Snowflake. If you come from PostGIS, this feels wrong. But Snowflake has three mechanisms that achieve similar outcomes through different means.

1. Automatic micro-partition pruning

Snowflake stores bounding-box metadata for every micro-partition. When you filter with spatial predicates, it skips partitions whose bounding boxes do not overlap. This is automatic and free. For well-clustered data, it provides significant speedup. For randomly distributed data, it helps less.

2. H3 clustering (recommended)

Pre-compute an H3 index column and cluster the table by it. This ensures spatially proximate data lives in the same micro-partitions, making bounding-box pruning highly effective. This is the cheapest and most impactful optimisation for large spatial tables. Use ALTER TABLE ... CLUSTER BY (h3_column).

3. Search Optimisation Service (Enterprise only)

Creates persistent search access paths for spatial predicates. Enable per-column with ALTER TABLE ... ADD SEARCH OPTIMIZATION ON GEO(column). Supports ST_INTERSECTS, ST_CONTAINS, ST_WITHIN, ST_DWITHIN.

Cost warning: SOS uses serverless compute charged at 10x the normal credit rate. Run SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS before enabling. On frequently-updated tables, this can add $150-600/month.

For most teams, option 2 (H3 clustering) is the right starting point. It costs only the standard auto-reclustering credits and provides the biggest performance improvement per dollar. Only add SOS if H3 clustering is not sufficient and you have confirmed the cost is acceptable.
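Both options are a couple of statements (table and column names hypothetical; the SOS statements require Enterprise Edition):

```sql
-- Option 2: cluster on a precomputed H3 column - cheap and effective
ALTER TABLE parcels CLUSTER BY (h3_res9);

-- Option 3: estimate SOS cost BEFORE enabling it
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('parcels', 'GEO(geom)');
ALTER TABLE parcels ADD SEARCH OPTIMIZATION ON GEO(geom);
```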

Python + GeoPandas Integration

Snowpark Python is the bridge between SQL and Python inside Snowflake. Python UDFs run in a secure sandbox with access to packages from Snowflake's Anaconda channel. The geospatial packages that matter are available: Shapely, GeoPandas, Fiona, PyPROJ, and GDAL.

The primary use case: operations that Snowflake's native SQL cannot handle. Spatial clustering with DBSCAN, complex geometry validation, coordinate transformations across CRS not supported natively, or reading Shapefiles directly from a stage using Fiona.
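As a sketch of the shape of a Snowpark UDF, here is a pure standard-library haversine distance - the kind of logic you would register as a UDF. The registration call is shown only in comments, and the name `HAVERSINE_M` is illustrative, not a real Snowflake function:

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical earth (R = 6,371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(a))

# Hypothetical Snowpark registration (sketch only):
# from snowflake.snowpark.types import FloatType
# session.udf.register(haversine_m, name="HAVERSINE_M",
#                      input_types=[FloatType()] * 4,
#                      return_type=FloatType())
```

In practice you would reach for native ST_DISTANCE here; the point is that anything expressible as a Python function over row values can run inside the sandbox the same way.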

Performance traps with Python UDFs

  • Cold start: ~400ms on small warehouses, ~900ms on medium. First call in a session is noticeably slower
  • Memory limits: Standard warehouses provide ~16 GB per node. For heavy GeoPandas operations, use Snowpark-optimised warehouses (256 GB per node)
  • Rasterio: Not reliably available in the Anaconda channel. Raster processing in Snowflake UDFs is essentially not viable
  • Debugging: Python UDF errors surface as generic SQL errors. Add explicit try/except with descriptive messages in your UDFs

Snowflake Notebooks (Streamlit-based) let you run GeoPandas interactively inside Snowflake with Pydeck or Folium for visualisation. For exploratory spatial analysis, this is the fastest path. For production pipelines, wrap your logic in stored procedures or UDTFs.

Streamlit in Snowflake deserves mention: you can build interactive geospatial web apps (maps, dashboards) that run entirely inside your Snowflake account. Data never leaves the platform, access control inherits from Snowflake's RBAC, and there is no infrastructure to manage. For internal tools, this is a compelling alternative to standing up a separate mapping application.

Snowflake vs Databricks for Geospatial

Both platforms are a core focus for us. This is a direct comparison based on migrating production workloads to both.

For each factor, Snowflake first, then Databricks:

  • Primary language: SQL (Python via UDFs) vs Python/Spark (SQL via Spark SQL)
  • H3 support: 19 native SQL functions vs native (DBR 17+) plus the Mosaic library
  • Spatial functions: 60-70 native vs 90+ native plus Sedona and Mosaic
  • Raster support: none vs available via Mosaic / Sedona
  • 3D geometry: none vs limited
  • Cluster management: none (serverless) vs required (Spark config)
  • Data sharing: zero-copy (Marketplace) vs Delta Sharing (open protocol)
  • Scale ceiling: excellent to ~500M rows vs superior beyond 500M (distributed Spark)
  • ML on spatial: Cortex ML (basic) plus Python UDFs vs MLflow, Spark ML, and Mosaic
  • Cost model: credits (simpler) vs DBUs plus cloud infra (harder to predict)

Choose Snowflake when: your team is SQL-first, your data is already in Snowflake, you need zero-copy sharing with marketplace data providers, and your workload is vector analytics without raster requirements.

Choose Databricks when: your team is Python/Spark-first, you need raster support, you are building ML pipelines on spatial features, or you are processing at extreme scale (billions of rows with complex geometry operations).

For the reinsurance use case specifically - join 50M parcels to flood zones and aggregate claims by hexagon - both platforms handle this. Snowflake is simpler to set up and operate. Databricks is more powerful for the next step: training spatial risk models on the results.

Cost Model

Snowflake charges standard compute credits for all spatial operations - no separate geospatial pricing. You pay per second of warehouse time (1-minute minimum on resume), plus storage.

WORKLOAD - WAREHOUSE - DURATION - ~COST (ENTERPRISE)

  • Spatial join, 10M x 10K - Medium - ~30 sec - ~$0.09
  • H3 polyfill, 1M polygons at res 9 - Large - ~2 min - ~$0.81
  • Weekly flood risk analysis, 50M parcels - Large - ~15 min - ~$4/run
  • Daily point aggregation, 10M events - Medium - ~5 min - ~$0.67/run
  • Geo Search Optimisation (monthly) - Serverless (10x) - ongoing - $150-600/mo

A moderate geospatial analytics workload - weekly batch jobs, daily aggregations, ad-hoc analyst queries - lands in the $150-300/month range. Compare that to an ArcGIS Advanced licence at ~$7,000/year for equivalent desktop processing. The break-even point favours Snowflake for any workload that runs more than a few times per month.
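The arithmetic is simple enough to sketch. The credit rates below are assumptions (4 credits/hour for a Medium warehouse, 8 for a Large, $3/credit Enterprise list price) - negotiated contract rates will differ, which is why the figures in the table above do not match exactly:

```python
# Assumed credits per hour by warehouse size (Snowflake doubles per step up)
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
PRICE_PER_CREDIT = 3.00  # assumed Enterprise list price, USD

def query_cost(warehouse: str, seconds: float, resumed: bool = False) -> float:
    """Per-second billing; a warehouse resume bills a 60-second minimum."""
    billed = max(seconds, 60) if resumed else seconds
    return CREDITS_PER_HOUR[warehouse] * PRICE_PER_CREDIT * billed / 3600

# e.g. a 15-minute Large-warehouse run: 8 * 3.00 * 900 / 3600 = $6.00
```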

Hidden costs to watch

  • Search Optimisation Service at 10x credit rate on large, frequently-updated tables
  • Always-on warehouses - set auto-suspend to 60 seconds for ad-hoc spatial queries
  • H3 coverage at resolution 12+ on country-sized polygons generates billions of cells
  • Snowpark-optimised warehouses cost 1.7x standard warehouses
  • Enterprise Edition required for Geo Search Optimisation - Standard Edition lacks this

When NOT to Use Snowflake for Geospatial

Every platform has boundaries. Here are Snowflake's.

Raster or imagery processing

No GeoTIFF, no DEMs, no satellite imagery, no raster algebra. If your workflow touches gridded data, use Databricks (Mosaic), Google Earth Engine, or a dedicated raster pipeline.

3D geometry or LiDAR

Only 2D coordinates supported. No Z values, no M values. Terrain analysis, 3D building models, and point clouds require a different tool.

Sub-100ms spatial lookups

Snowflake is an OLAP platform. If you need a live map API with real-time spatial queries, you still need PostGIS or a specialised tile server.

Complex topology operations

No network analysis, no routing, no linear referencing, no Voronoi diagrams natively. PostGIS or Shapely via Python UDFs can fill some gaps, but if topology is central to your workflow, Snowflake is the wrong tool.

Heavy CRS transformations

GEOGRAPHY is locked to WGS 84. GEOMETRY supports SRIDs but the built-in CRS catalogue is limited compared to PostGIS. If your workflow requires frequent reprojection between obscure coordinate systems, expect friction.

Budget-constrained teams without Enterprise

The most impactful performance feature (Geo Search Optimisation) requires Enterprise Edition. On Standard Edition, you rely on H3 clustering and micro-partition pruning - still effective, but the ceiling is lower.

Getting Started

If you are already on Snowflake, enabling geospatial is immediate. No packages to install, no extensions to enable. The spatial functions and H3 functions are available in every warehouse.

1. Convert your Shapefiles to GeoParquet

Use ogr2ogr or GeoPandas locally. This is the one step you cannot skip - Snowflake does not read Shapefiles natively.

2. Stage and load

PUT the files to a stage, COPY INTO with TO_GEOGRAPHY() or TO_GEOMETRY(). Validate with ST_ISVALID.

3. Add H3 indexes

Add an H3 column at resolution 7-9, then cluster by it. This single step gives you the biggest performance improvement.

4. Start with SQL

ST_INTERSECTS, ST_DISTANCE, ST_AREA, GROUP BY H3 cell. Your SQL analysts can start querying spatial data immediately.

5. Add Python when needed

Only reach for Snowpark Python UDFs when native SQL cannot do the job - spatial clustering, complex validation, format conversion.

6. Explore the Marketplace

Check the Snowflake Marketplace for CARTO Analytics Toolbox (70+ extra spatial functions) and third-party geospatial datasets (flood zones, demographics, POI).

Snowflake is not trying to be PostGIS. It is trying to make spatial analytics accessible to every SQL analyst who already has a Snowflake account.

For teams that need point-in-polygon joins, H3 aggregation, exposure analysis, and location intelligence - all in SQL, with no infrastructure to manage - it delivers. The 19 native H3 functions alone are worth the evaluation. The zero-copy data sharing is something no other platform offers.

Know its boundaries. No raster. No 3D. No sub-millisecond lookups. If your workflow fits within those boundaries, Snowflake is one of the simplest paths from legacy desktop GIS to cloud-native spatial analytics.
