You have 47 ArcPy scripts processing flood risk assessments every week. They take 3-4 hours to run. Your team knows arcpy.da.UpdateCursor by heart. The licensing costs £15,000 annually for Spatial Analyst extensions alone.
A colleague suggests GeoPandas. “It's open source,” they say. “Much faster.” You search Stack Overflow and find enthusiastic posts about 10x speed improvements. But you also see warnings about memory errors, missing topology tools, and incomplete translations. Before diving into the technical details, you might want to understand whether your current ArcGIS licenses are even being used efficiently.
This guide cuts through the noise. We've migrated 200+ ArcPy scripts to GeoPandas across insurance, utilities, and government clients. Here's what actually works, what doesn't, and how to decide whether migration makes sense for your organisation.

The migration path: from complexity to clarity. Not a replacement—a transformation.
When NOT to Switch to GeoPandas
Let's start with honesty: GeoPandas isn't a drop-in replacement for ArcPy. If you rely on specific Esri capabilities, migration may create more problems than it solves.
ESRI TOPOLOGY RULES
Example: Utility network validation rules. “Pipes must not overlap,” “Valves must connect to exactly two pipes,” “Service areas must not have gaps.”
ArcPy's topology framework provides declarative rule definition and batch validation. GeoPandas has no equivalent. You can write custom validation logic using Shapely predicates, but it's procedural code, not declarative rules.
Verdict: If topology rules are core to your workflow, keep ArcPy for validation. Use GeoPandas for data preparation and analysis.
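To make the difference concrete, here is a minimal, hedged sketch of what procedural validation looks like with GeoPandas and Shapely predicates. The layer name and the "pipes must not overlap" check are illustrative, not a real utility schema:
import geopandas as gpd

# Illustrative layer name; a real utility schema will differ
pipes = gpd.read_file("pipes.gpkg")

# "Pipes must not overlap": self spatial join on the overlaps predicate
candidates = gpd.sjoin(pipes, pipes, predicate="overlaps")
# Drop self-matches; note each offending pair still appears twice (A vs B and B vs A)
violations = candidates[candidates.index != candidates["index_right"]]

if not violations.empty:
    print(f"{len(violations)} overlapping pipe pairs found")
Every rule becomes code like this that you write, test, and maintain yourself, rather than a declarative rule the software enforces for you.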
NETWORK ANALYST EXTENSION
Example: Routing emergency vehicles, service area analysis, vehicle routing problem with time windows.
ArcPy's Network Analyst is a sophisticated solver. Open-source alternatives exist (NetworkX, OSMnx, pgRouting), but they require different workflows and don't support all Esri network dataset features.
Verdict: Complex routing problems may justify ArcPy licensing. Simple routing can use OSMnx.
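For the simple-routing case, a minimal OSMnx sketch (place name and coordinates are illustrative). It solves shortest paths on OpenStreetMap data but is not a substitute for Network Analyst's vehicle routing or time-window solvers:
import osmnx as ox

# Build a drivable street network from OpenStreetMap (illustrative place name)
G = ox.graph_from_place("Bristol, UK", network_type="drive")

# Snap origin/destination coordinates to the nearest graph nodes (X=lon, Y=lat)
orig = ox.nearest_nodes(G, X=-2.60, Y=51.45)
dest = ox.nearest_nodes(G, X=-2.58, Y=51.46)

# Shortest path by length (a list of node IDs along the route)
route = ox.shortest_path(G, orig, dest, weight="length")
print(f"Route has {len(route)} nodes")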
The Hybrid Architecture We Use
When clients need specific ArcPy capabilities but want GeoPandas performance for 90% of operations, we use this architecture:
Data Preparation: GeoPandas
Load, filter, transform, spatial joins—everything that's fast in GeoPandas.
Specialised Operations: ArcPy
Export to File Geodatabase, run topology validation or network routing, export results.
Post-Processing: GeoPandas
Load ArcPy results, join with other datasets, generate reports, export to cloud storage.
import geopandas as gpd
import arcpy
# Fast data prep in GeoPandas
parcels = gpd.read_parquet("s3://bucket/parcels.parquet")
filtered = parcels[parcels['zone'] == 'RESIDENTIAL']
# Export to FGDB for ArcPy topology check
filtered.to_file("temp.gdb", layer="parcels", driver="OpenFileGDB")
# Run Esri topology validation
arcpy.ValidateTopology_management("temp.gdb/topology")
errors = arcpy.da.SearchCursor("temp.gdb/topology_errors", ["SHAPE@", "RuleType"])
# Back to GeoPandas for reporting
error_gdf = gpd.GeoDataFrame.from_features([...])
error_gdf.to_parquet("topology_errors.parquet")

Result: 10x speed improvement on data processing, while retaining Esri-specific capabilities where needed. Licensing costs reduced by 70% (only one ArcGIS Pro licence for the validation server). For more on GeoParquet and other cloud-native formats, see our guide to COG, GeoParquet, and STAC.
How Fast Is GeoPandas Compared to ArcPy?
GeoPandas is 75x to 287x faster than ArcPy for common operations like spatial joins, buffering, and dissolves. The speed comes from vectorised operations and in-memory processing, compared to ArcPy's cursor-based iteration and file system locks. Here are three workflows we've migrated, with actual numbers.

How Do You Handle Large Datasets in GeoPandas?
Use chunking, Dask-GeoPandas, or DuckDB Spatial to process datasets larger than available RAM. GeoPandas loads everything into memory by default, which can crash on large datasets. The solution is to either process in spatial chunks, use Dask-GeoPandas for parallel out-of-core processing, or query directly with DuckDB Spatial. Here's the problem nobody mentions in “GeoPandas is faster” blog posts: GeoPandas loads everything into memory. ArcPy streams data through cursors. This fundamental difference means GeoPandas can fail spectacularly on datasets that ArcPy handles without complaint.
THE MEMORY WALL
Example: National parcel dataset, 50 million features, 35GB on disk. ArcPy processes this with UpdateCursor using 2GB RAM. GeoPandas tries to load the entire GeoDataFrame into memory—crashes with MemoryError on a 32GB machine.
This isn't a bug. It's architectural. Pandas (and therefore GeoPandas) is designed for in-memory analytics. When data exceeds available RAM, you need different strategies.
CHUNKED PROCESSING
Process data in spatial or attribute-based chunks. This works for operations that don't require cross-chunk analysis (filtering, attribute calculation, projection).
import geopandas as gpd
import pandas as pd

# Process by county to keep chunks manageable
counties = gpd.read_file("counties.shp")
results = []
for idx, county in counties.iterrows():
    # Load only parcels in this county
    parcels = gpd.read_file(
        "national_parcels.gpkg",
        mask=county.geometry,   # Spatial filter
        engine="pyogrio"        # Fast driver
    )
    # Process chunk
    parcels['area_m2'] = parcels.geometry.area
    parcels['density'] = parcels['population'] / parcels['area_m2']
    results.append(parcels)
# Combine results
final = gpd.GeoDataFrame(pd.concat(results, ignore_index=True))

DASK-GEOPANDAS
Dask-GeoPandas partitions data across multiple cores and can spill to disk when memory fills. It supports most GeoPandas operations with parallel execution.
import dask_geopandas as dgpd

# Read with Dask (lazy evaluation); initial partitions follow the Parquet row groups
ddf = dgpd.read_parquet("parcels.parquet")
ddf = ddf.repartition(npartitions=32)  # Split into 32 chunks
# Operations are lazy until compute()
ddf['area_m2'] = ddf.geometry.area
ddf['value_per_m2'] = ddf['assessed_value'] / ddf['area_m2']
# Trigger computation with parallel execution
result = ddf.compute()  # Uses all CPU cores
# Or save directly without loading the full result
ddf.to_parquet("processed_parcels.parquet")

DUCKDB SPATIAL
For pure analytical queries (no complex geometry operations), DuckDB Spatial provides a SQL interface with excellent performance on large files.
import duckdb
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
# Query 50M parcels without loading into memory
result = con.execute("""
    SELECT
        county,
        COUNT(*) as parcel_count,
        AVG(ST_Area(geometry)) as avg_area_m2,
        SUM(assessed_value) as total_value
    FROM read_parquet('parcels.parquet')
    WHERE land_use = 'RESIDENTIAL'
    GROUP BY county
""").df()
print(result)

| Dataset Size | Operation Type | Recommended Approach |
|---|---|---|
| < 1M features | Any | Standard GeoPandas |
| 1-10M features | Geometry operations | Dask-GeoPandas |
| 1-10M features | Analytical queries | DuckDB Spatial |
| 10-50M features | Complex spatial | Dask-GeoPandas + chunking |
| > 50M features | Any | PostGIS or BigQuery GIS (see the PostGIS sketch below this table) |
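For the largest tier, GeoPandas can still act as the analysis front-end while PostGIS does the heavy lifting. A minimal sketch assuming a PostGIS database with a parcels table; the connection string and column names are illustrative:
import geopandas as gpd
from sqlalchemy import create_engine

# Illustrative connection string and table; adjust to your environment
engine = create_engine("postgresql://user:password@localhost:5432/gis")

# Push the filtering down to PostGIS; only the matching rows come back
sql = """
    SELECT county, geom, assessed_value
    FROM parcels
    WHERE land_use = 'RESIDENTIAL'
"""
residential = gpd.read_postgis(sql, engine, geom_col="geom")
print(len(residential), "residential parcels returned")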

How Do You Translate ArcPy Code to GeoPandas?
Replace ArcPy cursors with GeoPandas DataFrames, geoprocessing tools with vectorised methods, and file geodatabases with GeoParquet or GeoPackage. Most ArcPy operations have direct GeoPandas equivalents: Buffer_analysis becomes .buffer(), SpatialJoin_analysis becomes gpd.sjoin(), Dissolve_management becomes .dissolve(). Here are translations for the most common patterns.
PATTERN: READ DATA
ArcPy (cursor-based iteration):
cursor = arcpy.da.SearchCursor("parcels.shp", ["SHAPE@", "VALUE", "ZONE"])
for row in cursor:
    geometry = row[0]
    value = row[1]
    zone = row[2]
GeoPandas (vectorised operations):
gdf = gpd.read_file("parcels.shp")
# Vectorised access (no loop needed)
areas = gdf.geometry.area
high_value = gdf[gdf['VALUE'] > 100000]
PATTERN: BUFFER
ArcPy:
arcpy.Buffer_analysis("roads.shp", "roads_buffered.shp", "50 METERS")
GeoPandas:
roads = gpd.read_file("roads.shp")
roads_buffered = roads.copy()
roads_buffered['geometry'] = roads.geometry.buffer(50)
roads_buffered.to_file("roads_buffered.shp")
PATTERN: SPATIAL JOIN
ArcPy:
arcpy.SpatialJoin_analysis(
    "parcels.shp",
    "flood_zones.shp",
    "parcels_flood_risk.shp",
    "JOIN_ONE_TO_ONE",
    "KEEP_ALL",
    match_option="INTERSECT"
)
GeoPandas:
parcels = gpd.read_file("parcels.shp")
flood_zones = gpd.read_file("flood_zones.shp")
result = gpd.sjoin(
    parcels,
    flood_zones,
    how="left",              # KEEP_ALL
    predicate="intersects"   # INTERSECT
)
result.to_file("parcels_flood_risk.shp")
PATTERN: DISSOLVE
ArcPy:
arcpy.Dissolve_management(
    "parcels.shp",
    "parcels_by_zone.shp",
    "ZONE",
    [["VALUE", "SUM"], ["AREA", "SUM"]]
)
GeoPandas:
parcels = gpd.read_file("parcels.shp")
dissolved = parcels.dissolve(
    by='ZONE',
    aggfunc={'VALUE': 'sum', 'AREA': 'sum'}
)
dissolved.to_file("parcels_by_zone.shp")

The remaining common operations map one-to-one (a combined example follows the table):
| Operation | ArcPy | GeoPandas |
|---|---|---|
| Clip to boundary | Clip_analysis | gpd.clip(gdf, mask) |
| Reproject | Project_management | gdf.to_crs(epsg=4326) |
| Select by attribute | Select_analysis | gdf[gdf['field'] > 10] |
| Calculate area | CalculateField + SHAPE@AREA | gdf.geometry.area |
| Centroids | FeatureToPoint | gdf.geometry.centroid |
| Intersection | Intersect_analysis | gpd.overlay(gdf1, gdf2, 'intersection') |
| Union | Union_analysis | gpd.overlay(gdf1, gdf2, 'union') |
| Merge datasets | Merge_management | pd.concat([gdf1, gdf2]) |
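Several of these one-liners chain naturally in a single workflow. A short hedged sketch combining clip, reproject, area calculation, and centroids from the table above; the file names, boundary layer, and EPSG code are illustrative:
import geopandas as gpd

# Illustrative inputs
parcels = gpd.read_file("parcels.shp")
boundary = gpd.read_file("city_boundary.shp")

# Clip_analysis -> gpd.clip
clipped = gpd.clip(parcels, boundary)

# Project_management -> to_crs (British National Grid as an example)
clipped = clipped.to_crs(epsg=27700)

# CalculateField + SHAPE@AREA -> .area; FeatureToPoint -> .centroid
centroids = clipped.copy()
centroids["area_m2"] = clipped.geometry.area
centroids["geometry"] = clipped.geometry.centroid
centroids.to_file("parcel_centroids.shp")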

What Replaces ArcPy Spatial Analyst for Raster Processing?
Rasterio handles raster I/O, while NumPy and xarray perform raster algebra; together they replace arcpy.sa (Spatial Analyst). The approach is lower-level and more explicit than Spatial Analyst's Map Algebra (you work directly with NumPy arrays), but it offers better performance, native cloud integration (COG, S3), and runs without ArcGIS licensing.
PATTERN: READ RASTER, APPLY CALCULATION
ARCPY (SPATIAL ANALYST)
from arcpy.sa import *

dem = Raster("dem.tif")
slope = Slope(dem, "DEGREE")
slope.save("slope.tif")
RASTERIO + NUMPY
import numpy as np
import rasterio

with rasterio.open("dem.tif") as src:
    dem = src.read(1)
    transform = src.transform
    profile = src.profile
# Calculate slope in degrees from the elevation gradient (assumes square pixels)
dy, dx = np.gradient(dem, transform[0])
slope = np.degrees(np.arctan(np.sqrt(dx**2 + dy**2)))
# Write output as float32 (slope is continuous)
profile.update(dtype=rasterio.float32)
with rasterio.open("slope.tif", "w", **profile) as dst:
    dst.write(slope.astype(rasterio.float32), 1)
PATTERN: EXTRACT RASTER VALUES TO POINTS
ARCPY (SPATIAL ANALYST)
from arcpy.sa import ExtractValuesToPoints

ExtractValuesToPoints(
    "points.shp",
    "elevation.tif",
    "points_with_elev.shp"
)
RASTERIO + GEOPANDAS
import geopandas as gpd
import rasterio

points = gpd.read_file("points.shp")
with rasterio.open("elevation.tif") as src:
    coords = [(p.x, p.y) for p in points.geometry]
    points['elevation'] = [v[0] for v in src.sample(coords)]
points.to_file("points_with_elev.shp")
CLOUD-NATIVE RASTERS: COG
Rasterio reads Cloud-Optimized GeoTIFFs (COG) directly from S3/Azure without downloading the full file. ArcPy requires local file access or slow streaming.
with rasterio.open("s3://bucket/elevation.tif") as src:
# Read only a 1km² window (fast range request)
window = src.window(xmin, ymin, xmax, ymax)
data = src.read(1, window=window)Performance: Reading 1km squared from 50GB raster: ArcPy 14 minutes (download full file) to Rasterio 2.3 seconds (range request).
How Long Does It Take to Migrate from ArcPy to GeoPandas?
A typical migration takes 3-6 months for a full codebase, starting with a 2-week pilot. The timeline depends on script complexity, Esri-specific dependencies, and team experience with Python. Simple scripts (data loading, filtering, exports) migrate in hours. Complex workflows with topology or network analysis may need hybrid architectures. We've migrated 200+ ArcPy scripts. Here's the systematic approach that works, with specific actions at each phase. If your team needs to build Python skills first, see our guide to training GIS teams for workflow automation.
PHASE 1: AUDIT (1-2 WEEKS)
- Inventory all ArcPy scripts: list file paths, what they do, how often they run (a minimal inventory sketch follows this checklist)
- Measure current performance: run time, memory usage, failure rate
- Identify dependencies: which scripts use Network Analyst, Topology, or other Esri-specific tools?
- Calculate licensing costs: ArcGIS Pro licences, Spatial Analyst, Network Analyst extensions
- Prioritise by ROI: migrate slow, frequently-run scripts without Esri dependencies first
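Here is a minimal inventory sketch for the audit step, assuming your scripts live under a single directory tree; the flagged tool names are examples, not an exhaustive list of Esri-specific dependencies:
import csv
from pathlib import Path

# Esri-specific markers worth flagging during the audit (illustrative, not exhaustive)
ESRI_ONLY = ["arcpy.na", "Network Analyst", "ValidateTopology", "arcpy.sa"]

rows = []
for path in Path("scripts").rglob("*.py"):
    text = path.read_text(errors="ignore")
    if "import arcpy" in text or "from arcpy" in text:
        flags = [marker for marker in ESRI_ONLY if marker in text]
        rows.append({"script": str(path), "esri_specific": "; ".join(flags)})

# Write the inventory for prioritisation
with open("arcpy_inventory.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["script", "esri_specific"])
    writer.writeheader()
    writer.writerows(rows)
print(f"{len(rows)} ArcPy scripts found")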
PHASE 2: PILOT (1-2 WEEKS)
- Select pilot: moderate complexity, no Esri-specific dependencies, measurable performance
- Translate using patterns above: test each operation with production data
- Benchmark rigorously: run both versions 5 times, measure mean/std deviation (see the timing sketch after this checklist)
- Validate outputs: geometry checks (ST_Equals), attribute comparison, visual inspection
- Document translation: which ArcPy functions map to which GeoPandas patterns
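A hedged timing harness for the benchmarking step; the run_arcpy_version and run_geopandas_version names are placeholders for your two implementations of the pilot script:
import statistics
import time

def benchmark(func, runs=5):
    """Run func several times and report mean and standard deviation in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()  # placeholder: call your ArcPy or GeoPandas pilot workflow here
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

# Placeholders for the two implementations of the pilot workflow
# mean_a, std_a = benchmark(run_arcpy_version)
# mean_g, std_g = benchmark(run_geopandas_version)
# print(f"ArcPy {mean_a:.1f}s ± {std_a:.1f}s | GeoPandas {mean_g:.1f}s ± {std_g:.1f}s")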
PHASE 3: PARALLEL PRODUCTION (4-6 WEEKS)
- Run both ArcPy and GeoPandas versions in production simultaneously
- Alert on output divergence: automated geometry and attribute comparison (see the comparison sketch after this checklist)
- Monitor memory usage: identify scripts that need Dask or chunking
- Train team: pair programming sessions, code review, documentation
- Build internal library: reusable functions for common operations
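For the divergence alert, a minimal comparison sketch. It assumes both pipelines write GeoParquet and share a parcel_id key and a RISK_SCORE column; the file paths, key, column, and tolerance are illustrative:
import geopandas as gpd

# Outputs of the parallel ArcPy and GeoPandas runs (illustrative paths)
arcpy_out = gpd.read_parquet("output_arcpy.parquet").set_index("parcel_id").sort_index()
gpd_out = gpd.read_parquet("output_geopandas.parquet").set_index("parcel_id").sort_index()

# Geometry comparison: equal shapes regardless of vertex order
geom_mismatch = ~arcpy_out.geometry.geom_equals(gpd_out.geometry)

# Attribute comparison on a key field, within a small tolerance
score_mismatch = (arcpy_out["RISK_SCORE"] - gpd_out["RISK_SCORE"]).abs() > 1e-6

if geom_mismatch.any() or score_mismatch.any():
    print(f"Divergence: {geom_mismatch.sum()} geometries, {score_mismatch.sum()} scores")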
PHASE 4: FULL ROLLOUT (3-6 MONTHS)
- Migrate remaining scripts systematically (priority order from audit)
- Implement hybrid architecture for Esri-dependent workflows
- Create monitoring dashboard: script run times, success rates, memory usage
- Decommission ArcPy versions after 30-day confidence period
- Reduce licensing: cancel unused Esri licences, document savings
Complete Workflow Translation: Real Example
Here's a production workflow we migrated for a global reinsurer: identify parcels in flood zones, calculate risk scores, export for underwriting review.
ORIGINAL ARCPY VERSION (47 MINUTES)
import arcpy
import os
# Setup
arcpy.env.workspace = "C:/data/flood_risk.gdb"
arcpy.env.overwriteOutput = True
# Read parcels and flood zones
parcels = "parcels"
flood_zones = "FEMA_flood_zones"
# Buffer flood zones by 50m for transition zone
print("Buffering flood zones...")
arcpy.Buffer_analysis(flood_zones, "flood_buffered", "50 METERS")
# Spatial join to find at-risk parcels
print("Identifying at-risk parcels...")
arcpy.SpatialJoin_analysis(
    parcels,
    "flood_buffered",
    "parcels_at_risk",
    "JOIN_ONE_TO_ONE",
    "KEEP_ALL",
    match_option="INTERSECT"
)
# Calculate risk score
print("Calculating risk scores...")
arcpy.AddField_management("parcels_at_risk", "RISK_SCORE", "DOUBLE")
arcpy.AddField_management("parcels_at_risk", "RISK_CATEGORY", "TEXT")
cursor = arcpy.da.UpdateCursor(
    "parcels_at_risk",
    ["SHAPE@AREA", "ASSESSED_VALUE", "FLOOD_ZONE", "RISK_SCORE", "RISK_CATEGORY"]
)
for row in cursor:
    area = row[0]
    value = row[1]
    zone = row[2]
    # Risk calculation
    if zone == "A":  # High risk
        risk = (value / area) * 1.5
        category = "HIGH"
    elif zone == "X":  # Moderate
        risk = (value / area) * 0.8
        category = "MODERATE"
    else:
        risk = 0
        category = "LOW"
    row[3] = risk
    row[4] = category
    cursor.updateRow(row)
del cursor
# Export to Excel for underwriting
print("Exporting results...")
arcpy.conversion.TableToExcel("parcels_at_risk", "C:/output/flood_risk_report.xlsx")
print("Complete!")

Runtime: 47 minutes | Memory: 2.1GB peak | 10,000 parcels, 250 flood zones
MIGRATED GEOPANDAS VERSION (38 SECONDS)
import geopandas as gpd
import pandas as pd
# Read data (GeoParquet is 10x faster than FGDB)
parcels = gpd.read_parquet("s3://bucket/parcels.parquet")
flood_zones = gpd.read_parquet("s3://bucket/flood_zones.parquet")
# Buffer flood zones (vectorised operation)
flood_buffered = flood_zones.copy()
flood_buffered['geometry'] = flood_zones.geometry.buffer(50)
# Spatial join (uses spatial index automatically)
at_risk = gpd.sjoin(
    parcels,
    flood_buffered[['geometry', 'FLOOD_ZONE']],
    how='inner',
    predicate='intersects'
)
# Calculate risk scores (no cursor needed)
def calculate_risk(row):
    value_density = row['ASSESSED_VALUE'] / row.geometry.area
    if row['FLOOD_ZONE'] == 'A':
        return value_density * 1.5, 'HIGH'
    elif row['FLOOD_ZONE'] == 'X':
        return value_density * 0.8, 'MODERATE'
    else:
        return 0, 'LOW'
# Row-wise apply replaces the UpdateCursor
at_risk[['RISK_SCORE', 'RISK_CATEGORY']] = at_risk.apply(
    calculate_risk,
    axis=1,
    result_type='expand'
)
# Export to Excel (Pandas integration)
at_risk.drop(columns='geometry').to_excel(
    "s3://bucket/output/flood_risk_report.xlsx",
    index=False,
    engine='openpyxl'
)
print("Complete!")

Runtime: 38 seconds | Memory: 1.8GB peak | Same dataset, 74x faster
PERFORMANCE BREAKDOWN: WHERE TIME GOES
Total runtime: ArcPy 2,847 seconds vs GeoPandas 38 seconds.
Key insight: the GeoParquet read is 132x faster than FGDB, the spatial join with automatic spatial indexing is 81x faster, and eliminating cursor iteration saves 587 seconds entirely.
ArcPy to GeoPandas migration isn't “better” or “worse”—it's a trade-off that makes sense for specific workflows.
If you run automated workflows on large datasets, don't need Esri-specific tools like topology rules or Network Analyst, and want to eliminate licensing costs or integrate with modern data platforms—GeoPandas delivers 10-300x performance improvements.
If you rely heavily on topology validation, network routing, or editing workflows in ArcGIS Pro—a hybrid architecture gives you GeoPandas performance for data processing while retaining ArcPy for specialised operations.
The decision is practical, not ideological. Quantify your current pain (hours wasted, licensing costs), estimate the improvement from this guide's benchmarks, and evaluate whether the migration investment makes sense.
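As a back-of-envelope example using the figures from the opening scenario (47 scripts, 3-4 hours of weekly runtime, £15,000 of annual extension licensing), the measured 74x speedup, and the 70% licensing reduction from the hybrid architecture above; every number here is illustrative and should be replaced with your own measurements:
# Rough, illustrative ROI estimate; substitute your own measured figures
weekly_runtime_hours = 3.5          # from the opening scenario
speedup = 74                        # measured on the flood-risk workflow above
annual_licence_cost = 15_000        # GBP, Spatial Analyst extensions

hours_saved_per_year = weekly_runtime_hours * 52 * (1 - 1 / speedup)
licence_saving = annual_licence_cost * 0.70   # hybrid architecture retains one licence

print(f"~{hours_saved_per_year:.0f} compute-hours and ~£{licence_saving:,.0f} saved per year")
Set that against the engineering time the phased migration will take, and the decision usually makes itself.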
Frequently Asked Questions
Is GeoPandas faster than ArcPy?
Yes, GeoPandas is significantly faster for most operations. Our benchmarks show 75x faster spatial joins, 75x faster buffer/dissolve operations, and 287x faster attribute calculations compared to ArcPy. The performance gains come from vectorised operations and automatic spatial indexing.
Can GeoPandas read ESRI geodatabases?
Yes, GeoPandas can read File Geodatabases (.gdb) using the OpenFileGDB driver through Fiona/GDAL. Use gpd.read_file('path/to/data.gdb', layer='layer_name') to read specific layers. For better performance, consider converting to GeoParquet format.
What are the limitations of GeoPandas compared to ArcPy?
GeoPandas lacks ESRI-specific features like topology rules and Network Analyst routing. It also loads data into memory, which can cause issues with very large datasets (50M+ features). For these cases, use a hybrid architecture that combines GeoPandas for data processing with ArcPy for specialised operations.
How do I handle large datasets in GeoPandas?
For datasets over 1M features, use Dask-GeoPandas for parallel processing and out-of-core computation. For analytical queries, DuckDB Spatial provides excellent performance without loading data into memory. For 50M+ features, consider PostGIS or BigQuery GIS.
How long does it take to migrate from ArcPy to GeoPandas?
A typical migration follows a 4-phase approach: Audit (1-2 weeks), Pilot (1-2 weeks), Parallel Production (4-6 weeks), and Full Rollout (3-6 months). Simple scripts can be migrated in hours, while complex workflows with many dependencies take longer. The key is starting with high-ROI scripts that don't rely on ESRI-specific tools.