Technical Deep-Dive

From ArcPy to GeoPandas: A Technical Migration Guide

How to migrate ArcPy scripts to GeoPandas—including when NOT to switch, memory management for massive datasets, and the hybrid architecture we use when GeoPandas isn't enough.

PUBLISHEDJAN 2025
READ TIME18 MIN
CATEGORYTECHNICAL
AUTHORAXIS SPATIAL
Python code editor showing ArcPy and GeoPandas side by side

You have 47 ArcPy scripts processing flood risk assessments every week. They take 3-4 hours to run. Your team knows arcpy.da.UpdateCursor by heart. The licensing costs GBP15,000 annually for Spatial Analyst extensions alone.

A colleague suggests GeoPandas. “It's open source,” they say. “Much faster.” You search Stack Overflow and find enthusiastic posts about 10x speed improvements. But you also see warnings about memory errors, missing topology tools, and incomplete translations. Before diving into the technical details, you might want to understand whether your current ArcGIS licenses are even being used efficiently.

This guide cuts through the noise. We've migrated 200+ ArcPy scripts to GeoPandas across insurance, utilities, and government clients. Here's what actually works, what doesn't, and how to decide whether migration makes sense for your organisation.

Japanese ink illustration showing transformation from tangled legacy code to clean modern implementation

The migration path: from complexity to clarity. Not a replacement—a transformation.

When NOT to Switch to GeoPandas

Let's start with honesty: GeoPandas isn't a drop-in replacement for ArcPy. If you rely on specific Esri capabilities, migration may create more problems than it solves.

ESRI TOPOLOGY RULES

Example: Utility network validation rules. “Pipes must not overlap,” “Valves must connect to exactly two pipes,” “Service areas must not have gaps.”

ArcPy's topology framework provides declarative rule definition and batch validation. GeoPandas has no equivalent. You can write custom validation logic using Shapely predicates, but it's procedural code, not declarative rules.

Verdict: If topology rules are core to your workflow, keep ArcPy for validation. Use GeoPandas for data preparation and analysis.

NETWORK ANALYST EXTENSION

Example: Routing emergency vehicles, service area analysis, vehicle routing problem with time windows.

ArcPy's Network Analyst is a sophisticated solver. Open-source alternatives exist (NetworkX, OSMnx, pgRouting), but they require different workflows and don't support all Esri network dataset features.

Verdict: Complex routing problems may justify ArcPy licensing. Simple routing can use OSMnx.

The Hybrid Architecture We Use

When clients need specific ArcPy capabilities but want GeoPandas performance for 90% of operations, we use this architecture:

1

Data Preparation: GeoPandas

Load, filter, transform, spatial joins—everything that's fast in GeoPandas.

2

Specialised Operations: ArcPy

Export to File Geodatabase, run topology validation or network routing, export results.

3

Post-Processing: GeoPandas

Load ArcPy results, join with other datasets, generate reports, export to cloud storage.

import geopandas as gpd
import arcpy

# Fast data prep in GeoPandas
parcels = gpd.read_parquet("s3://bucket/parcels.parquet")
filtered = parcels[parcels['zone'] == 'RESIDENTIAL']

# Export to FGDB for ArcPy topology check
filtered.to_file("temp.gdb", layer="parcels", driver="OpenFileGDB")

# Run Esri topology validation
arcpy.ValidateTopology_management("temp.gdb/topology")
errors = arcpy.da.SearchCursor("temp.gdb/topology_errors", ["SHAPE@", "RuleType"])

# Back to GeoPandas for reporting
error_gdf = gpd.GeoDataFrame.from_features([...])
error_gdf.to_parquet("topology_errors.parquet")

Result: 10x speed improvement on data processing, while retaining Esri-specific capabilities where needed. Licensing costs reduced by 70% (only one ArcGIS Pro licence for the validation server). For more on GeoParquet and other cloud-native formats, see our guide to COG, GeoParquet, and STAC.

How Fast Is GeoPandas Compared to ArcPy?

GeoPandas is 75x to 287x faster than ArcPy for common operations like spatial joins, buffering, and dissolves. The speed comes from vectorised operations and in-memory processing, compared to ArcPy's cursor-based iteration and file system locks. Here are three workflows we've migrated, with actual numbers.

BENCHMARK_SUITE.exe
1/3

TEST_01

SPATIAL JOIN
10,000 parcels × 250 flood zones
ArcPy (SpatialJoin_analysis)847 sec

Cursor-based iteration, File GDB locks

GeoPandas (gpd.sjoin)11.3 sec

Vectorised with spatial index

75×
faster

TEST_02

BUFFER + DISSOLVE
50,000 road segments, 50m buffer
ArcPy (Buffer → Dissolve)47 min

Sequential file operations

GeoPandas (buffer → unary_union)38 sec

Chained vectorised ops

75×
faster

TEST_03

ATTRIBUTE CALC
1M parcels, area calculation
ArcPy (da.UpdateCursor)32 min

Row-by-row cursor iteration

GeoPandas (vectorised)6.7 sec

NumPy array operations

287×
faster
Use to navigate
Performance benchmark visualization comparing ArcPy and GeoPandas execution times

How Do You Handle Large Datasets in GeoPandas?

Use chunking, Dask-GeoPandas, or DuckDB Spatial to process datasets larger than available RAM. GeoPandas loads everything into memory by default, which can crash on large datasets. The solution is to either process in spatial chunks, use Dask-GeoPandas for parallel out-of-core processing, or query directly with DuckDB Spatial. Here's the problem nobody mentions in “GeoPandas is faster” blog posts: GeoPandas loads everything into memory. ArcPy streams data through cursors. This fundamental difference means GeoPandas can fail spectacularly on datasets that ArcPy handles without complaint.

THE MEMORY WALL

Example: National parcel dataset, 50 million features, 35GB on disk. ArcPy processes this with UpdateCursor using 2GB RAM. GeoPandas tries to load the entire GeoDataFrame into memory—crashes with MemoryError on a 32GB machine.

This isn't a bug. It's architectural. Pandas (and therefore GeoPandas) is designed for in-memory analytics. When data exceeds available RAM, you need different strategies.

Memory Management Strategies

Process data in spatial or attribute-based chunks. Works for operations that don't require cross-chunk analysis (filtering, attribute calculation, projection).

import geopandas as gpd

# Process by county to keep chunks manageable
counties = gpd.read_file("counties.shp")

results = []
for idx, county in counties.iterrows():
    # Load only parcels in this county
    parcels = gpd.read_file(
        "national_parcels.gpkg",
        mask=county.geometry,  # Spatial filter
        engine="pyogrio"       # Fast driver
    )

    # Process chunk
    parcels['area_m2'] = parcels.geometry.area
    parcels['density'] = parcels['population'] / parcels['area_m2']

    results.append(parcels)

# Combine results
final = gpd.GeoDataFrame(pd.concat(results, ignore_index=True))

Dask-GeoPandas partitions data across multiple cores and can spill to disk when memory fills. Supports most GeoPandas operations with parallel execution.

import dask_geopandas as dgpd

# Read with Dask (lazy evaluation)
ddf = dgpd.read_parquet(
    "parcels.parquet",
    npartitions=32  # Split into 32 chunks
)

# Operations are lazy until compute()
ddf['area_m2'] = ddf.geometry.area
ddf['value_per_m2'] = ddf['assessed_value'] / ddf['area_m2']

# Trigger computation with parallel execution
result = ddf.compute()  # Uses all CPU cores

# Or save directly without loading full result
ddf.to_parquet("processed_parcels.parquet")
Result50M parcel dataset that crashes GeoPandas on 32GB RAM -> Dask-GeoPandas processes on the same hardware in 18 minutes using 8GB peak memory.

For pure analytical queries (no complex geometry operations), DuckDB Spatial provides SQL interface with excellent performance on large files.

import duckdb

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# Query 50M parcels without loading into memory
result = con.execute("""
    SELECT
        county,
        COUNT(*) as parcel_count,
        AVG(ST_Area(geometry)) as avg_area_m2,
        SUM(assessed_value) as total_value
    FROM read_parquet('parcels.parquet')
    WHERE land_use = 'RESIDENTIAL'
    GROUP BY county
""").df()

print(result)
Dataset SizeOperation TypeRecommended Approach
< 1M featuresAnyStandard GeoPandas
1-10M featuresGeometry operationsDask-GeoPandas
1-10M featuresAnalytical queriesDuckDB Spatial
10-50M featuresComplex spatialDask-GeoPandas + chunking
> 50M featuresAnyPostGIS or BigQuery GIS
Memory management strategies for large geospatial datasets

How Do You Translate ArcPy Code to GeoPandas?

Replace ArcPy cursors with GeoPandas DataFrames, geoprocessing tools with vectorised methods, and file geodatabases with GeoParquet or GeoPackage. Most ArcPy operations have direct GeoPandas equivalents: Buffer_analysis becomes .buffer(), SpatialJoin_analysis becomes gpd.sjoin(), Dissolve_management becomes .dissolve(). Here's the translation table for the 20 most common patterns.

PATTERN: READ DATA

cursor = arcpy.da.SearchCursor("parcels.shp", ["SHAPE@", "VALUE", "ZONE"])
for row in cursor:
    geometry = row[0]
    value = row[1]
    zone = row[2]

Cursor-based iteration

PATTERN: READ DATA

gdf = gpd.read_file("parcels.shp")
# Vectorised access (no loop needed)
areas = gdf.geometry.area
high_value = gdf[gdf['VALUE'] > 100000]

Vectorised operations

PATTERN: BUFFER

arcpy.Buffer_analysis("roads.shp", "roads_buffered.shp", "50 METERS")

PATTERN: BUFFER

roads = gpd.read_file("roads.shp")
roads_buffered = roads.copy()
roads_buffered['geometry'] = roads.geometry.buffer(50)
roads_buffered.to_file("roads_buffered.shp")

PATTERN: SPATIAL JOIN

arcpy.SpatialJoin_analysis(
    "parcels.shp",
    "flood_zones.shp",
    "parcels_flood_risk.shp",
    "JOIN_ONE_TO_ONE",
    "KEEP_ALL",
    match_option="INTERSECT"
)

PATTERN: SPATIAL JOIN

parcels = gpd.read_file("parcels.shp")
flood_zones = gpd.read_file("flood_zones.shp")

result = gpd.sjoin(
    parcels,
    flood_zones,
    how="left",           # KEEP_ALL
    predicate="intersects"  # INTERSECT
)

result.to_file("parcels_flood_risk.shp")

PATTERN: DISSOLVE

arcpy.Dissolve_management(
    "parcels.shp",
    "parcels_by_zone.shp",
    "ZONE",
    [["VALUE", "SUM"], ["AREA", "SUM"]]
)

PATTERN: DISSOLVE

parcels = gpd.read_file("parcels.shp")

dissolved = parcels.dissolve(
    by='ZONE',
    aggfunc={'VALUE': 'sum', 'AREA': 'sum'}
)

dissolved.to_file("parcels_by_zone.shp")
OperationArcPyGeoPandas
Clip to boundaryClip_analysisgpd.clip(gdf, mask)
ReprojectProject_managementgdf.to_crs(epsg=4326)
Select by attributeSelect_analysisgdf[gdf['field'] > 10]
Calculate areaCalculateField + SHAPE@AREAgdf.geometry.area
CentroidsFeatureToPointgdf.geometry.centroid
IntersectionIntersect_analysisgpd.overlay(gdf1, gdf2, 'intersection')
UnionUnion_analysisgpd.overlay(gdf1, gdf2, 'union')
Merge datasetsMerge_managementpd.concat([gdf1, gdf2])
Code translation patterns from ArcPy to GeoPandas

What Replaces ArcPy Spatial Analyst for Raster Processing?

Rasterio handles raster I/O, while NumPy and xarray perform raster algebra. This combination replaces arcpy.sa (Spatial Analyst). It's more explicit than Spatial Analyst's Map Algebra, but offers better performance, native cloud integration (COG, S3), and works without ArcGIS licensing. For raster operations, Rasterio is the GeoPandas equivalent. It's lower-level than Spatial Analyst (more explicit NumPy operations), but offers better performance and cloud integration.

PATTERN: READ RASTER, APPLY CALCULATION

from arcpy.sa import *

dem = Raster("dem.tif")
slope = Slope(dem, "DEGREE")
slope.save("slope.tif")

RASTERIO + NUMPY

import rasterio
import numpy as np
from rasterio.transform import Affine

with rasterio.open("dem.tif") as src:
    dem = src.read(1)
    transform = src.transform
    # Calculate slope using gradient
    dy, dx = np.gradient(dem, transform[0])
    slope = np.degrees(np.arctan(np.sqrt(dx**2 + dy**2)))
    # Write output
    profile = src.profile
    with rasterio.open("slope.tif", "w", **profile) as dst:
        dst.write(slope, 1)

PATTERN: EXTRACT RASTER VALUES TO POINTS

from arcpy.sa import ExtractValuesToPoints

ExtractValuesToPoints(
    "points.shp",
    "elevation.tif",
    "points_with_elev.shp"
)

RASTERIO + GEOPANDAS

import geopandas as gpd
import rasterio

points = gpd.read_file("points.shp")
with rasterio.open("elevation.tif") as src:
    coords = [(p.x, p.y) for p in points.geometry]
    points['elevation'] = [v[0] for v in src.sample(coords)]
points.to_file("points_with_elev.shp")

CLOUD-NATIVE RASTERS: COG

Rasterio reads Cloud-Optimized GeoTIFFs (COG) directly from S3/Azure without downloading the full file. ArcPy requires local file access or slow streaming.

with rasterio.open("s3://bucket/elevation.tif") as src:
    # Read only a 1km² window (fast range request)
    window = src.window(xmin, ymin, xmax, ymax)
    data = src.read(1, window=window)

Performance: Reading 1km squared from 50GB raster: ArcPy 14 minutes (download full file) to Rasterio 2.3 seconds (range request).

How Long Does It Take to Migrate from ArcPy to GeoPandas?

A typical migration takes 3-6 months for a full codebase, starting with a 2-week pilot. The timeline depends on script complexity, Esri-specific dependencies, and team experience with Python. Simple scripts (data loading, filtering, exports) migrate in hours. Complex workflows with topology or network analysis may need hybrid architectures. We've migrated 200+ ArcPy scripts. Here's the systematic approach that works, with specific actions at each phase. If your team needs to build Python skills first, see our guide to training GIS teams for workflow automation.

  • Inventory all ArcPy scripts: list file paths, what they do, how often they run
  • Measure current performance: run time, memory usage, failure rate
  • Identify dependencies: which scripts use Network Analyst, Topology, or other Esri-specific tools?
  • Calculate licensing costs: ArcGIS Pro licences, Spatial Analyst, Network Analyst extensions
  • Prioritise by ROI: migrate slow, frequently-run scripts without Esri dependencies first
  • Select pilot: moderate complexity, no Esri-specific dependencies, measurable performance
  • Translate using patterns above: test each operation with production data
  • Benchmark rigorously: run both versions 5 times, measure mean/std deviation
  • Validate outputs: geometry checks (ST_Equals), attribute comparison, visual inspection
  • Document translation: which ArcPy functions map to which GeoPandas patterns
  • Run both ArcPy and GeoPandas versions in production simultaneously
  • Alert on output divergence: automated geometry and attribute comparison
  • Monitor memory usage: identify scripts that need Dask or chunking
  • Train team: pair programming sessions, code review, documentation
  • Build internal library: reusable functions for common operations
  • Migrate remaining scripts systematically (priority order from audit)
  • Implement hybrid architecture for Esri-dependent workflows
  • Create monitoring dashboard: script run times, success rates, memory usage
  • Decommission ArcPy versions after 30-day confidence period
  • Reduce licensing: cancel unused Esri licences, document savings

Complete Workflow Translation: Real Example

Here's a production workflow we migrated for a global reinsurer: identify parcels in flood zones, calculate risk scores, export for underwriting review.

ORIGINAL ARCPY VERSION (47 MINUTES)

import arcpy
import os

# Setup
arcpy.env.workspace = "C:/data/flood_risk.gdb"
arcpy.env.overwriteOutput = True

# Read parcels and flood zones
parcels = "parcels"
flood_zones = "FEMA_flood_zones"

# Buffer flood zones by 50m for transition zone
print("Buffering flood zones...")
arcpy.Buffer_analysis(flood_zones, "flood_buffered", "50 METERS")

# Spatial join to find at-risk parcels
print("Identifying at-risk parcels...")
arcpy.SpatialJoin_analysis(
    parcels,
    "flood_buffered",
    "parcels_at_risk",
    "JOIN_ONE_TO_ONE",
    "KEEP_ALL",
    match_option="INTERSECT"
)

# Calculate risk score
print("Calculating risk scores...")
arcpy.AddField_management("parcels_at_risk", "RISK_SCORE", "DOUBLE")
arcpy.AddField_management("parcels_at_risk", "RISK_CATEGORY", "TEXT")

cursor = arcpy.da.UpdateCursor(
    "parcels_at_risk",
    ["SHAPE@AREA", "ASSESSED_VALUE", "FLOOD_ZONE", "RISK_SCORE", "RISK_CATEGORY"]
)

for row in cursor:
    area = row[0]
    value = row[1]
    zone = row[2]

    # Risk calculation
    if zone == "A":  # High risk
        risk = (value / area) * 1.5
        category = "HIGH"
    elif zone == "X":  # Moderate
        risk = (value / area) * 0.8
        category = "MODERATE"
    else:
        risk = 0
        category = "LOW"

    row[3] = risk
    row[4] = category
    cursor.updateRow(row)

del cursor

# Export to Excel for underwriting
print("Exporting results...")
arcpy.conversion.TableToExcel("parcels_at_risk", "C:/output/flood_risk_report.xlsx")

print("Complete!")

Runtime: 47 minutes | Memory: 2.1GB peak | 10,000 parcels, 250 flood zones

MIGRATED GEOPANDAS VERSION (38 SECONDS)

import geopandas as gpd
import pandas as pd

# Read data (GeoParquet is 10x faster than FGDB)
parcels = gpd.read_parquet("s3://bucket/parcels.parquet")
flood_zones = gpd.read_parquet("s3://bucket/flood_zones.parquet")

# Buffer flood zones (vectorised operation)
flood_buffered = flood_zones.copy()
flood_buffered['geometry'] = flood_zones.geometry.buffer(50)

# Spatial join (uses spatial index automatically)
at_risk = gpd.sjoin(
    parcels,
    flood_buffered[['geometry', 'FLOOD_ZONE']],
    how='inner',
    predicate='intersects'
)

# Calculate risk scores (fully vectorised - no cursor)
def calculate_risk(row):
    value_density = row['ASSESSED_VALUE'] / row.geometry.area

    if row['FLOOD_ZONE'] == 'A':
        return value_density * 1.5, 'HIGH'
    elif row['FLOOD_ZONE'] == 'X':
        return value_density * 0.8, 'MODERATE'
    else:
        return 0, 'LOW'

# Vectorised apply
at_risk[['RISK_SCORE', 'RISK_CATEGORY']] = at_risk.apply(
    calculate_risk,
    axis=1,
    result_type='expand'
)

# Export to Excel (Pandas integration)
at_risk.drop(columns='geometry').to_excel(
    "s3://bucket/output/flood_risk_report.xlsx",
    index=False,
    engine='openpyxl'
)

print("Complete!")

Runtime: 38 seconds | Memory: 1.8GB peak | Same dataset, 74x faster

PERFORMANCE BREAKDOWN: WHERE TIME GOES

ArcPy (2,847 seconds)

Read FGDB423s (15%)
Buffer operation847s (30%)
Spatial join912s (32%)
Cursor iteration587s (21%)
Export to Excel78s (3%)

GeoPandas (38 seconds)

Read Parquet3.2s (8%)
Buffer operation8.7s (23%)
Spatial join11.3s (30%)
Vectorised calculation6.7s (18%)
Export to Excel8.1s (21%)

Key insight: GeoParquet read is 132x faster than FGDB. Spatial join with automatic spatial indexing is 81x faster. Eliminating cursor iteration saves 587 seconds entirely.

ArcPy to GeoPandas migration isn't “better” or “worse”—it's a trade-off that makes sense for specific workflows.

If you run automated workflows on large datasets, don't need Esri-specific tools like topology rules or Network Analyst, and want to eliminate licensing costs or integrate with modern data platforms—GeoPandas delivers 10-300x performance improvements.

If you rely heavily on topology validation, network routing, or editing workflows in ArcGIS Pro—a hybrid architecture gives you GeoPandas performance for data processing while retaining ArcPy for specialised operations.

The decision is practical, not ideological. Quantify your current pain (hours wasted, licensing costs), estimate the improvement from this guide's benchmarks, and evaluate whether the migration investment makes sense.

Frequently Asked Questions

Is GeoPandas faster than ArcPy?

Yes, GeoPandas is significantly faster for most operations. Our benchmarks show 75x faster spatial joins, 75x faster buffer/dissolve operations, and 287x faster attribute calculations compared to ArcPy. The performance gains come from vectorised operations and automatic spatial indexing.

Can GeoPandas read ESRI geodatabases?

Yes, GeoPandas can read File Geodatabases (.gdb) using the OpenFileGDB driver through Fiona/GDAL. Use gpd.read_file('path/to/data.gdb', layer='layer_name') to read specific layers. For better performance, consider converting to GeoParquet format.

What are the limitations of GeoPandas compared to ArcPy?

GeoPandas lacks ESRI-specific features like topology rules and Network Analyst routing. It also loads data into memory, which can cause issues with very large datasets (50M+ features). For these cases, use a hybrid architecture that combines GeoPandas for data processing with ArcPy for specialised operations.

How do I handle large datasets in GeoPandas?

For datasets over 1M features, use Dask-GeoPandas for parallel processing and out-of-core computation. For analytical queries, DuckDB Spatial provides excellent performance without loading data into memory. For 50M+ features, consider PostGIS or BigQuery GIS.

How long does it take to migrate from ArcPy to GeoPandas?

A typical migration follows a 4-phase approach: Audit (1-2 weeks), Pilot (1-2 weeks), Parallel Production (4-6 weeks), and Full Rollout (3-6 months). Simple scripts can be migrated in hours, while complex workflows with many dependencies take longer. The key is starting with high-ROI scripts that don't rely on ESRI-specific tools.

Get Workflow Automation Insights

Monthly tips on automating GIS workflows, open-source tools, and lessons from enterprise deployments. No spam.

NEXT STEP

Need Help Migrating Your ArcPy Scripts?

Our free workflow assessment analyses your current scripts, identifies migration candidates, and provides expected performance improvements and ROI estimates.