Technical Deep-Dive

ArcPy to Rasterio: Complete Migration Guide for Raster Processing

How to migrate ArcPy Spatial Analyst scripts to Rasterio—including NumPy raster algebra, Cloud-Optimized GeoTIFF output, and memory management for massive rasters.

PUBLISHED: JAN 2025
READ TIME: 18 MIN
CATEGORY: TECHNICAL
AUTHOR: AXIS SPATIAL
[Image: Python code editor showing raster processing with Rasterio and NumPy]

Your team processes 500GB of elevation data weekly. ArcPy Spatial Analyst scripts calculate slope, aspect, and hillshade for terrain visualisation. The processing takes 6 hours per run, and the Spatial Analyst license costs GBP 2,500 annually per seat.

Then someone discovers Rasterio. “It reads GeoTIFFs directly to NumPy arrays,” they say. “No license fees.” You search Stack Overflow and find posts about 10x speed improvements and cloud-native workflows. But you also see complaints about missing functions, coordinate system confusion, and cryptic GDAL errors.

This guide cuts through the noise. We've migrated raster processing workflows for forestry, mining, and environmental monitoring clients. Here's what actually works, the translation patterns that matter, and how to handle rasters that exceed available memory. If you've already migrated vector workflows, see our ArcPy to GeoPandas guide.

Reading Rasters: ArcPy vs Rasterio

Rasterio reads rasters directly into NumPy arrays, enabling vectorised processing that's 5-20x faster than arcpy.sa Map Algebra. The key difference: ArcPy wraps rasters in proprietary objects, while Rasterio exposes raw NumPy arrays for standard scientific Python operations.

PATTERN: READ RASTER

import arcpy
from arcpy.sa import *

# Set up environment
arcpy.env.workspace = "C:/data"
arcpy.CheckOutExtension("Spatial")

# Read raster - returns Raster object
dem = Raster("elevation.tif")

# Access properties
print(f"Cell size: {dem.meanCellWidth}")
print(f"Extent: {dem.extent}")
print(f"Spatial ref: {dem.spatialReference.name}")

Returns proprietary Raster object

PATTERN: READ RASTER

import rasterio
import numpy as np

# Open raster - context manager handles cleanup
with rasterio.open("elevation.tif") as src:
    # Read as NumPy array (band 1)
    dem = src.read(1)

    # Access properties via profile
    print(f"Cell size: {src.res}")
    print(f"Bounds: {src.bounds}")
    print(f"CRS: {src.crs}")

    # dem is now a standard NumPy array
    print(f"Shape: {dem.shape}")
    print(f"Dtype: {dem.dtype}")

Returns NumPy array + metadata

MULTI-BAND RASTERS

Rasterio handles multi-band imagery (like Landsat, Sentinel) more intuitively than ArcPy. Each band becomes a slice in a 3D NumPy array.

# Read all bands at once
with rasterio.open("sentinel2.tif") as src:
    bands = src.read()  # Shape: (bands, height, width)
    red = bands[3].astype(float)  # Band 4 (arrays are 0-indexed)
    nir = bands[7].astype(float)  # Band 8

    # Calculate NDVI; the small epsilon avoids division by zero
    ndvi = (nir - red) / (nir + red + 1e-10)

Raster Algebra: arcpy.sa vs NumPy Operations

NumPy replaces arcpy.sa Map Algebra with standard Python operators. Instead of Con(), Slope(), and Reclassify(), you use NumPy functions that process entire arrays at once. This vectorised approach is typically 5-20x faster than ArcPy's Map Algebra.

PATTERN: CALCULATE SLOPE

from arcpy.sa import Slope

# Calculate slope in degrees
dem = Raster("elevation.tif")
slope = Slope(dem, "DEGREE")
slope.save("slope.tif")

NUMPY + RASTERIO

import rasterio
import numpy as np

with rasterio.open("elevation.tif") as src:
    dem = src.read(1).astype(float)
    cellsize = src.res[0]  # Assumes square cells

    # Calculate gradient using NumPy
    dy, dx = np.gradient(dem, cellsize)
    slope_rad = np.arctan(np.sqrt(dx**2 + dy**2))
    slope_deg = np.degrees(slope_rad)

    # Write output
    profile = src.profile.copy()
    profile.update(dtype=rasterio.float32)
    with rasterio.open("slope.tif", "w", **profile) as dst:
        dst.write(slope_deg.astype(np.float32), 1)

PATTERN: CONDITIONAL RECLASSIFY

from arcpy.sa import Con, Reclassify, RemapRange

# Conditional: Set values < 0 to NoData
dem = Raster("elevation.tif")
dem_positive = Con(dem >= 0, dem)

# Reclassify elevation to categories
reclass = Reclassify(dem, "VALUE",
    RemapRange([[0, 100, 1],
                [100, 500, 2],
                [500, 1000, 3],
                [1000, 9999, 4]])
)
reclass.save("elevation_classes.tif")

NUMPY + RASTERIO

import rasterio
import numpy as np

with rasterio.open("elevation.tif") as src:
    dem = src.read(1).astype(float)

    # Mask the source NoData value and negative elevations as NaN
    if src.nodata is not None:
        dem[dem == src.nodata] = np.nan
    dem[dem < 0] = np.nan

    # Reclassify using np.digitize or np.select
    conditions = [
        (dem >= 0) & (dem < 100),
        (dem >= 100) & (dem < 500),
        (dem >= 500) & (dem < 1000),
        dem >= 1000
    ]
    classes = np.select(conditions, [1, 2, 3, 4], default=0)

    # Write output
    profile = src.profile.copy()
    profile.update(dtype=rasterio.uint8, nodata=0)
    with rasterio.open("elevation_classes.tif", "w", **profile) as dst:
        dst.write(classes.astype(np.uint8), 1)
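The np.select version spells out each class range explicitly. When the class breaks are ordered, np.digitize achieves the same reclassification more compactly; a minimal standalone sketch:

```python
import numpy as np

dem = np.array([[50.0, 150.0], [750.0, 1200.0]])

# np.digitize returns the bin index for each pixel;
# bins are the class breaks, +1 maps indices to classes 1-4
classes = np.digitize(dem, bins=[100, 500, 1000]) + 1
print(classes)  # [[1 2]
                #  [3 4]]
```

Values below 100 land in class 1, 100-500 in class 2, and so on, matching the RemapRange table above.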

Libraries for Common Terrain Operations

While you can implement slope, aspect, and hillshade with raw NumPy, several libraries provide optimised implementations:

| Operation | arcpy.sa | Open Source Alternative |
| --- | --- | --- |
| Slope | Slope() | richdem.TerrainAttribute() or np.gradient() |
| Aspect | Aspect() | richdem or xrspatial.aspect() |
| Hillshade | Hillshade() | xrspatial.hillshade() or earthpy |
| Flow Direction | FlowDirection() | pysheds or whitebox |
| Watershed | Watershed() | pysheds.catchment() |
| Zonal Statistics | ZonalStatistics() | rasterstats.zonal_stats() |
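For arrays already in memory, zonal statistics is simple enough to sketch in pure NumPy before reaching for rasterstats. A minimal per-zone mean (the helper name and sample arrays are ours):

```python
import numpy as np

def zonal_mean(zones, values, nodata=None):
    """Mean of `values` per integer zone label in `zones`, pure NumPy."""
    zones = zones.ravel()
    values = values.ravel().astype(float)
    mask = np.ones_like(values, dtype=bool) if nodata is None else values != nodata
    # bincount with weights sums values per zone; plain bincount counts pixels
    sums = np.bincount(zones[mask], weights=values[mask])
    counts = np.bincount(zones[mask])
    with np.errstate(invalid="ignore"):  # unused zone ids divide 0/0 -> nan
        return sums / counts

zones = np.array([[1, 1], [2, 2]])
elev = np.array([[10.0, 20.0], [30.0, 50.0]])
means = zonal_mean(zones, elev)
print(means)  # [nan 15. 40.]
```

rasterstats adds the parts this sketch omits: rasterising vector zones, handling nodata masks, and computing multiple statistics per call.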

Writing to Cloud-Native Formats (COG)

Cloud-Optimized GeoTIFF (COG) is the key advantage of Rasterio over ArcPy. COG enables streaming large rasters from cloud storage (S3, Azure Blob, GCS) without downloading entire files. ArcPy has no native COG support—you must download files locally first.

WHAT MAKES COG SPECIAL

  • HTTP Range Requests: Read only the pixels you need, not the entire file
  • Internal Tiling: Optimised for spatial queries (256x256 or 512x512 tiles)
  • Overviews: Pre-computed pyramids for fast visualisation at any zoom
  • Cloud Storage: Works directly with S3, Azure, GCS—no local download

STANDARD OUTPUT

import rasterio

# Standard GeoTIFF write
with rasterio.open(
    "output.tif", "w",
    driver="GTiff",
    height=data.shape[0],
    width=data.shape[1],
    count=1,
    dtype=data.dtype,
    crs=src.crs,
    transform=src.transform
) as dst:
    dst.write(data, 1)

# Problem: No internal tiling
# Problem: No overviews
# Problem: Must download entire file to read

CLOUD-OPTIMIZED OUTPUT

import rasterio
from rasterio.enums import Resampling

# Cloud-Optimized GeoTIFF with COG driver
with rasterio.open(
    "output_cog.tif", "w",
    driver="COG",
    height=data.shape[0],
    width=data.shape[1],
    count=1,
    dtype=data.dtype,
    crs=src.crs,
    transform=src.transform,
    compress="deflate",        # Compression
    blocksize=512,             # Internal tile size
    overview_resampling="average"
) as dst:
    dst.write(data, 1)

# Or use rio-cogeo CLI:
# rio cogeo create input.tif output_cog.tif

Reading COGs from Cloud Storage

The real power: read only the portion of a massive raster you need, directly from S3 or Azure Blob Storage.

import rasterio
from rasterio.windows import from_bounds

# Read small window from 50GB COG on S3 (no full download!)
cog_url = "s3://bucket/large-raster.tif"

with rasterio.open(cog_url) as src:
    # Define area of interest
    window = from_bounds(
        left=-122.5, bottom=37.5,
        right=-122.0, top=38.0,
        transform=src.transform
    )

    # Read only that window - range request!
    subset = src.read(1, window=window)
    print(f"Read {subset.nbytes / 1e6:.1f} MB instead of full file")

Performance: Reading 10km area from 50GB global DEM: ArcPy downloads entire file (14 minutes). Rasterio COG range request: 2.3 seconds.

Memory Management for Large Rasters

Use windowed reading, Dask arrays, or xarray for rasters that exceed available RAM. Unlike ArcPy, which streams data through file-based operations, Rasterio loads data into memory by default. For 10GB+ rasters, you need different strategies.

THE MEMORY WALL

Example: Global 30m DEM (SRTM), 432,000 x 180,000 pixels, 150GB uncompressed. Calling src.read() tries to load everything—crashes with MemoryError.

ArcPy handles this transparently (using disk-based processing), but loses parallelisation benefits. Rasterio gives you control—you choose the trade-off.
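A back-of-envelope size check before calling src.read() can prevent the crash entirely. A tiny helper (the function name and the 2-byte int16 assumption are ours):

```python
# Estimate the in-memory footprint of a raster before reading it.
def raster_memory_gb(width, height, bands=1, bytes_per_pixel=4):
    """Uncompressed in-memory size in gigabytes (float32 = 4 bytes/pixel)."""
    return width * height * bands * bytes_per_pixel / 1e9

# The global 30m DEM example above, stored as int16 elevation:
print(raster_memory_gb(432_000, 180_000, bytes_per_pixel=2))  # 155.52
```

Roughly 155GB as int16, in line with the ~150GB figure above; cast to float for analysis and it doubles. Compare the result against available RAM to decide between full reads and the windowed strategies below.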

Memory Management Strategies

Process rasters in spatial chunks (windows). Best for operations that don't need cross-tile context (reclassify, NDVI, simple algebra).

import rasterio
from rasterio.windows import Window
import numpy as np

with rasterio.open("huge_raster.tif") as src:
    # Process block by block using the file's internal tiling
    profile = src.profile.copy()
    profile.update(dtype=rasterio.float32, compress='deflate')

    with rasterio.open("output.tif", "w", **profile) as dst:
        for ji, window in src.block_windows(1):
            # Read chunk
            data = src.read(1, window=window).astype(np.float32)

            # Process chunk (example: per-chunk normalise; a true
            # global normalisation needs a first pass for min/max)
            processed = (data - data.min()) / (data.max() - data.min())

            # Write chunk
            dst.write(processed, 1, window=window)

Dask arrays enable lazy evaluation and parallel processing across chunks. Operations are only computed when you explicitly trigger them.

import rioxarray
import xrspatial

# Open with rioxarray - creates a lazy Dask-backed array
dem = rioxarray.open_rasterio(
    "huge_raster.tif", chunks={"x": 2048, "y": 2048}
).squeeze()

# Operations are lazy - no computation yet
# (xrspatial.slope is Dask-aware; np.gradient would pull the
#  whole array into memory, so avoid it on chunked data)
slope = xrspatial.slope(dem)

# Trigger computation with parallel execution
result = slope.compute()  # Uses all CPU cores

# Or write directly (streams to disk)
slope.rio.to_raster("slope_output.tif", driver="COG")

Result: 150GB raster processed in 8 minutes on an 8-core machine with 4GB peak memory (vs. 2+ hours with ArcPy).

For time-series or multi-dimensional raster stacks (satellite imagery), xarray provides powerful abstractions for labeled arrays with Dask parallelisation.

import xarray as xr
import rioxarray

# Open time series of rasters with Dask chunking,
# stacking the files along a new "time" dimension
files = ["landsat_2020_01.tif", "landsat_2020_02.tif", ...]
ds = xr.open_mfdataset(
    files,
    engine="rasterio",
    combine="nested",
    concat_dim="time",
    chunks={"x": 2048, "y": 2048}
)

# Calculate temporal statistics (lazy)
mean_ndvi = ds['ndvi'].mean(dim='time')
max_ndvi = ds['ndvi'].max(dim='time')

# Write results
mean_ndvi.rio.to_raster("mean_ndvi.tif")

| Raster Size | Operation Type | Recommended Approach |
| --- | --- | --- |
| < 2GB | Any | Standard Rasterio (load fully) |
| 2-20GB | Per-pixel (no context) | Windowed reading |
| 2-20GB | Neighbourhood ops | Dask with overlap |
| 20-100GB | Complex analysis | rioxarray + Dask cluster |
| > 100GB | Any | Google Earth Engine or STAC + COG |
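The table condenses to a small helper you can drop into pipeline code; the thresholds mirror the table and the function name is ours:

```python
def pick_strategy(size_gb, needs_neighbourhood=False):
    """Rough memory-strategy lookup mirroring the table above."""
    if size_gb < 2:
        return "Standard Rasterio (load fully)"
    if size_gb <= 20:
        # Neighbourhood ops (focal stats, slope) need overlapping chunks
        return "Dask with overlap" if needs_neighbourhood else "Windowed reading"
    if size_gb <= 100:
        return "rioxarray + Dask cluster"
    return "Google Earth Engine or STAC + COG"

print(pick_strategy(5))        # Windowed reading
print(pick_strategy(5, True))  # Dask with overlap
print(pick_strategy(150))      # Google Earth Engine or STAC + COG
```
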

Writing Rasters: Best Practices

Always copy the source profile, update only what changes, and prefer COG format for outputs. The most common bugs come from mismatched metadata—wrong CRS, transform, or nodata value.

COMMON MISTAKES

# DON'T: Hardcode metadata
from rasterio.transform import Affine

with rasterio.open(
    "output.tif", "w",
    driver="GTiff",
    height=1000,
    width=1000,
    count=1,
    dtype="float32",
    crs="EPSG:4326",  # Might not match input!
    transform=Affine(0.1, 0, 0, 0, -0.1, 0)  # Wrong!
) as dst:
    dst.write(data, 1)

# Problems:
# - CRS might not match input
# - Transform is completely wrong
# - No compression (huge files)
# - nodata not set

CORRECT APPROACH

# DO: Copy and update profile
with rasterio.open("input.tif") as src:
    data = src.read(1)
    profile = src.profile.copy()

    # Process data
    result = some_processing(data)

    # Update only what changed
    profile.update(
        driver="COG",           # Cloud-optimized
        dtype=result.dtype,     # Match output type
        compress="deflate",     # Compression
        nodata=-9999           # Explicit nodata
    )

    with rasterio.open("output.tif", "w", **profile) as dst:
        dst.write(result, 1)

GDAL CONFIGURATION FOR CLOUD WRITES

Writing large COGs to cloud storage requires GDAL configuration. Set these environment variables before opening files:

import os
import rasterio

# Configure GDAL for S3
os.environ['AWS_REQUEST_PAYER'] = 'requester'  # For requester-pays buckets
os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'YES'  # Faster opens
os.environ['CPL_VSIL_CURL_ALLOWED_EXTENSIONS'] = '.tif,.tiff,.vrt'

# Now write directly to S3
with rasterio.open("s3://my-bucket/output.tif", "w", **profile) as dst:
    dst.write(data, 1)

Migration Timeline: What to Expect

A typical raster workflow migration takes 2-4 months, starting with a 1-week proof of concept. The timeline depends on workflow complexity, team familiarity with NumPy, and cloud integration requirements.

  • Inventory all arcpy.sa scripts: list functions used, input/output formats
  • Identify priority workflows: high runtime, frequent execution
  • Build proof of concept: migrate one script end-to-end
  • Benchmark: compare runtime, output accuracy, memory usage
  • Convert input rasters to COG format (gdal_translate or rio-cogeo)
  • Set up cloud storage structure (S3 buckets, folder conventions)
  • Create validation scripts: compare outputs pixel-by-pixel
  • Document CRS and nodata conventions for team
  • Migrate scripts in priority order (highest ROI first)
  • Implement memory management for large rasters
  • Build reusable library: common operations, error handling
  • Run parallel validation: ArcPy and Rasterio produce identical outputs
  • Deploy migrated workflows to production
  • Monitor performance: runtime, memory, cloud costs
  • Train team on Rasterio patterns and debugging
  • Decommission ArcPy workflows after 30-day confidence period
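The "compare outputs pixel-by-pixel" step above can be sketched in pure NumPy; read both rasters with src.read(1) and pass in the arrays. The function name and tolerance are ours:

```python
import numpy as np

def rasters_match(a, b, nodata=-9999, atol=1e-6):
    """Pixel-by-pixel comparison; nodata pixels must coincide exactly."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    nod_a, nod_b = a == nodata, b == nodata
    # A nodata pixel in one output but not the other is a mismatch
    if not np.array_equal(nod_a, nod_b):
        return False
    valid = ~nod_a
    return bool(np.allclose(a[valid], b[valid], atol=atol))

a = np.array([[1.0, -9999.0], [2.0, 3.0]])
b = a.copy()
print(rasters_match(a, b))  # True
b[1, 1] = 3.5
print(rasters_match(a, b))  # False
```

Expect small floating-point differences between ArcPy and NumPy implementations; pick a tolerance your application can justify rather than demanding bit-identical outputs.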

Complete Workflow Translation: Real Example

Here's a production workflow we migrated for an environmental monitoring client: calculate terrain derivatives from a 20GB DEM and identify areas exceeding slope thresholds.

ORIGINAL ARCPY VERSION (4.2 HOURS)

import arcpy
from arcpy.sa import *

arcpy.env.workspace = "C:/data"
arcpy.CheckOutExtension("Spatial")

# Read 20GB DEM
print("Reading DEM...")
dem = Raster("large_dem.tif")

# Calculate slope
print("Calculating slope...")
slope = Slope(dem, "DEGREE")

# Calculate aspect
print("Calculating aspect...")
aspect = Aspect(dem)

# Identify steep slopes (> 30 degrees)
print("Finding steep areas...")
steep_mask = Con(slope > 30, 1, 0)

# Calculate hillshade for visualisation
print("Creating hillshade...")
hillshade = Hillshade(dem, 315, 45)

# Save outputs
print("Saving results...")
slope.save("slope.tif")
aspect.save("aspect.tif")
steep_mask.save("steep_areas.tif")
hillshade.save("hillshade.tif")

print("Complete!")

Runtime: 4.2 hours | Peak memory: 8GB | 20GB input DEM

MIGRATED RASTERIO VERSION (12 MINUTES)

import rioxarray as rxr
import xarray as xr
import xrspatial
import numpy as np

# Open with Dask chunking (lazy loading)
print("Opening DEM with lazy loading...")
dem = rxr.open_rasterio(
    "s3://bucket/large_dem.tif",
    chunks={"x": 2048, "y": 2048}
).squeeze()

# Calculate terrain derivatives (all lazy)
print("Setting up terrain calculations...")
slope = xrspatial.slope(dem)
aspect = xrspatial.aspect(dem)
hillshade = xrspatial.hillshade(dem, azimuth=315, angle_altitude=45)

# Identify steep areas
steep_mask = xr.where(slope > 30, 1, 0).astype(np.uint8)

# Write all outputs (parallel computation + streaming)
print("Computing and writing results...")
slope.rio.to_raster("s3://bucket/outputs/slope.tif", driver="COG")
aspect.rio.to_raster("s3://bucket/outputs/aspect.tif", driver="COG")
steep_mask.rio.to_raster("s3://bucket/outputs/steep_areas.tif", driver="COG")
hillshade.rio.to_raster("s3://bucket/outputs/hillshade.tif", driver="COG")

print("Complete!")

Runtime: 12 minutes | Peak memory: 3GB | Same 20GB input, 21x faster

PERFORMANCE BREAKDOWN: WHERE TIME GOES

ArcPy (4.2 hours)

  • Read from disk: 45 min (18%)
  • Slope calculation: 68 min (27%)
  • Aspect calculation: 52 min (21%)
  • Hillshade: 41 min (16%)
  • Write outputs: 46 min (18%)

Rasterio + xrspatial (12 min)

  • Lazy open (no read): 0.3 sec (0%)
  • Parallel computation: 8 min (67%)
  • Stream to COG: 4 min (33%)

Key insight: Lazy loading eliminates upfront read time. Parallel Dask execution uses all CPU cores. COG output streams directly to S3 without temp files.

Rasterio migration makes sense when you need cloud integration, parallel processing, or want to eliminate Spatial Analyst licensing costs.

The 5-20x performance improvement comes from vectorised NumPy operations and parallel Dask execution. The workflow improvement comes from COG format enabling cloud-native architectures.
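The vectorisation claim is easy to verify on synthetic data. A minimal benchmark comparing a per-pixel Python loop (what scalar Map Algebra effectively does) against one vectorised NumPy expression; the array size is illustrative:

```python
import time
import numpy as np

dem = np.random.default_rng(0).random((500, 500)) * 3000  # 250k fake elevations

# Per-pixel Python loop
t0 = time.perf_counter()
out_loop = np.empty(dem.shape, dtype=np.uint8)
for i in range(dem.shape[0]):
    for j in range(dem.shape[1]):
        v = dem[i, j]
        out_loop[i, j] = 1 if v < 100 else 2 if v < 500 else 3 if v < 1000 else 4
t_loop = time.perf_counter() - t0

# Vectorised NumPy: one expression over the whole array
t0 = time.perf_counter()
out_vec = (np.digitize(dem, [100, 500, 1000]) + 1).astype(np.uint8)
t_vec = time.perf_counter() - t0

assert np.array_equal(out_loop, out_vec)
print(f"loop: {t_loop:.3f}s  vectorised: {t_vec:.4f}s")
```

The loop is typically orders of magnitude slower; exact ratios depend on hardware and operation, which is why real workloads land in the 5-20x range rather than at a single number.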

However, if your workflows depend on ArcPy-specific functions like sophisticated hydrological modelling or you're locked into Esri infrastructure, a hybrid approach may work better—use Rasterio for data preparation and output, ArcPy for specialised analysis.

Frequently Asked Questions

Is Rasterio faster than ArcPy Spatial Analyst?

Yes, for most operations. Rasterio with NumPy provides 5-20x speedups for raster algebra due to vectorised operations. Reading Cloud-Optimized GeoTIFFs can be 10-50x faster than ArcPy because of HTTP range requests—you only download the pixels you need.

Can Rasterio read ArcGIS raster formats?

Yes. Rasterio reads any format your GDAL build supports, including GeoTIFF, JPEG2000, and ESRI Grid; MrSID and File Geodatabase rasters also work when the relevant GDAL drivers are present. For best performance, convert to Cloud-Optimized GeoTIFF, which enables streaming from cloud storage without downloading entire files.

What replaces arcpy.sa Slope and Aspect functions?

Use np.gradient() for basic slope/aspect calculations, or libraries like xrspatial, richdem, or earthpy for pre-built terrain analysis functions. Performance is typically 5-10x faster than arcpy.sa equivalents due to vectorised computation.

How do I handle large rasters that don't fit in memory?

Use Rasterio's windowed reading to process in chunks, or xarray with Dask for automatic chunking and parallel execution. For very large datasets (100GB+), COG format enables reading only the spatial extent you need via HTTP range requests.

What is Cloud-Optimized GeoTIFF (COG) and why should I use it?

COG is a GeoTIFF variant optimised for cloud storage and HTTP range requests. It enables reading only portions of large rasters without downloading entire files. Rasterio natively supports COG for reading from and writing to cloud storage like S3 and Azure Blob—ArcPy has no equivalent capability.


NEXT STEP

Need Help Migrating Your Raster Workflows?

Our free workflow assessment analyses your current ArcPy scripts, identifies migration candidates, and provides expected performance improvements and ROI estimates.