Your team processes 500GB of elevation data weekly. ArcPy Spatial Analyst scripts calculate slope, aspect, and hillshade for terrain visualisation. The processing takes 6 hours per run, and the Spatial Analyst license costs GBP 2,500 annually per seat.
Then someone discovers Rasterio. “It reads GeoTIFFs directly to NumPy arrays,” they say. “No license fees.” You search Stack Overflow and find posts about 10x speed improvements and cloud-native workflows. But you also see complaints about missing functions, coordinate system confusion, and cryptic GDAL errors.
This guide cuts through the noise. We've migrated raster processing workflows for forestry, mining, and environmental monitoring clients. Here's what actually works, the translation patterns that matter, and how to handle rasters that exceed available memory. If you've already migrated vector workflows, see our ArcPy to GeoPandas guide.
Reading Rasters: ArcPy vs Rasterio
Rasterio reads rasters directly into NumPy arrays, enabling vectorised processing that's 5-20x faster than arcpy.sa Map Algebra. The key difference: ArcPy wraps rasters in proprietary objects, while Rasterio exposes raw NumPy arrays for standard scientific Python operations.
PATTERN: READ RASTER

ARCPY
import arcpy
from arcpy.sa import *
# Set up environment
arcpy.env.workspace = "C:/data"
arcpy.CheckOutExtension("Spatial")
# Read raster - returns Raster object
dem = Raster("elevation.tif")
# Access properties
print(f"Cell size: {dem.meanCellWidth}")
print(f"Extent: {dem.extent}")
print(f"Spatial ref: {dem.spatialReference.name}")Returns proprietary Raster object
RASTERIO
import rasterio
import numpy as np
# Open raster - context manager handles cleanup
with rasterio.open("elevation.tif") as src:
# Read as NumPy array (band 1)
dem = src.read(1)
# Access properties via profile
print(f"Cell size: {src.res}")
print(f"Bounds: {src.bounds}")
print(f"CRS: {src.crs}")
# dem is now a standard NumPy array
print(f"Shape: {dem.shape}")
print(f"Dtype: {dem.dtype}")Returns NumPy array + metadata
MULTI-BAND RASTERS
Rasterio handles multi-band imagery (like Landsat, Sentinel) more intuitively than ArcPy. Each band becomes a slice in a 3D NumPy array.
# Read all bands at once
with rasterio.open("sentinel2.tif") as src:
    bands = src.read()  # Shape: (bands, height, width)
red = bands[3]  # Sentinel-2 band 4 (red), 0-indexed
nir = bands[7]  # Sentinel-2 band 8 (NIR), 0-indexed
# Calculate NDVI using NumPy broadcasting
ndvi = (nir.astype(float) - red) / (nir + red + 1e-10)

Raster Algebra: arcpy.sa vs NumPy Operations
NumPy replaces arcpy.sa Map Algebra with standard Python operators. Instead of Con(), Slope(), and Reclassify(), you use NumPy functions that process entire arrays at once. This vectorised approach is typically 5-20x faster than ArcPy's Map Algebra.
PATTERN: CALCULATE SLOPE

ARCPY
from arcpy.sa import Slope
# Calculate slope in degrees
dem = Raster("elevation.tif")
slope = Slope(dem, "DEGREE")
slope.save("slope.tif")NUMPY + RASTERIO
import rasterio
import numpy as np
with rasterio.open("elevation.tif") as src:
dem = src.read(1).astype(float)
cellsize = src.res[0] # Assumes square cells
# Calculate gradient using NumPy
dy, dx = np.gradient(dem, cellsize)
slope_rad = np.arctan(np.sqrt(dx**2 + dy**2))
slope_deg = np.degrees(slope_rad)
# Write output
profile = src.profile.copy()
profile.update(dtype=rasterio.float32)
with rasterio.open("slope.tif", "w", **profile) as dst:
dst.write(slope_deg.astype(np.float32), 1)PATTERN: CONDITIONAL RECLASSIFY
from arcpy.sa import Con, Reclassify, RemapRange
# Conditional: Set values < 0 to NoData
dem = Raster("elevation.tif")
dem_positive = Con(dem >= 0, dem)
# Reclassify elevation to categories
reclass = Reclassify(dem, "VALUE",
RemapRange([[0, 100, 1],
[100, 500, 2],
[500, 1000, 3],
[1000, 9999, 4]])
)
reclass.save("elevation_classes.tif")NUMPY + RASTERIO
import rasterio
import numpy as np
with rasterio.open("elevation.tif") as src:
dem = src.read(1).astype(float)
nodata = src.nodata
# Conditional: Set negative values to NoData
dem[dem < 0] = np.nan
# Reclassify using np.digitize or np.select
conditions = [
(dem >= 0) & (dem < 100),
(dem >= 100) & (dem < 500),
(dem >= 500) & (dem < 1000),
dem >= 1000
]
classes = np.select(conditions, [1, 2, 3, 4], default=0)
# Write output
profile = src.profile.copy()
profile.update(dtype=rasterio.uint8, nodata=0)
with rasterio.open("elevation_classes.tif", "w", **profile) as dst:
dst.write(classes.astype(np.uint8), 1)Libraries for Common Terrain Operations
While you can implement slope, aspect, and hillshade with raw NumPy, several libraries provide optimised implementations (a zonal-statistics example follows the table):
| Operation | arcpy.sa | Open Source Alternative |
|---|---|---|
| Slope | Slope() | richdem.TerrainAttribute() or np.gradient() |
| Aspect | Aspect() | richdem or xrspatial.aspect() |
| Hillshade | Hillshade() | xrspatial.hillshade() or earthpy |
| Flow Direction | FlowDirection() | pysheds or whitebox |
| Watershed | Watershed() | pysheds.catchment() |
| Zonal Statistics | ZonalStatistics() | rasterstats.zonal_stats() |
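To show how little glue code these replacements need, here is a minimal zonal-statistics sketch with rasterstats, standing in for arcpy.sa.ZonalStatistics(). The filenames stands.geojson and elevation.tif are placeholders for your own data.

from rasterstats import zonal_stats

# Mean, max, and pixel count of elevation per polygon
# ("stands.geojson" and "elevation.tif" are placeholder filenames)
stats = zonal_stats("stands.geojson", "elevation.tif",
                    stats=["mean", "max", "count"])
print(stats[0])  # e.g. {'mean': 312.4, 'max': 881.0, 'count': 15023}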
Writing to Cloud-Native Formats (COG)
Cloud-Optimized GeoTIFF (COG) is the key advantage of Rasterio over ArcPy. COG enables streaming large rasters from cloud storage (S3, Azure Blob, GCS) without downloading entire files. ArcPy has no native COG support—you must download files locally first.
WHAT MAKES COG SPECIAL
- HTTP Range Requests: Read only the pixels you need, not the entire file
- Internal Tiling: Optimised for spatial queries (256x256 or 512x512 tiles)
- Overviews: Pre-computed pyramids for fast visualisation at any zoom (can be retrofitted; see the sketch after this list)
- Cloud Storage: Works directly with S3, Azure, GCS—no local download
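Overviews can also be added to an existing GeoTIFF in place, without rewriting it. A minimal sketch using Rasterio's build_overviews; the filename is a placeholder:

import rasterio
from rasterio.enums import Resampling

# Add overview pyramids to an existing file in place
# ("existing.tif" is a placeholder filename)
with rasterio.open("existing.tif", "r+") as dst:
    dst.build_overviews([2, 4, 8, 16], Resampling.average)
    dst.update_tags(ns="rio_overview", resampling="average")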
STANDARD OUTPUT
import rasterio
# Standard GeoTIFF write
with rasterio.open(
    "output.tif", "w",
    driver="GTiff",
    height=data.shape[0],
    width=data.shape[1],
    count=1,
    dtype=data.dtype,
    crs=src.crs,
    transform=src.transform,
) as dst:
    dst.write(data, 1)
# Problem: No internal tiling
# Problem: No overviews
# Problem: Must download entire file to read

CLOUD-OPTIMIZED OUTPUT
import rasterio
from rasterio.enums import Resampling
# Cloud-Optimized GeoTIFF with COG driver
with rasterio.open(
    "output_cog.tif", "w",
    driver="COG",
    height=data.shape[0],
    width=data.shape[1],
    count=1,
    dtype=data.dtype,
    crs=src.crs,
    transform=src.transform,
    compress="deflate",             # Compression
    blocksize=512,                  # Internal tile size
    overview_resampling="average",  # Pyramid resampling
) as dst:
    dst.write(data, 1)
# Or use the rio-cogeo CLI:
# rio cogeo create input.tif output_cog.tif

Reading COGs from Cloud Storage
The real power: read only the portion of a massive raster you need, directly from S3 or Azure Blob Storage.
import rasterio
from rasterio.windows import from_bounds
# Read small window from 50GB COG on S3 (no full download!)
cog_url = "s3://bucket/large-raster.tif"
with rasterio.open(cog_url) as src:
    # Define area of interest (coordinates in the raster's CRS)
    window = from_bounds(
        left=-122.5, bottom=37.5,
        right=-122.0, top=38.0,
        transform=src.transform,
    )
    # Read only that window - range request!
    subset = src.read(1, window=window)
    print(f"Read {subset.nbytes / 1e6:.1f} MB instead of the full file")

Performance: Reading a 10km area from a 50GB global DEM: ArcPy downloads the entire file (14 minutes); a Rasterio COG range request takes 2.3 seconds.
Memory Management for Large Rasters
Use windowed reading, Dask arrays, or xarray for rasters that exceed available RAM. Unlike ArcPy, which streams data through file-based operations, Rasterio loads data into memory by default. For 10GB+ rasters, you need different strategies.
THE MEMORY WALL
Example: Global 30m DEM (SRTM), 432,000 x 180,000 pixels, 150GB uncompressed. Calling src.read() tries to load everything—crashes with MemoryError.
ArcPy handles this transparently (using disk-based processing), but loses parallelisation benefits. Rasterio gives you control—you choose the trade-off.
STRATEGY 1: WINDOWED READING

Process rasters in spatial chunks (windows). Best for operations that don't need cross-tile context (reclassify, NDVI, simple algebra).
import rasterio
from rasterio.windows import Window
import numpy as np
with rasterio.open("huge_raster.tif") as src:
# Process in 1024x1024 windows
profile = src.profile.copy()
profile.update(compress='deflate')
with rasterio.open("output.tif", "w", **profile) as dst:
for ji, window in src.block_windows(1):
# Read chunk
data = src.read(1, window=window)
# Process chunk (example: normalize)
processed = (data - data.min()) / (data.max() - data.min())
# Write chunk
dst.write(processed.astype(np.float32), 1, window=window)Dask arrays enable lazy evaluation and parallel processing across chunks. Operations are only computed when you explicitly trigger them.
import rioxarray
import xrspatial

# Open with rioxarray - creates a lazy Dask-backed array
ds = rioxarray.open_rasterio("huge_raster.tif", chunks=(1, 2048, 2048))
# Operations are lazy - no computation yet
dem = ds.squeeze()
# Use xrspatial.slope, which is Dask-aware (np.gradient would pull the
# whole array into memory and defeat the chunking)
slope = xrspatial.slope(dem)
# Trigger computation with parallel execution
result = slope.compute()  # Uses all CPU cores
# Or write directly (streams to disk)
slope.rio.to_raster("slope_output.tif", driver="COG")

STRATEGY 3: XARRAY FOR TIME SERIES

For time-series or multi-dimensional raster stacks (satellite imagery), xarray provides powerful abstractions for labelled arrays with Dask parallelisation.
import xarray as xr
import rioxarray
# Open time series of rasters with Dask chunking
files = ["landsat_2020_01.tif", "landsat_2020_02.tif", ...]
ds = xr.open_mfdataset(
    files,
    engine="rasterio",
    combine="nested",    # stack files along a new dimension
    concat_dim="time",
    chunks={"x": 2048, "y": 2048},
)
# Calculate temporal statistics (lazy)
# (assumes an 'ndvi' variable; adapt to your band naming)
mean_ndvi = ds["ndvi"].mean(dim="time")
max_ndvi = ds["ndvi"].max(dim="time")
# Write results
mean_ndvi.rio.to_raster("mean_ndvi.tif")
mean_ndvi.rio.to_raster("mean_ndvi.tif")| Raster Size | Operation Type | Recommended Approach |
|---|---|---|
| < 2GB | Any | Standard Rasterio (load fully) |
| 2-20GB | Per-pixel (no context) | Windowed reading |
| 2-20GB | Neighbourhood ops | Dask with overlap (see sketch below) |
| 20-100GB | Complex analysis | rioxarray + Dask cluster |
| > 100GB | Any | Google Earth Engine or STAC + COG |
Writing Rasters: Best Practices
Always copy the source profile, update only what changes, and prefer COG format for outputs. The most common bugs come from mismatched metadata—wrong CRS, transform, or nodata value.
COMMON MISTAKES
# DON'T: Hardcode metadata
from rasterio.transform import Affine

with rasterio.open(
    "output.tif", "w",
    driver="GTiff",
    height=1000,
    width=1000,
    count=1,
    dtype="float32",
    crs="EPSG:4326",                          # Might not match input!
    transform=Affine(0.1, 0, 0, 0, -0.1, 0),  # Wrong!
) as dst:
    dst.write(data, 1)
# Problems:
# - CRS might not match input
# - Transform is completely wrong
# - No compression (huge files)
# - nodata not set

CORRECT APPROACH
# DO: Copy and update profile
with rasterio.open("input.tif") as src:
data = src.read(1)
profile = src.profile.copy()
# Process data
result = some_processing(data)
# Update only what changed
profile.update(
driver="COG", # Cloud-optimized
dtype=result.dtype, # Match output type
compress="deflate", # Compression
nodata=-9999 # Explicit nodata
)
with rasterio.open("output.tif", "w", **profile) as dst:
dst.write(result, 1)GDAL CONFIGURATION FOR CLOUD WRITES
Writing large COGs to cloud storage requires GDAL configuration. Set these environment variables before opening files:
import os
import rasterio
# Configure GDAL for S3
os.environ['AWS_REQUEST_PAYER'] = 'requester' # For requester-pays buckets
os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'YES' # Faster opens
os.environ['CPL_VSIL_CURL_ALLOWED_EXTENSIONS'] = '.tif,.tiff,.vrt'
# Now write directly to S3
with rasterio.open("s3://my-bucket/output.tif", "w", **profile) as dst:
dst.write(data, 1)Migration Timeline: What to Expect
A typical raster workflow migration takes 2-4 months, starting with a 1-week proof of concept. The timeline depends on workflow complexity, team familiarity with NumPy, and cloud integration requirements.
- Inventory all arcpy.sa scripts: list functions used, input/output formats
- Identify priority workflows: high runtime, frequent execution
- Build proof of concept: migrate one script end-to-end
- Benchmark: compare runtime, output accuracy, memory usage
- Convert input rasters to COG format (gdal_translate or rio-cogeo)
- Set up cloud storage structure (S3 buckets, folder conventions)
- Create validation scripts: compare outputs pixel-by-pixel (see the sketch after this list)
- Document CRS and nodata conventions for team
- Migrate scripts in priority order (highest ROI first)
- Implement memory management for large rasters
- Build reusable library: common operations, error handling
- Run parallel validation: ArcPy and Rasterio produce identical outputs
- Deploy migrated workflows to production
- Monitor performance: runtime, memory, cloud costs
- Train team on Rasterio patterns and debugging
- Decommission ArcPy workflows after 30-day confidence period
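A minimal sketch of that pixel-by-pixel validation, comparing a legacy ArcPy output against its Rasterio replacement; the filenames and the 0.01 tolerance are placeholders to adapt:

import numpy as np
import rasterio

# Placeholder filenames: the legacy output and its migrated equivalent
with rasterio.open("slope_arcpy.tif") as a, rasterio.open("slope_rasterio.tif") as b:
    assert a.crs == b.crs and a.transform == b.transform, "Grid mismatch"
    arr_a = a.read(1, masked=True).astype("float64")
    arr_b = b.read(1, masked=True).astype("float64")

# Tolerate tiny floating-point differences between implementations
diff = np.abs(arr_a - arr_b)
print(f"Max abs difference: {diff.max():.6f}")
print(f"Pixels differing by > 0.01: {(diff > 0.01).sum()}")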
Complete Workflow Translation: Real Example
Here's a production workflow we migrated for an environmental monitoring client: calculate terrain derivatives from a 20GB DEM and identify areas exceeding slope thresholds.
ORIGINAL ARCPY VERSION (4.2 HOURS)
import arcpy
from arcpy.sa import *
arcpy.env.workspace = "C:/data"
arcpy.CheckOutExtension("Spatial")
# Read 20GB DEM
print("Reading DEM...")
dem = Raster("large_dem.tif")
# Calculate slope
print("Calculating slope...")
slope = Slope(dem, "DEGREE")
# Calculate aspect
print("Calculating aspect...")
aspect = Aspect(dem)
# Identify steep slopes (> 30 degrees)
print("Finding steep areas...")
steep_mask = Con(slope > 30, 1, 0)
# Calculate hillshade for visualisation
print("Creating hillshade...")
hillshade = Hillshade(dem, 315, 45)
# Save outputs
print("Saving results...")
slope.save("slope.tif")
aspect.save("aspect.tif")
steep_mask.save("steep_areas.tif")
hillshade.save("hillshade.tif")
print("Complete!")Runtime: 4.2 hours | Peak memory: 8GB | 20GB input DEM
MIGRATED RASTERIO VERSION (12 MINUTES)
import rioxarray as rxr
import xarray as xr
import xrspatial
import numpy as np

# Open with Dask chunking (lazy loading)
print("Opening DEM with lazy loading...")
dem = rxr.open_rasterio(
    "s3://bucket/large_dem.tif",
    chunks={"x": 2048, "y": 2048},
).squeeze()
# Calculate terrain derivatives (all lazy)
print("Setting up terrain calculations...")
slope = xrspatial.slope(dem)
aspect = xrspatial.aspect(dem)
hillshade = xrspatial.hillshade(dem, azimuth=315, angle_altitude=45)
# Identify steep areas
steep_mask = xr.where(slope > 30, 1, 0).astype(np.uint8)
# Write all outputs (parallel computation + streaming)
print("Computing and writing results...")
slope.rio.to_raster("s3://bucket/outputs/slope.tif", driver="COG")
aspect.rio.to_raster("s3://bucket/outputs/aspect.tif", driver="COG")
steep_mask.rio.to_raster("s3://bucket/outputs/steep_areas.tif", driver="COG")
hillshade.rio.to_raster("s3://bucket/outputs/hillshade.tif", driver="COG")
print("Complete!")

Runtime: 12 minutes | Peak memory: 3GB | Same 20GB input, 21x faster
PERFORMANCE BREAKDOWN: WHERE TIME GOES

[Chart: runtime breakdown for ArcPy (4.2 hours) vs Rasterio + xrspatial (12 minutes)]
Key insight: Lazy loading eliminates upfront read time. Parallel Dask execution uses all CPU cores. COG output streams directly to S3 without temp files.
Rasterio migration makes sense when you need cloud integration, parallel processing, or want to eliminate Spatial Analyst licensing costs.
The 5-20x performance improvement comes from vectorised NumPy operations and parallel Dask execution. The workflow improvement comes from COG format enabling cloud-native architectures.
However, if your workflows depend on ArcPy-specific functions like sophisticated hydrological modelling or you're locked into Esri infrastructure, a hybrid approach may work better—use Rasterio for data preparation and output, ArcPy for specialised analysis.
Frequently Asked Questions
Is Rasterio faster than ArcPy Spatial Analyst?
Yes, for most operations. Rasterio with NumPy provides 5-20x speedups for raster algebra due to vectorised operations. Reading Cloud-Optimized GeoTIFFs can be 10-50x faster than ArcPy because of HTTP range requests—you only download the pixels you need.
Can Rasterio read ArcGIS raster formats?
Yes, Rasterio reads all GDAL-supported formats including GeoTIFF, JPEG2000, MrSID, ESRI Grid, and File Geodatabase rasters. For best performance, convert to Cloud-Optimized GeoTIFF format which enables streaming from cloud storage without downloading entire files.
What replaces arcpy.sa Slope and Aspect functions?
Use np.gradient() for basic slope/aspect calculations, or libraries like xrspatial, richdem, or earthpy for pre-built terrain analysis functions. Performance is typically 5-10x faster than arcpy.sa equivalents due to vectorised computation.
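As a concrete example, here is a minimal NumPy aspect sketch (compass degrees, clockwise from north), assuming dem and cellsize are defined as in the slope pattern earlier:

import numpy as np

# dy: gradient along rows (increasing southward); dx: along columns (eastward)
dy, dx = np.gradient(dem, cellsize)
# Aspect = compass bearing of steepest descent; the northward slope
# component is -dy because row indices increase southward
aspect = np.degrees(np.arctan2(-dx, dy)) % 360
# Flat cells (dx = dy = 0) come out as 0 here; mask them separately if needed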
How do I handle large rasters that don't fit in memory?
Use Rasterio's windowed reading to process in chunks, or xarray with Dask for automatic chunking and parallel execution. For very large datasets (100GB+), COG format enables reading only the spatial extent you need via HTTP range requests.
What is Cloud-Optimized GeoTIFF (COG) and why should I use it?
COG is a GeoTIFF variant optimised for cloud storage and HTTP range requests. It enables reading only portions of large rasters without downloading entire files. Rasterio natively supports COG for reading from and writing to cloud storage like S3 and Azure Blob—ArcPy has no equivalent capability.
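To check whether a file really is a valid COG, rio-cogeo ships a validator. A minimal sketch; the filename is a placeholder:

from rio_cogeo.cogeo import cog_validate

# Returns (is_valid, errors, warnings)
is_valid, errors, warnings = cog_validate("output_cog.tif")
print("Valid COG" if is_valid else f"Not a COG: {errors}")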