AI AGENTS SERIES · PART 1 OF 3
Technical Deep-Dive

Beyond the Chatbot

Why generic LLMs fail at GIS automation and how dual-agent architectures solve the validation problem that makes single-agent systems dangerous for production workflows.

PUBLISHED: JAN 2026
SERIES: AI AGENTS
AUTHOR: AXIS SPATIAL
[Image: Sumi-e ink painting of two cranes in a circular dance, representing the dual-agent architecture]
  • Generic LLMs generate code that works locally but fails at scale—wrong CRS, memory explosions, platform-specific gotchas
  • Dual-agent architecture (Architect + Auditor) catches 94% of issues before deployment through cross-validation
  • Your data never leaves your environment—agents analyse workflow logic, not geodata
  • Best for repetitive execution workflows (8+ hours/week, 12+ executions/year). Novel analysis still needs humans.

Every few months, someone at a GIS conference demos a chatbot that can "talk to your maps." Type a question, get a heatmap. The audience applauds. The startup gets funded.

Then the pilot fails. Not because the technology is bad, but because it solves the wrong problem. Enterprises don't need another interface to their data. They need the work to get done.

The difference between a chatbot and an automation agent is the difference between a search engine and an assembly line. One helps you find things. The other produces output.

The Problem: Recurring Manual Effort

The real problem isn't any single tool. It's the recurring manual effort across disconnected steps that consumes your team's time week after week:

  • Opening ArcGIS Desktop and running a 12-step geoprocessing workflow—manually clicking through each tool, waiting, verifying outputs
  • Downloading satellite imagery from a portal, preprocessing in ENVI, exporting to three different formats, uploading to another system
  • Exporting data from one tool, reformatting in Excel because the schemas don't match, importing to another tool
  • Running the same QGIS analysis every week with updated data—same steps, same clicks, same waiting

The pattern is always the same: Download → Process → Analyse → Report. Manual download from data portal. Desktop preprocessing. GIS analysis. Excel reporting. PowerPoint compilation.

The arithmetic is brutal:

8 hours/week on one routine = 416 hours annually on the SAME task.

If manual processing covers 3 countries/year, covering 50 countries requires roughly 17× more staff.

That's not a scaling challenge. That's a structural impossibility.

The instinct is to ask: "Can AI do this for me?"

The answer is yes—but not the way most vendors are selling it.

Why Generic LLMs Fail at GIS

Anyone can get Claude or GPT to generate geospatial code. The question is whether that code works in production, at scale, on YOUR platform, with YOUR constraints.

Generic LLMs fail at GIS automation for specific, predictable reasons:

1. They don't understand coordinate systems

Geographic vs projected isn't just trivia—it determines whether your area calculations are in square degrees (meaningless) or square metres (useful). A generic LLM will happily calculate area in EPSG:4326 and return numbers that look plausible but are wrong by factors of 10-100× depending on latitude.
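A minimal sketch of the failure mode using GeoPandas, with an arbitrary ~100 km square near 60°N (the polygon and the choice of EPSG:6933 as the equal-area target are ours, not a prescription):

import geopandas as gpd
from shapely.geometry import box

# A ~100 km square near 60°N, defined in geographic coordinates (EPSG:4326)
gdf = gpd.GeoDataFrame(geometry=[box(10.0, 60.0, 11.8, 60.9)], crs="EPSG:4326")

# WRONG: "area" of lon/lat coordinates is in square degrees, not square metres
print(gdf.area.iloc[0])                      # ~1.6 square degrees -- meaningless

# CORRECT: reproject to an equal-area CRS before measuring
print(gdf.to_crs("EPSG:6933").area.iloc[0])  # area in square metres

GeoPandas will at least warn about the first call; a generated script that never surfaces the warning will not.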

2. They suggest approaches that fail at scale

"Load the GeoDataFrame and dissolve" works great on 10,000 features. On 10 million features, it triggers an OOM kill. The LLM doesn't know your dataset size, your cluster memory, or the difference between a pattern that works in a Jupyter notebook and one that survives in production.

3. They ignore platform-specific constraints

Cloud object storage doesn't support seek operations. Databricks Volumes can't write GeoPackages directly. AWS Lambda has a 15-minute timeout. Azure Functions have different package limits. The LLM doesn't know which platform you're on, or what that platform can't do.

4. They lack enterprise awareness

In regulated industries—insurance, utilities, finance—precision matters. A 0.001% difference in spatial calculations can cascade into significant business impact. Reproducibility is mandatory. Audit trails are required. The LLM generates code that works. It doesn't generate code that's auditable, reproducible, or compliant.

THE REAL ERROR WE SEE

GEOSException: IllegalArgumentException: Unhandled geometry type in CoverageUnion

Generic LLMs suggest coverage_union_all() for dissolves because it's 80× faster. What they don't know: it ONLY works with Polygon/MultiPolygon. If make_valid() returns a GeometryCollection with a LineString, the entire pipeline crashes.

This specific error took us 3 days to diagnose the first time. Now it takes 3 seconds. That's the difference between generic code generation and production-aware automation.
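The defensive pattern is to strip non-polygonal parts out of whatever make_valid returns before attempting the fast union, and to keep a slower fallback. A sketch with Shapely 2.x (the helper names safe_dissolve and polygonal_parts are ours):

import shapely
from shapely import make_valid, coverage_union_all, union_all

def polygonal_parts(geom):
    # make_valid can emit GeometryCollections containing stray LineStrings or Points;
    # keep only the Polygon/MultiPolygon parts that CoverageUnion accepts
    repaired = make_valid(geom)
    if repaired.geom_type == "GeometryCollection":
        polys = [g for g in repaired.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
        return union_all(polys) if polys else None
    return repaired if repaired.geom_type in ("Polygon", "MultiPolygon") else None

def safe_dissolve(geoms):
    cleaned = [g for g in (polygonal_parts(g) for g in geoms) if g is not None]
    try:
        # fast path: requires polygonal, non-overlapping inputs
        return coverage_union_all(cleaned)
    except shapely.errors.GEOSException:
        # slower but tolerant fallback
        return union_all(cleaned)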

The Dual-Agent Architecture

The solution isn't smarter prompts. It's a different architecture.

Single-agent systems hallucinate. They generate plausible-looking code that fails in subtle ways. The failure mode is worse than obvious bugs—it's code that runs successfully but produces wrong results.

Dual-agent systems catch these failures through cross-validation. One agent proposes. Another agent validates. If the validator finds issues, the proposer rewrites. The loop continues until the code passes or escalates to a human.

┌─────────────────────────────────────────────────────────────┐
│                    WORKFLOW ORCHESTRATION                    │
│                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │  AGENT A     │───▶│  IMPLEMENT   │───▶│  AGENT B     │   │
│  │  Architect   │    │              │    │  Auditor     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│        │                                        │            │
│        │         ┌──────────────┐               │            │
│        │         │   REWRITE    │◀──────────────┤ FAIL?      │
│        │         └──────────────┘               │            │
│        │              │                         │            │
│        └──────────────┴─────────────────────────┘            │
│                       │                                      │
│                       ▼                                      │
│                    [PASS]                                    │
│                       │                                      │
│                       ▼                                      │
│               ┌──────────────┐                               │
│               │ UPDATE DOCS  │                               │
│               └──────────────┘                               │
└─────────────────────────────────────────────────────────────┘

A. The Architect

Proposes modernisation paths. Maps legacy ArcPy logic to cloud-native equivalents. Generates the code that will be validated.

  • Searches error registry for past failures
  • Reviews domain-specific documentation
  • Checks git history for why current code exists
  • Outputs risk assessment: HIGH/MEDIUM/LOW

B. The Auditor

Validates generated code against known constraints and catches failures before they reach production; the first two checks are sketched in code after the list below.

  • Syntax check: python -m py_compile
  • Import verification: all dependencies resolve
  • Platform constraint checking (Volumes, memory)
  • Error pattern detection against documented failures
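A minimal stand-in for those first two checks (the audit function and its findings format are ours; the platform and error-pattern rules would slot in as additional checks):

import importlib.util
import py_compile
import re

def audit(candidate_path: str) -> list[str]:
    findings = []

    # 1. Syntax check -- equivalent to `python -m py_compile`
    try:
        py_compile.compile(candidate_path, doraise=True)
    except py_compile.PyCompileError as exc:
        findings.append(f"syntax: {exc.msg}")

    # 2. Import verification -- every top-level import must resolve
    source = open(candidate_path).read()
    for module in re.findall(r"^\s*(?:import|from)\s+([\w.]+)", source, re.M):
        if importlib.util.find_spec(module.split(".")[0]) is None:
            findings.append(f"unresolved import: {module}")

    return findings   # empty list == PASS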

The Self-Sustaining Loop

Max 3 iterations before escalating to a human. If the Auditor fails the code three times, a human reviews. This prevents infinite loops and ensures genuinely difficult problems get human attention.

After successful changes, documentation gets updated automatically. New errors get added to the registry. The system learns from every deployment, making future validations more accurate.

In our production deployments, this architecture catches 94% of issues before they reach human review. The remaining 6% are genuinely novel problems that require expert judgment—exactly where human time should be spent.
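In skeleton form the loop is small; everything interesting lives in the injected callables. The function below is an illustrative shape, not our production orchestrator:

MAX_ITERATIONS = 3

def modernise(task, architect, auditor, update_docs, escalate):
    # Dual-agent loop: Architect proposes, Auditor validates, rewrite on failure.
    # The four callables are supplied by the orchestrator; their names are illustrative.
    proposal = architect(task, feedback=None)
    findings = []
    for _ in range(MAX_ITERATIONS):
        findings = auditor(proposal)                   # e.g. syntax, imports, platform rules
        if not findings:
            update_docs(task, proposal)                # successful change is documented automatically
            return proposal
        proposal = architect(task, feedback=findings)  # rewrite against the audit findings
    return escalate(task, proposal, findings)          # three failed audits -> human review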

Your Data Never Leaves

The enterprise sales objection we hear most often: "We can't send our data to a third party."

They're right to be cautious. And they're solving the wrong problem.

Our agents don't need your data. They need your workflow logic. The difference is critical:

SaaS Model                              | Agent Model
"Upload your geodata to our platform"   | "We never see your geodata"
12-18 month security audit cycle        | Runs in your VPC from day one
Pay markup on compute                   | Use credits you already paid for
Move petabytes to vendor                | Bring automation to your data

This is what we call "Compute-to-Data" architecture. Instead of moving your petabytes to us, we deploy containerised agents directly into your Databricks, Snowflake, or cloud environment.

The agents analyse your ArcPy scripts, your geoprocessing model exports, your workflow documentation. They propose modernised equivalents. They validate that the new code produces identical outputs to the old code.

At no point do they see your actual data. They see the logic that processes your data.

The Tech Stack Shift

Automation isn't about replacing ArcPy with GeoPandas. It's about replacing patterns that don't scale with patterns that do.

Before (Legacy)              | After (Cloud-Native)                    | Why It Matters
Desktop files (.gdb, .shp)   | Cloud-native formats (GeoParquet, COG)  | Parallel access, no file locks
Manual downloads             | Automated data discovery (STAC)         | Pipelines find their own inputs
Sequential processing        | Distributed compute (Spark, Dask)       | Scale with data volume
Visual QA                    | Automated validation rules              | Catch issues before humans see them
Analyst-driven execution     | Schedule-driven pipelines               | Runs while you sleep

The insight is that these patterns work regardless of your specific platform. Databricks, AWS, Azure, GCP—the architectural principles are the same. The implementation details differ.

Our agents understand both. They generate code that follows cloud-native patterns and respects your platform's specific constraints.
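To make one row of the table concrete: the "Automated data discovery (STAC)" step replaces manual portal downloads with a catalogue query. A sketch against one public STAC endpoint (Element 84's Earth Search; the collection, asset key, and cloud-cover threshold will differ by provider):

from pystac_client import Client

# Let the pipeline find its own Sentinel-2 inputs instead of
# someone downloading scenes from a portal by hand
catalog = Client.open("https://earth-search.aws.element84.com/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[5.9, 47.3, 10.5, 54.9],           # area of interest (lon/lat)
    datetime="2025-06-01/2025-06-30",
    query={"eo:cloud_cover": {"lt": 20}},
)
hrefs = [item.assets["visual"].href for item in search.items()]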

Patterns from Production

These patterns come from processing country-scale geospatial data for insurance risk assessment. The errors are real. The solutions work.

Memory Management: Pass Bounds, Not Data

The Problem: Processing large geographic areas (populous countries, dense urban regions) crashes with OOM even on powerful clusters. The issue isn't compute—it's data architecture.

The Pattern: Pass references (bounding boxes, file paths), not data. Let downstream processes load only what they need.

# WRONG - reads the full 3.6 GB raster band into memory just to derive an extent
buffer_data = src.read(1)              # src: open rasterio dataset of the buffer raster
extent = np.where(buffer_data > 0)     # indices of every non-zero cell

# CORRECT - take the extent from the vector layer's bounds; effectively zero memory cost
buffer_bounds = gdf.total_bounds       # gdf: GeoDataFrame of the buffers -> (minx, miny, maxx, maxy)

This single change reduced memory usage from 3.6 GB to effectively zero.

Cloud Storage: Two-Stage Write

The Problem: Cloud object storage (S3, Azure Blob, Databricks Volumes) doesn't support random seek operations. Formats that require seek (GeoPackage/SQLite, GeoTIFF) fail with cryptic errors.

CPLE_AppDefinedError: _tiffSeekProc: Operation not supported
sqlite3_exec failed: disk I/O error

The Pattern: Two-stage write. Process to local filesystem, then stream-copy to cloud storage.
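A sketch of the two-stage write for a GeoPackage landing on a Databricks-style Volumes mount (the destination path is a placeholder; on raw S3 or Blob you would swap the copy for an SDK or fsspec upload):

import shutil
import tempfile
import geopandas as gpd
from shapely.geometry import Point

# Stand-in result; in practice this is the output of the analysis step
gdf = gpd.GeoDataFrame({"id": [1]}, geometry=[Point(8.54, 47.37)], crs="EPSG:4326")

DEST = "/Volumes/catalog/schema/outputs/result.gpkg"   # hypothetical cloud-storage path

with tempfile.TemporaryDirectory() as tmp:
    local_path = f"{tmp}/result.gpkg"
    gdf.to_file(local_path, driver="GPKG")   # stage 1: seek-friendly local filesystem
    shutil.copyfile(local_path, DEST)        # stage 2: sequential copy to object storage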

This is tribal knowledge that takes months to discover when you don't know it. The error messages don't tell you what's wrong.

Geometry Validation: Before, Not After

The Problem: Topology exceptions crash spatial unions. Real-world data (OSM, government sources) contains self-intersecting polygons, ring orientation issues, and other invalid geometries.

The Pattern: Validate ALL geometries before aggregate operations. Use make_valid(method='structure')—it's 2.7× faster than the alternatives and doesn't lose data like buffer(0).
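A sketch of the pre-aggregation pass, assuming Shapely 2.1+ (which is where the method='structure' option lives):

import numpy as np
import shapely

def ensure_valid(geoms):
    # geoms: array of shapely geometries, e.g. gdf.geometry.values
    geoms = np.asarray(geoms, dtype=object)
    invalid = ~shapely.is_valid(geoms)
    if invalid.any():
        # 'structure' rebuilds polygon structure instead of buffering;
        # requires Shapely >= 2.1 built against GEOS >= 3.10
        geoms[invalid] = shapely.make_valid(geoms[invalid], method="structure")
    return geoms

Run it over the geometry column before any dissolve or union; only invalid geometries are touched, so the cost scales with how dirty the data is rather than with its size.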

This eliminates 90%+ of pipeline failures we see in the wild.

When AI Agents Aren't the Answer

We turn away 20-30% of enquiries. Not because we can't help, but because automation won't deliver the ROI they're expecting.

Here's when AI agents are the wrong tool:

Novel Analytical Work

If every execution requires expert interpretation at 15 decision points, automation cannot replicate that judgment. Agents handle repetitive execution. Humans handle novel analysis.

Undocumented Workflows

If the workflow exists only in one analyst's head, automation requires documentation first. We can help with that—but budget 4-6 additional weeks for process formalisation before automation begins.

Low-Frequency, Low-Stakes

Workflows executed once per year with minimal downstream impact don't justify automation investment. The math doesn't work. Exception: if manual execution creates key-person dependency that threatens business continuity.

Better Replaced Than Migrated

Some legacy systems shouldn't be migrated. They should be replaced. If the underlying logic is fundamentally flawed, automating it just produces wrong answers faster. We'll tell you if this is your situation.

The Sweet Spot

Automation works best for workflows that are:

  • Repetitive: 8+ hours/week on the same task
  • Frequent: 12+ executions/year minimum
  • Documented: Steps are formalised (even partially)
  • Execution-heavy: More clicking than thinking

AI agents for GIS aren't chatbots with map skills. They're automation engines that do the repetitive work your team shouldn't be doing manually.

The dual-agent architecture—Architect proposing, Auditor validating—catches the failures that make single-agent systems dangerous for production. Your data never leaves your environment. The agents analyse logic, not geodata.

This isn't magic. It's engineering. The patterns are known. The constraints are documented. The validation is automated.

The question isn't whether AI can automate your workflows. It's whether your workflows are ready for automation—and whether automation is the right investment for your specific situation.

Sometimes it is. Sometimes it isn't. We'll tell you which.


NEXT STEP

Is Your Workflow Ready for Automation?

8 questions. 5 minutes. Get a personalised assessment of which workflows are automation-ready and which need other interventions first.