Lesson 2 of 12 · The foundation

Three small modules every later lesson imports.

You land three foundational primitives this lesson: the coordinate-reference-system constants and helpers; a DuckDB session builder with the spatial extension loaded; and the parcels prep module with a fast bounding-box prefilter. None of these are flashy on their own. All of them get imported dozens of times across the rest of the course.

Time: ~60 min
You'll touch: core/crs · core/duckdb_session · prep/parcels
Result: the substrate every later module sits on

· Objective

Land three modules and tests for each. Every later lesson assumes these exist.

  • core/crs.py: WORKING_CRS (EPSG:2230, feet) and WEB_CRS (EPSG:4326, lon/lat) constants, plus to_working() / to_web() reprojection helpers.
  • core/duckdb_session.py: connect() returns a DuckDB connection with the spatial extension loaded (and installed on first use).
  • prep/parcels.py: load_parcels(), bbox_filter(), bbox_from_geometry(). The parcel layer is the first real domain object.

One convention this lesson establishes: work in a projected CRS (feet) for anything involving distances or spatial joins, and reproject to lon/lat only at the rendering boundary. A degree is an angle, not a length, so distances and buffers computed in degrees are meaningless; in feet they are ordinary planar math, which is both correct and fast.
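To make the "degrees are not distances" point concrete, the sketch below estimates how many feet one degree spans near San Diego. It assumes a spherical Earth with a mean radius of 6,371,008.8 m and a latitude of 32.7°. Those values and the helper name feet_per_degree are illustrative assumptions, not part of the package.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius; spherical approximation (assumption)
METERS_PER_FOOT = 0.3048       # international foot; the US survey foot differs by about 2 ppm


def feet_per_degree(lat_deg):
    """Return (feet per degree of longitude, feet per degree of latitude) at lat_deg."""
    per_deg_lat = math.radians(1.0) * EARTH_RADIUS_M / METERS_PER_FOOT
    # Meridians converge toward the poles, so a degree of longitude shrinks by cos(latitude).
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lon, per_deg_lat


lon_ft, lat_ft = feet_per_degree(32.7)  # roughly San Diego's latitude
print(f"1 degree of longitude: {lon_ft:,.0f} ft")
print(f"1 degree of latitude:  {lat_ft:,.0f} ft")
print(f"ratio: {lon_ft / lat_ft:.3f}")
```

At this latitude a degree of longitude covers roughly 16% fewer feet than a degree of latitude, so a buffer of "0.001 degrees" is not a circle on the ground. In a projected CRS such as EPSG:2230, one unit is approximately one foot in every direction.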

· Build it, step by step

1 Coordinate reference systems — core/crs.py

Two constants and two helpers. The constants name the CRSs the package will use everywhere; the helpers reproject a GeoDataFrame in or out:

WORKING_CRS = "EPSG:2230"   # CA State Plane Zone 6 (NAD83), US survey feet
WEB_CRS     = "EPSG:4326"   # WGS84 lon/lat for web maps

def to_working(gdf): ...    # reproject to WORKING_CRS
def to_web(gdf):     ...    # reproject to WEB_CRS

The helpers refuse to operate on a GeoDataFrame with no CRS — coordinates without a CRS are ambiguous and any operation on them is unreliable. Better to fail fast with a clear message than to silently produce wrong distances.
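A minimal sketch of how the two helpers might look, assuming only the behavior described above. The private helper _reproject and the wording of the error message are hypothetical. The sketch relies only on the .crs attribute and the .to_crs() method that a GeoDataFrame provides, so it does not import GeoPandas itself.

```python
WORKING_CRS = "EPSG:2230"   # CA State Plane Zone 6 (NAD83), US survey feet
WEB_CRS = "EPSG:4326"       # WGS84 lon/lat for web maps


def _reproject(gdf, target_crs):
    """Reproject gdf to target_crs, refusing input with no CRS (hypothetical helper)."""
    if gdf.crs is None:
        raise ValueError(
            f"GeoDataFrame has no CRS; set one before reprojecting to {target_crs}."
        )
    # GeoDataFrame.to_crs returns a new object and leaves the input untouched.
    return gdf.to_crs(target_crs)


def to_working(gdf):
    """Reproject to WORKING_CRS (feet) for distance and join work."""
    return _reproject(gdf, WORKING_CRS)


def to_web(gdf):
    """Reproject to WEB_CRS (lon/lat) at the rendering boundary."""
    return _reproject(gdf, WEB_CRS)
```

With a real GeoDataFrame, to_web(to_working(gdf)) should reproduce the input coordinates up to floating-point tolerance, which is what a round-trip test can check.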

If you’re adapting the package for a different region, change WORKING_CRS to the appropriate projected CRS for your area; epsg.io is a convenient place to look one up. WEB_CRS stays at EPSG:4326 regardless of region.

2 DuckDB session — core/duckdb_session.py

DuckDB is the package’s compute engine: serverless, runs in-process, reads parquet straight off disk. The spatial extension adds ST_* functions and geometry types so we can do spatial work in SQL when that’s the fastest path.

connect() opens a connection and tries LOAD spatial first, which is fast when the extension is already installed. If that fails, it runs INSTALL spatial and then LOAD spatial, which should only be needed on first use. After this lesson, anything that needs DuckDB calls connect() and gets a session that’s ready for spatial SQL.

from opentrash.core.duckdb_session import connect

con = connect()
row = con.execute("SELECT ST_AsText(ST_Point(1.0, 2.0))").fetchone()
# -> ('POINT (1 2)',)  # WKT text; exact number formatting can vary by version
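The load-first, install-on-failure behavior described above might be sketched as follows. The helper name load_spatial and the database parameter are hypothetical, and the sketch catches Exception broadly so the fallback logic can be read and exercised without DuckDB installed; real code would more likely catch duckdb.Error.

```python
def load_spatial(con):
    """Load the spatial extension on con, installing it first if the load fails."""
    try:
        con.execute("LOAD spatial")          # fast path: extension already installed
    except Exception:                        # with DuckDB this is a duckdb.Error subclass
        con.execute("INSTALL spatial")       # first use: download and install
        con.execute("LOAD spatial")
    return con


def connect(database=":memory:"):
    """Return a DuckDB connection that is ready for spatial SQL."""
    import duckdb  # imported here so load_spatial() can be tried without DuckDB

    return load_spatial(duckdb.connect(database))
```

If INSTALL itself fails, for example because there is no network access, the error propagates to the caller. That gives the tests something definite to catch and turn into a skip.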

3 Parcels — prep/parcels.py

A parcel is a property lot with a boundary polygon. In waste operations, parcels are how we reason about which properties should be served. Three functions:

  • load_parcels(path): read a vector file (GeoParquet, GeoPackage, etc.), ensure it’s in WORKING_CRS, return a GeoDataFrame. Refuses files with no CRS.
  • bbox_filter(gdf, minx, miny, maxx, maxy): the fast first pass. GeoPandas’s spatial-index-backed cx indexer narrows to parcels that intersect a bounding box without doing exact-geometry math. Run it first, then apply the expensive intersects/within operations to the much smaller result.
  • bbox_from_geometry(geom, buffer=0): produce a bounding-box tuple from a geometry, optionally padded by a buffer (feet, in the working CRS). Pair this with bbox_filter for “parcels within N feet of this route.”
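The three functions could be sketched roughly as below. This is an illustration under stated assumptions, not the package's actual code: the choice between read_parquet and read_file by file suffix and the error message are guesses, and geopandas is imported inside load_parcels so that the two bbox helpers can be tried without it.

```python
from pathlib import Path

WORKING_CRS = "EPSG:2230"  # feet; same value as the constant in core/crs.py


def load_parcels(path):
    """Read a parcel layer and return it in WORKING_CRS; refuse files with no CRS."""
    import geopandas as gpd  # local import: the bbox helpers below do not need it

    path = Path(path)
    # Assumption: GeoParquet goes through read_parquet, other formats through read_file.
    gdf = gpd.read_parquet(path) if path.suffix == ".parquet" else gpd.read_file(path)
    if gdf.crs is None:
        raise ValueError(f"{path} has no CRS; refusing to guess one.")
    return gdf.to_crs(WORKING_CRS)


def bbox_from_geometry(geom, buffer=0):
    """Return (minx, miny, maxx, maxy) for geom, padded by buffer on every side."""
    minx, miny, maxx, maxy = geom.bounds
    return (minx - buffer, miny - buffer, maxx + buffer, maxy + buffer)


def bbox_filter(gdf, minx, miny, maxx, maxy):
    """Keep only the rows whose geometry intersects the bounding box."""
    # .cx slices by coordinates: x range first, then y range.
    return gdf.cx[minx:maxx, miny:maxy]
```

For a hypothetical route geometry route_geom, the pairing reads bbox_filter(parcels, *bbox_from_geometry(route_geom, buffer=200)), which returns the candidates inside a box padded by 200 feet.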

The bbox prefilter is the core performance pattern that will repeat across the package — for routes (Lesson 3), for the GPS-to-parcel join (Lesson 8), for everything that needs “narrow down before doing expensive work.”
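The two-stage idea can be shown without any GIS library. The sketch below is a stand-in that uses plain (x, y) points in feet and a single straight segment as the "route", not parcel polygons or real route geometry. All function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Exact distance from point p to segment a-b (all (x, y) tuples, in feet)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def points_near_segment(points, a, b, feet):
    """Two-stage search: padded-bbox test first, exact distance on the survivors."""
    minx, maxx = min(a[0], b[0]) - feet, max(a[0], b[0]) + feet
    miny, maxy = min(a[1], b[1]) - feet, max(a[1], b[1]) + feet
    # Stage 1: four comparisons per point; may keep false positives near the corners.
    candidates = [p for p in points if minx <= p[0] <= maxx and miny <= p[1] <= maxy]
    # Stage 2: the exact test runs only on the candidates.
    hits = [p for p in candidates if point_segment_distance(p, a, b) <= feet]
    return candidates, hits


points = [(50, 10), (-90, -90), (500, 500)]
candidates, hits = points_near_segment(points, a=(0, 0), b=(100, 0), feet=100)
print(candidates)  # points that passed the bbox test
print(hits)        # points that also passed the exact distance test
```

The point (-90, -90) sits in a corner of the padded box, so it survives the prefilter but is more than 100 feet from the segment and is dropped by the exact test. The prefilter is allowed to over-select; it only has to be cheap and never drop a true match.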

4 Tests with synthetic data

One test file per module, all using small synthetic geometries (a few boxes in San Diego, no real data needed):

  • tests/test_crs.py: constants are correct; reprojection round-trips; no-CRS input raises.
  • tests/test_duckdb_session.py: connect() works; spatial functions are available. (These skip cleanly in environments where the extension can’t be downloaded.)
  • tests/test_parcels.py: load round-trips; bbox_filter narrows correctly; no-CRS file raises; bbox_from_geometry math is right.

Install the package with its dev extras, then lint and run the tests:

pip install -e ".[dev]"
ruff check .
pytest -q

5 Commit and push

git add .
git commit -m "Lesson 2: core/crs, core/duckdb_session, prep/parcels"
git push origin main

CI runs ruff + pytest on the push. Green check means the foundation is solid.

· Package anatomy after this lesson

Where everything lives now. [new] marks files added this lesson.

opentrash/
├── pyproject.toml
├── README.md
├── .github/workflows/ci.yml
├── docs/architecture.md
├── opentrash/
│   ├── __init__.py
│   ├── adapters/gps/            # scaffolded
│   ├── core/
│   │   ├── crs.py               # [new] WORKING_CRS, WEB_CRS, to_working, to_web
│   │   └── duckdb_session.py    # [new] connect() with spatial extension
│   ├── prep/
│   │   ├── sites.py
│   │   └── parcels.py           # [new] load_parcels, bbox_filter, bbox_from_geometry
│   ├── cache/                   # scaffolded
│   ├── tonnage/                 # scaffolded
│   ├── engine/                  # scaffolded: THE ROUTING ENGINE
│   ├── patterns/                # scaffolded
│   └── routeview/               # scaffolded
└── tests/
    ├── test_sites.py
    ├── test_crs.py              # [new]
    ├── test_duckdb_session.py   # [new]
    └── test_parcels.py          # [new]

· What you built

  • CRS constants and helpers — the convention “work in feet, render in degrees” is now encoded as data.
  • A DuckDB session builder with the spatial extension loaded on first use.
  • A parcels prep module with the bbox-prefilter pattern that the rest of the package will reuse.
  • Tests for all of it, with graceful skips in environments where the spatial extension can’t be downloaded.

The foundation is in place. Next lesson layers in the static operational world, routes and facilities, on top of it.

· Companion resources

Optional, for going deeper.

  • EPSG:2230 details: epsg.io/2230 — the projected CRS for the San Diego region (use the right one for yours).
  • DuckDB spatial extension: official docs — geometry types, ST_* functions, and how DuckDB’s spatial layer compares to PostGIS.
  • GeoPandas spatial index: the .cx indexer uses an R-tree under the hood. GeoPandas indexing docs.

· Next lesson

Lesson 3 — Routes & facilities: load the route polygons (AUTO + HTC) and the facilities layer (landfills, depots, other org sites). Buffer the HTC routes to compensate for their parcel-shape geometry. The operational world starts to take shape.