Three small modules every later lesson imports.
You land three foundational primitives this lesson: the coordinate-reference-system constants and helpers; a DuckDB session builder with the spatial extension loaded; and the parcels prep module with a fast bounding-box prefilter. None of these are flashy on their own. All of them get imported dozens of times across the rest of the course.
· Objective
Land three modules and tests for each. Every later lesson assumes these exist.
- core/crs.py: WORKING_CRS (EPSG:2230, feet) and WEB_CRS (EPSG:4326, lon/lat) constants, plus to_working() / to_web() reprojection helpers.
- core/duckdb_session.py: connect() returns a DuckDB connection with the spatial extension loaded (and installed on first use).
- prep/parcels.py: load_parcels(), bbox_filter(), bbox_from_geometry(). The parcel layer is the first real domain object.
· Build it, step by step
1 Coordinate reference systems — core/crs.py
Two constants and two helpers. The constants name the CRSs the package will use everywhere; the helpers reproject a GeoDataFrame in or out:
WORKING_CRS = "EPSG:2230" # CA State Plane Zone 6 (NAD83), US survey feet
WEB_CRS = "EPSG:4326" # WGS84 lon/lat for web maps
def to_working(gdf): ... # reproject to WORKING_CRS
def to_web(gdf): ... # reproject to WEB_CRS
The helpers refuse to operate on a GeoDataFrame with no CRS — coordinates without a CRS are ambiguous and any operation on them is unreliable. Better to fail fast with a clear message than to silently produce wrong distances.
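A minimal sketch of core/crs.py, assuming the helpers are thin wrappers around GeoDataFrame.to_crs. The _require_crs name and its error message are illustrative, not taken from the package. GeoPandas's to_crs already raises on naive geometries; the explicit check just makes the message point at the fix.

```python
import geopandas as gpd

WORKING_CRS = "EPSG:2230"  # CA State Plane Zone 6 (NAD83), US survey feet
WEB_CRS = "EPSG:4326"  # WGS84 lon/lat for web maps


def _require_crs(gdf: gpd.GeoDataFrame) -> None:
    # Hypothetical guard: coordinates without a CRS are ambiguous, so fail fast.
    if gdf.crs is None:
        raise ValueError(
            "GeoDataFrame has no CRS; call .set_crs() with its source CRS first"
        )


def to_working(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Reproject to WORKING_CRS (feet) for distances, buffers, and area math."""
    _require_crs(gdf)
    return gdf.to_crs(WORKING_CRS)


def to_web(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Reproject to WEB_CRS (lon/lat degrees) for rendering on web maps."""
    _require_crs(gdf)
    return gdf.to_crs(WEB_CRS)
```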
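If you’re adapting the package for a different region, change WORKING_CRS to a projected CRS suited to your area (epsg.io is a convenient lookup). Distances and buffers throughout the package are expressed in WORKING_CRS units, so switching to a metric CRS turns every “feet” parameter into meters. WEB_CRS stays at EPSG:4326 (WGS84) everywhere.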
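Before swapping WORKING_CRS, it can help to confirm the candidate is projected and to see its linear unit. This is a sketch using pyproj (which GeoPandas already depends on); working_crs_units is a hypothetical helper, not part of the lesson's package.

```python
from pyproj import CRS


def working_crs_units(code: str) -> str:
    """Return the linear unit of a candidate WORKING_CRS; raise if it isn't projected."""
    crs = CRS.from_user_input(code)
    if not crs.is_projected:
        # Geographic CRSs such as EPSG:4326 measure in degrees, so distances and
        # buffers computed in them are not in feet or meters.
        raise ValueError(f"{code} ({crs.name}) is not a projected CRS")
    # Axis 0 is the easting axis for projected CRSs like EPSG:2230.
    return crs.axis_info[0].unit_name
```

For EPSG:2230 this reports US survey feet; a UTM zone would report "metre", a signal to revisit every feet-based parameter.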
2 DuckDB session — core/duckdb_session.py
DuckDB is the package’s compute engine: serverless, runs in-process, reads parquet straight off disk. The spatial extension adds ST_* functions and geometry types so we can do spatial work in SQL when that’s the fastest path.
connect() opens a connection, tries LOAD spatial first (fast if the extension is already installed), and falls back to INSTALL spatial; LOAD spatial on first use. After this lesson, anything that needs DuckDB calls connect() and gets a session that’s ready for spatial SQL.
from opentrash.core.duckdb_session import connect
con = connect()
row = con.execute("SELECT ST_AsText(ST_Point(1.0, 2.0))").fetchone()
# -> a one-element tuple holding the point's WKT text
3 Parcels — prep/parcels.py
A parcel is a property lot with a boundary polygon. In waste operations parcels are how we reason about what should be served. Three functions:
- load_parcels(path): read a vector file (GeoParquet, GeoPackage, etc.), ensure it’s in WORKING_CRS, and return a GeoDataFrame. Refuses files with no CRS.
- bbox_filter(gdf, minx, miny, maxx, maxy): the fast first pass. GeoPandas’s cx indexer keeps only the parcels that intersect an axis-aligned bounding box. For very large layers, querying the spatial index (gdf.sindex) with the same box is a cheaper, envelope-only alternative. Run this before expensive intersects/within operations so they work on the much smaller result.
- bbox_from_geometry(geom, buffer=0): produce a (minx, miny, maxx, maxy) tuple from a geometry, optionally padded by a buffer (feet, in the working CRS). Pair it with bbox_filter for “parcels within N feet of this route.”
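A minimal sketch of prep/parcels.py under stated assumptions. The reader is picked by file extension (.parquet goes to gpd.read_parquet, everything else to gpd.read_file), and bbox_filter queries the spatial index with gdf.sindex.query rather than using the .cx indexer. With no predicate, sindex.query compares bounding boxes only, giving the cheap, slightly over-inclusive first pass a prefilter wants. Use gdf.cx[minx:maxx, miny:maxy] instead if you want the lesson's version.

```python
from pathlib import Path

import geopandas as gpd
from shapely.geometry import box

WORKING_CRS = "EPSG:2230"  # same constant as core/crs.py


def load_parcels(path) -> gpd.GeoDataFrame:
    """Read a parcel layer and return it in WORKING_CRS; refuse CRS-less files."""
    path = Path(path)
    if path.suffix.lower() == ".parquet":
        gdf = gpd.read_parquet(path)  # GeoParquet (needs pyarrow)
    else:
        gdf = gpd.read_file(path)  # GeoPackage, shapefile, GeoJSON, ...
    if gdf.crs is None:
        raise ValueError(f"{path} has no CRS; refusing to guess its units")
    return gdf.to_crs(WORKING_CRS)


def bbox_filter(gdf: gpd.GeoDataFrame, minx, miny, maxx, maxy) -> gpd.GeoDataFrame:
    """Keep rows whose bounding box overlaps the query box (envelope test only)."""
    # No predicate: the R-tree compares envelopes, so a few returned parcels may
    # not truly touch the box. The exact check comes later, on fewer rows.
    positions = gdf.sindex.query(box(minx, miny, maxx, maxy))
    return gdf.iloc[sorted(positions)]  # keep the original row order


def bbox_from_geometry(geom, buffer=0.0) -> tuple:
    """Return (minx, miny, maxx, maxy) for geom, padded by buffer working-CRS units."""
    minx, miny, maxx, maxy = geom.bounds
    return (minx - buffer, miny - buffer, maxx + buffer, maxy + buffer)
```

Following the lesson's pairing, bbox_filter(parcels, *bbox_from_geometry(route, buffer=500)) returns the parcels whose envelopes overlap the route's bounding box padded by 500 feet.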
The bbox prefilter is the core performance pattern that will repeat across the package — for routes (Lesson 3), for the GPS-to-parcel join (Lesson 8), for everything that needs “narrow down before doing expensive work.”
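The “narrow down before doing expensive work” pattern in two stages, sketched for “parcels within N feet of this route.” parcels_near_route is a hypothetical helper that inlines the bbox math so it stands alone. The synthetic boxes use small local coordinates labeled as EPSG:2230, which is fine for a distance test.

```python
import geopandas as gpd
from shapely.geometry import LineString, box


def parcels_near_route(parcels: gpd.GeoDataFrame, route, feet: float) -> gpd.GeoDataFrame:
    """Parcels within `feet` of a route geometry: cheap prefilter, then exact test."""
    # Stage 1 (cheap): pad the route's bounding box and ask the R-tree which
    # parcel envelopes overlap it. No exact geometry math yet.
    minx, miny, maxx, maxy = route.bounds
    window = box(minx - feet, miny - feet, maxx + feet, maxy + feet)
    candidates = parcels.iloc[sorted(parcels.sindex.query(window))]
    # Stage 2 (exact): true distances, computed only for the survivors.
    return candidates[candidates.distance(route) <= feet]


if __name__ == "__main__":
    parcels = gpd.GeoDataFrame(
        {"apn": ["on_route", "corner_box", "far_away"]},
        geometry=[box(48, 48, 52, 52), box(80, 0, 90, 10), box(500, 500, 510, 510)],
        crs="EPSG:2230",
    )
    route = LineString([(0, 0), (100, 100)])
    # corner_box passes stage 1 (inside the padded bbox) but is ~49.5 ft away.
    print(parcels_near_route(parcels, route, feet=5)["apn"].tolist())  # ['on_route']
```

The diagonal route shows why the second stage matters: a bounding box around a diagonal line covers plenty of ground the line never comes near.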
4 Tests with synthetic data
One test file per module, all using small synthetic geometries (a few boxes in San Diego, no real data needed):
- tests/test_crs.py: constants are correct; reprojection round-trips; no-CRS input raises.
- tests/test_duckdb_session.py: connect() works and spatial functions are available. These tests skip cleanly in environments where the extension can’t be downloaded.
- tests/test_parcels.py: load round-trips; bbox_filter narrows correctly; a no-CRS file raises; bbox_from_geometry math is right.

Then install the package with its dev extras, lint, and run the suite:
pip install -e ".[dev]"
ruff check .
pytest -q
5 Commit and push
git add .
git commit -m "Lesson 2: core/crs, core/duckdb_session, prep/parcels"
git push origin main
CI runs ruff + pytest on the push. Green check means the foundation is solid.
· Package anatomy after this lesson
Where everything lives now. new marks files added this lesson.
· What you built
- CRS constants and helpers — the convention “work in feet, render in degrees” is now encoded as data.
- A DuckDB session builder with the spatial extension loaded on first use.
- A parcels prep module with the bbox-prefilter pattern that the rest of the package will reuse.
- Tests for all of it — with graceful skips in environments where the spatial extension can’t be downloaded.
· Companion resources
Optional, for going deeper.
- EPSG:2230 details: epsg.io/2230 — the projected CRS for the San Diego region (use the right one for yours).
- DuckDB spatial extension: official docs — geometry types, ST_* functions, and how DuckDB’s spatial layer compares to PostGIS.
- GeoPandas spatial index: GeoDataFrame.sindex exposes an R-tree for bounding-box queries, and the .cx indexer selects rows whose geometry intersects a box. GeoPandas indexing docs.
· Next lesson
Lesson 3 — Routes & facilities: load the route polygons (AUTO + HTC) and the facilities layer (landfills, depots, other org sites). Buffer the HTC routes to compensate for their parcel-shape geometry. The operational world starts to take shape.