Lesson 3 of 12 · The operational world

Routes and facilities — the GIS world the trucks live in.

GPS pings are only meaningful relative to the operational layers they move through — the routes a truck is supposed to drive, the landfills it tips at, the depot it starts and ends the day in. This lesson lands all of them in one module, parameterized so any agency’s data can plug in.

Time: ~75 min · You'll touch: prep/static_layers · Result: routes & facilities ready for the engine

· Objective

One module, a handful of functions, and one config dataclass, serving every downstream lesson that needs to know “where the operations happen.”

Routes — load AUTO and HTC polygons, dissolve, optionally buffer HTC, and reduce to per-route bounding boxes in EPSG:4326 (the format the engine’s spatial-temporal index expects).
Facilities — one source file holding the depot and the landfills together; split them by a name column, apply an optional buffer to landfills, and hand back two GeoDataFrames keyed by role.
Parameterized — column names and depot identifier are knobs, not hardcoded. Same code works for the next city.

The HTC buffer rule. AUTO routes are large, engulfing polygons that already cover the streets a truck drives. HTC routes (Hard-to-Collect, manual pickup) are copies of parcel shapes, so a truck driving down the street beside an HTC parcel can fall outside the polygon. The fix: buffer HTC routes by ~150 ft (in the working CRS) so they behave like AUTO routes in spatial joins. This is one of those domain rules that stays invisible until you see missed matches in production.
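To make the failure mode concrete, here is a stdlib-only sketch with made-up coordinates in feet. It models the HTC parcel as an axis-aligned rectangle (a simplification; real parcels are arbitrary polygons and a real buffer has rounded corners), puts a truck ping in the street 40 ft off the lot line, and runs the same containment test before and after growing the shape by 150 ft. The helpers `contains` and `grow` are hypothetical illustrations, not part of the module.

```python
# Hypothetical illustration of the HTC buffer rule, stdlib only.
# Shapes are axis-aligned rectangles in a projected CRS whose units are feet
# (like the working CRS). grow() is a square-cornered stand-in for a buffer.

Rect = tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def contains(rect: Rect, x: float, y: float) -> bool:
    """True if point (x, y) lies inside or on the rectangle."""
    minx, miny, maxx, maxy = rect
    return minx <= x <= maxx and miny <= y <= maxy


def grow(rect: Rect, dist_ft: float) -> Rect:
    """Expand a rectangle by dist_ft on every side."""
    minx, miny, maxx, maxy = rect
    return (minx - dist_ft, miny - dist_ft, maxx + dist_ft, maxy + dist_ft)


# A 60 ft x 120 ft HTC parcel whose front lot line sits at y = 0.
htc_parcel: Rect = (0.0, 0.0, 60.0, 120.0)
# The truck drives the street in front of it, 40 ft off the lot line.
ping = (30.0, -40.0)

print(contains(htc_parcel, *ping))               # False: missed without a buffer
print(contains(grow(htc_parcel, 150.0), *ping))  # True: caught after a 150 ft buffer
```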

· Build it, step by step

1 Route loading — `load_route_polygons()`

Read a route layer (Shapefile / GeoPackage / GeoParquet), drop empty geometries, repair invalid ones with shapely.validation.make_valid, and return a clean two-column GeoDataFrame in EPSG:4326: route_id (string) and geometry. The function takes the source-specific ID column name as an argument, since city data uses names such as REF_ROUTE_ or ROUTE_ID.

The function refuses files with no CRS — same pattern as load_parcels from Lesson 2. Coordinates without a CRS are ambiguous; better to fail loud than silently produce wrong distances later.
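The cleaning contract can be sketched without any GIS library. In this hypothetical stdlib version, rows are plain dicts standing in for a GeoDataFrame and `None` or an empty list stands in for an empty geometry; file reading, `make_valid` repair, and reprojection to EPSG:4326 are deliberately left out. The name `clean_route_rows` is an illustration, not the module's API.

```python
# Hypothetical stdlib sketch of the contract load_route_polygons() enforces.
# Rows stand in for a GeoDataFrame: dicts with a source-specific ID column
# and a "geometry" value (None or [] means an empty geometry). Reading the
# file, make_valid repair, and reprojection are omitted here.
from typing import Any, Optional


def clean_route_rows(
    rows: list[dict[str, Any]], route_col: str, crs: Optional[str]
) -> list[dict[str, Any]]:
    # Fail loud: coordinates without a CRS are ambiguous.
    if crs is None:
        raise ValueError("route layer has no CRS; refusing to guess")
    cleaned = []
    for row in rows:
        geom = row.get("geometry")
        if not geom:  # drop missing or empty geometries
            continue
        # Normalize the source-specific ID column to a string route_id.
        cleaned.append({"route_id": str(row[route_col]), "geometry": geom})
    return cleaned


rows = [
    {"REF_ROUTE_": 101, "geometry": [(0, 0), (1, 0), (1, 1)]},
    {"REF_ROUTE_": 102, "geometry": None},
    {"REF_ROUTE_": 103, "geometry": []},
]
print(clean_route_rows(rows, "REF_ROUTE_", crs="EPSG:2230"))
```

Casting the ID to a string up front means numeric IDs from one agency and alphanumeric IDs from another compare the same way downstream.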

2 Buffering and bboxes — `buffer_in_working_crs()` and `route_bboxes()`

Two small utilities that anchor the bbox-index pattern:

buffer_in_working_crs(gdf, buffer_ft) — buffer in feet, always doing the math in WORKING_CRS (EPSG:2230) regardless of the input CRS. It round-trips through the working CRS and back to the input CRS; calling gdf.geometry.buffer(50) on an EPSG:4326 layer would silently buffer by 50 degrees (thousands of kilometers), not 50 feet.
route_bboxes(routes) — reduce route polygons to one bbox per route_id in EPSG:4326. Multi-polygon routes (a route made of several disjoint chunks) dissolve to a single encompassing bbox. A small ~100 ft edge pad, expressed in degrees of latitude, is added so a ping exactly on a boundary still matches; bbox math stays cheap.
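A stdlib-only sketch of the bbox reduction and the unit arithmetic behind both bullets. It assumes each route part has already been reduced to a `(minx, miny, maxx, maxy)` tuple in EPSG:4326 degrees, and converts feet to degrees with roughly 111 km per degree of latitude (an approximation; the true figure varies slightly with latitude). `route_bboxes_sketch` and `ft_to_lat_deg` are hypothetical names.

```python
# Hypothetical sketch of the route_bboxes() reduction, stdlib only.
# Input: (route_id, bbox) pairs, one per polygon part, bbox in EPSG:4326
# degrees as (minx, miny, maxx, maxy). A multipart route appears several times.
from collections import defaultdict

Bbox = tuple[float, float, float, float]

FT_TO_M = 0.3048           # international foot (US survey foot differs by ~2 ppm)
M_PER_DEG_LAT = 111_000.0  # approximate meters per degree of latitude


def ft_to_lat_deg(feet: float) -> float:
    """Convert a distance in feet to degrees of latitude (approximate)."""
    return feet * FT_TO_M / M_PER_DEG_LAT


def route_bboxes_sketch(parts: list[tuple[str, Bbox]], pad_ft: float = 100.0) -> dict[str, Bbox]:
    """Dissolve parts per route_id to one encompassing bbox, then pad it."""
    inf = float("inf")
    acc: dict[str, list[float]] = defaultdict(lambda: [inf, inf, -inf, -inf])
    for route_id, (minx, miny, maxx, maxy) in parts:
        box = acc[route_id]
        box[0] = min(box[0], minx)
        box[1] = min(box[1], miny)
        box[2] = max(box[2], maxx)
        box[3] = max(box[3], maxy)
    pad = ft_to_lat_deg(pad_ft)  # lat-degree units, applied to both axes
    return {rid: (b[0] - pad, b[1] - pad, b[2] + pad, b[3] + pad) for rid, b in acc.items()}


parts = [
    ("R1", (-117.20, 32.70, -117.18, 32.72)),
    ("R1", (-117.15, 32.74, -117.14, 32.75)),  # disjoint second chunk of R1
    ("R2", (-117.10, 32.80, -117.09, 32.81)),
]
print(route_bboxes_sketch(parts))
print(round(ft_to_lat_deg(50), 6))  # 50 ft is ~0.000137 degrees, not 50 degrees
```

Because a degree of longitude shrinks with cos(latitude), the same pad spans a little less ground east-west: about 84 ft at San Diego's roughly 32.7° N for a 100 ft pad.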

3 The route index — `build_route_index()`

The end-to-end entry point for the engine’s spatial-temporal index:

```python
from opentrash.prep.static_layers import build_route_index

route_index = build_route_index(
    auto_path="static_layers/auto_routes.shp",
    htc_path="static_layers/htc_routes.shp",
    auto_route_col="REF_ROUTE_",
    htc_route_col="REF_ROUTE_",
    htc_buffer_ft=150,
)
```

Loads both layers. Routes that appear in both AUTO and HTC keep their AUTO geometry: AUTO takes precedence because it is already engulfing. Routes present only in HTC get the configured buffer applied. The result is a single DataFrame with one row per route_id, holding that route's bbox in EPSG:4326.
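The precedence rule reduces to a few lines once each layer is a mapping from route_id to a bbox. This hypothetical stdlib sketch keeps everything in feet (working-CRS units) so the HTC buffer is a plain expansion; for a round buffer, the bbox of the buffered polygon is exactly the original bbox grown by the buffer distance, so the shortcut gives the same box. `merge_routes` is an illustrative name, and the final conversion to EPSG:4326 is omitted.

```python
# Hypothetical sketch of the AUTO + HTC precedence rule, stdlib only.
# Each layer maps route_id -> bbox in feet (working-CRS units). The real
# build_route_index() buffers polygons in WORKING_CRS and then reduces to
# EPSG:4326 bboxes; that conversion is left out here.
Bbox = tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def expand(b: Bbox, d: float) -> Bbox:
    """Grow a bbox by d on every side."""
    return (b[0] - d, b[1] - d, b[2] + d, b[3] + d)


def merge_routes(
    auto: dict[str, Bbox], htc: dict[str, Bbox], htc_buffer_ft: float = 150.0
) -> dict[str, Bbox]:
    merged = dict(auto)  # AUTO geometry is always kept as drawn
    for route_id, bbox in htc.items():
        if route_id in merged:
            continue  # AUTO wins: it already engulfs the streets
        merged[route_id] = expand(bbox, htc_buffer_ft)  # HTC-only: buffer it
    return merged


auto = {"A1": (0.0, 0.0, 5000.0, 5000.0)}
htc = {"A1": (100.0, 100.0, 160.0, 220.0), "H7": (9000.0, 0.0, 9060.0, 120.0)}
print(merge_routes(auto, htc))
```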

4 Facilities — `load_facilities()` and `FacilitiesConfig`

The facilities layer is one file holding the depot and the landfills together. The depot is identified by a row whose name column matches a configured value — every other row is a landfill:

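```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacilitiesConfig:
    name_col: str = "Name"              # column holding the facility name
    depot_name: str = "Miramar Place"   # value that identifies the depot row
    landfill_buffer_ft: float = 50.0    # tips happen on/just outside the polygon
```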

The defaults match a real San Diego dataset; override name_col and depot_name for any other agency. Landfills get the configured buffer (so a truck near the tip but not exactly on the polygon still registers as tipping); the depot polygon is kept exactly as drawn. Output is a dict: {"landfill": GeoDataFrame, "depot": GeoDataFrame}.

Why a dict and not a single dataframe with a kind column? Because the consumers (the routing engine, eventually) treat depots and landfills as distinct concepts — trucks start and end at a depot but tip at a landfill. Different semantics, different downstream rules. Keeping them apart in the data type makes the intent obvious.
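Here is a hypothetical stdlib sketch of that split, restating the `FacilitiesConfig` defaults shown above. Rows stand in for the facilities GeoDataFrame and geometry is simplified to a bbox in feet, so the landfill buffer is a plain expansion; the real `load_facilities()` returns GeoDataFrames and buffers real polygons. `split_facilities` and the second agency's `SITE` / `North Yard` values are made up for illustration.

```python
# Hypothetical stdlib sketch of the depot/landfill split.
# Geometry is a bbox in feet; the real function works on GeoDataFrames.
from dataclasses import dataclass
from typing import Any, Optional

Bbox = tuple[float, float, float, float]


@dataclass(frozen=True)
class FacilitiesConfig:
    name_col: str = "Name"
    depot_name: str = "Miramar Place"
    landfill_buffer_ft: float = 50.0


def expand(b: Bbox, d: float) -> Bbox:
    """Grow a bbox by d on every side."""
    return (b[0] - d, b[1] - d, b[2] + d, b[3] + d)


def split_facilities(
    rows: list[dict[str, Any]], cfg: Optional[FacilitiesConfig] = None
) -> dict[str, list[dict[str, Any]]]:
    if cfg is None:
        cfg = FacilitiesConfig()
    # The depot is the row whose name matches; it is kept exactly as drawn.
    depot = [r for r in rows if r[cfg.name_col] == cfg.depot_name]
    if not depot:
        raise ValueError(f"no row where {cfg.name_col} == {cfg.depot_name!r}")
    # Every other row is a landfill and gets the configured buffer.
    landfills = [
        {**r, "geometry": expand(r["geometry"], cfg.landfill_buffer_ft)}
        for r in rows
        if r[cfg.name_col] != cfg.depot_name
    ]
    return {"landfill": landfills, "depot": depot}


rows = [
    {"Name": "Miramar Place", "geometry": (0.0, 0.0, 400.0, 300.0)},
    {"Name": "Landfill A", "geometry": (5000.0, 5000.0, 9000.0, 8000.0)},
]
facilities = split_facilities(rows)

# Another agency: different column name and depot value (hypothetical).
other = split_facilities(
    [{"SITE": "North Yard", "geometry": (0.0, 0.0, 1.0, 1.0)}],
    FacilitiesConfig(name_col="SITE", depot_name="North Yard"),
)
```

Raising when no depot row matches keeps a misconfigured `depot_name` from quietly turning the depot into one more landfill.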

5 Tests with synthetic data

Twelve tests covering the route bbox math (dissolve, edge buffer, buffer-in-feet), the AUTO + HTC merge logic (HTC-only included with buffer, AUTO wins when both exist), and the facilities split (depot found by name, landfills buffered, custom config for other agencies, missing-depot raises). All synthetic geometries, no real data needed:

```shell
pip install -e ".[dev]"
ruff check .
pytest -q
```

6 Commit and push

```shell
git add .
git commit -m "Lesson 3: prep/static_layers — routes + facilities"
git push origin main
```

The operational world is in. The package now knows where routes run and where trucks tip.

· Package anatomy after this lesson

Where everything lives now. [new] marks files added in this lesson.

```
opentrash/
├── pyproject.toml
├── README.md
├── .github/workflows/ci.yml
├── docs/architecture.md
├── opentrash/
│   ├── __init__.py
│   ├── adapters/gps/          # scaffolded
│   ├── core/
│   │   ├── crs.py
│   │   └── duckdb_session.py
│   ├── prep/
│   │   ├── sites.py
│   │   ├── parcels.py
│   │   └── static_layers.py   # [new] routes + facilities
│   ├── cache/                 # scaffolded
│   ├── tonnage/               # scaffolded
│   ├── engine/                # scaffolded: THE ROUTING ENGINE
│   ├── patterns/              # scaffolded
│   └── routeview/             # scaffolded
└── tests/
    ├── test_sites.py
    ├── test_crs.py
    ├── test_duckdb_session.py
    ├── test_parcels.py
    └── test_static_layers.py  # [new]
```

· What you built

Route loading with the AUTO + HTC convention and the buffer rule that compensates for HTC geometry.
A route bbox index — one row per route in EPSG:4326 — the format the engine’s spatial-temporal join consumes.
Facilities loading with the depot-by-name split and an optional landfill buffer.
A parameterized config so the next agency’s column names just plug in.
Twelve tests over synthetic geometries covering the dissolve, buffer, and split logic.

The operational world is now data. The next lesson goes deeper on sites, the customer accounts that sit on top of parcels and bridge who pays for service to what gets serviced.

· Companion resources

Optional, for going deeper.

shapely.validation.make_valid documentation: the polite way to handle slightly broken polygons from real GIS exports.
GeoPandas to_crs performance: reprojecting a large layer once is cheap; doing it inside a loop is expensive. Reproject once, then work with the result.
Dissolve-by-attribute pattern: the routes case (multiple polygons sharing a route_id → one encompassing bbox) generalizes to any “multipart features” problem.

· Next lesson

Lesson 4 — Sites: a deeper pass on customer accounts — the rows that bridge parcels (what gets serviced) to billing (who pays for it). Integration patterns with parcels and routes.
