Routes and facilities — the GIS world the trucks live in.
GPS pings are only meaningful relative to the operational layers they move through — the routes a truck is supposed to drive, the landfills it tips at, the depot it starts and ends the day in. This lesson lands all of them in one module, parameterized so any agency’s data can plug in.
· Objective
One module, three functions, one config dataclass — serving every downstream lesson that needs to know “where the operations happen.”
- Routes — load AUTO and HTC polygons, dissolve, optionally buffer HTC, and reduce to per-route bounding boxes in EPSG:4326 (the format the engine’s spatial-temporal index expects).
- Facilities — one source file holding the depot and the landfills together; split them by a name column, apply an optional buffer to landfills, and hand back two GeoDataFrames keyed by role.
- Parameterized — column names and depot identifier are knobs, not hardcoded. Same code works for the next city.
· Build it, step by step
1 Route loading — load_route_polygons()
Read a route layer (Shapefile / GeoPackage / GeoParquet), drop empty geometries, repair invalid ones via shapely.validation.make_valid, and return a clean two-column GeoDataFrame in EPSG:4326: route_id (string) and geometry. The function takes the source-specific ID column name as an argument; names commonly seen in city data include REF_ROUTE_ and ROUTE_ID.
The function refuses files with no CRS — same pattern as load_parcels from Lesson 2. Coordinates without a CRS are ambiguous; better to fail loud than silently produce wrong distances later.
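The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the helper name clean_route_layer, the wrapper load_route_polygons_sketch, and the error message are assumptions, and the sample geometries are synthetic.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.validation import make_valid


def clean_route_layer(gdf: gpd.GeoDataFrame, route_col: str) -> gpd.GeoDataFrame:
    """Hypothetical helper mirroring the steps described for load_route_polygons()."""
    # Fail loudly when the CRS is missing instead of guessing one.
    if gdf.crs is None:
        raise ValueError("Route layer has no CRS; refusing to guess.")
    # Drop rows whose geometry is missing or empty.
    keep = gdf.geometry.notna() & ~gdf.geometry.is_empty
    kept = gdf[keep]
    # Repair invalid polygons (for example, self-intersections).
    repaired = kept.geometry.apply(make_valid)
    out = gpd.GeoDataFrame(
        {"route_id": list(kept[route_col].astype(str))},
        geometry=list(repaired),
        crs=gdf.crs,
    )
    return out.to_crs("EPSG:4326")


def load_route_polygons_sketch(path: str, route_col: str) -> gpd.GeoDataFrame:
    """Assumed wrapper: read a Shapefile/GeoPackage from disk, then clean it."""
    return clean_route_layer(gpd.read_file(path), route_col)


# Synthetic input: a self-intersecting "bowtie" polygon plus an empty geometry.
raw = gpd.GeoDataFrame(
    {"REF_ROUTE_": [101, 102]},
    geometry=[Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]), Polygon()],
    crs="EPSG:4326",
)
clean = clean_route_layer(raw, "REF_ROUTE_")
```

In this sketch the empty row is dropped, the bowtie is repaired into a valid geometry, and the integer ID is returned as a string.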
2 Buffering and bboxes — buffer_in_working_crs() and route_bboxes()
Two small utilities that anchor the bbox-index pattern:
- buffer_in_working_crs(gdf, buffer_ft): buffer in feet, always doing the math in WORKING_CRS (EPSG:2230) regardless of the input CRS. It round-trips through the working CRS and back. gdf.geometry.buffer(50) in EPSG:4326 would silently give you 50° of buffering, a quarter of the planet, not 50 feet.
- route_bboxes(routes): reduce route polygons to one bbox per route_id in EPSG:4326. Multi-polygon routes (a route that spans several disjoint chunks) dissolve to the encompassing bbox. A small ~100 ft edge buffer is added in lat-degree units so a ping exactly on a boundary still matches; bbox math is cheap.
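One possible shape for these two utilities is sketched below, assuming GeoPandas is available. The function bodies are illustrative rather than the package's own. The constant FT_PER_DEG_LAT (about 364,000 ft per degree of latitude) and the edge_buffer_ft parameter name are assumptions. The bbox step takes the min/max of per-part bounds, which yields the same box as dissolving first.

```python
import geopandas as gpd
from shapely.geometry import Point

WORKING_CRS = "EPSG:2230"  # NAD83 / California zone 6, units are US survey feet
FT_PER_DEG_LAT = 364_000   # assumed rough length of one degree of latitude, in feet


def buffer_in_working_crs(gdf: gpd.GeoDataFrame, buffer_ft: float) -> gpd.GeoDataFrame:
    """Buffer by a distance in feet, returning geometries in the input CRS."""
    projected = gdf.to_crs(WORKING_CRS)
    # The buffer distance is interpreted in the CRS units, which are feet here.
    projected = projected.set_geometry(projected.geometry.buffer(buffer_ft))
    return projected.to_crs(gdf.crs)


def route_bboxes(routes: gpd.GeoDataFrame, edge_buffer_ft: float = 100.0):
    """One padded bbox per route_id, expressed in EPSG:4326 degrees."""
    bounds = routes.to_crs("EPSG:4326").geometry.bounds  # minx, miny, maxx, maxy
    bounds["route_id"] = routes["route_id"].to_numpy()
    # Collapse all parts of a route into one encompassing box.
    agg = bounds.groupby("route_id").agg(
        minx=("minx", "min"),
        miny=("miny", "min"),
        maxx=("maxx", "max"),
        maxy=("maxy", "max"),
    )
    # Pad in latitude-degree units; the east-west pad is slightly short of
    # edge_buffer_ft on the ground because longitude degrees shrink away
    # from the equator.
    pad = edge_buffer_ft / FT_PER_DEG_LAT
    agg[["minx", "miny"]] -= pad
    agg[["maxx", "maxy"]] += pad
    return agg.reset_index()


# Synthetic route "A" made of two disjoint parts near San Diego.
parts = gpd.GeoDataFrame(
    {"route_id": ["A", "A"]},
    geometry=[Point(-117.16, 32.72), Point(-117.10, 32.80)],
    crs="EPSG:4326",
)
buffered = buffer_in_working_crs(parts, 50)
bboxes = route_bboxes(buffered)
```

Here each point becomes a circle of roughly 50 ft radius, and the two parts collapse into a single padded box for route "A".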
3 The route index — build_route_index()
The end-to-end entry point for the engine’s spatial-temporal index:
    build_route_index(
        auto_path="static_layers/auto_routes.shp",
        htc_path="static_layers/htc_routes.shp",
        auto_route_col="REF_ROUTE_",
        htc_route_col="REF_ROUTE_",
        htc_buffer_ft=150,
    )
The call loads both layers. Routes that appear in both AUTO and HTC keep their AUTO geometry: AUTO takes precedence because its polygons already engulf the route, so no buffer is needed. Routes present only in HTC get the configured buffer applied (150 ft in the call above). The result is a single DataFrame with one row per route_id, holding that route's bbox in EPSG:4326.
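The precedence rule can be illustrated without any GIS library. In the sketch below, plain dictionaries keyed by route_id stand in for the two layers and strings stand in for geometries. The function name merge_routes and the buffer_htc callback are hypothetical, chosen only to show the logic.

```python
from typing import Callable, Dict, TypeVar

G = TypeVar("G")  # stand-in for whatever geometry type the layers carry


def merge_routes(
    auto: Dict[str, G],
    htc: Dict[str, G],
    buffer_htc: Callable[[G], G],
) -> Dict[str, G]:
    """AUTO geometry wins on shared IDs; HTC-only routes are buffered and added."""
    merged = dict(auto)  # copy so the inputs stay untouched
    for route_id, geom in htc.items():
        if route_id not in merged:
            # Only routes missing from AUTO receive the HTC buffer.
            merged[route_id] = buffer_htc(geom)
    return merged


# Placeholder "geometries": R2 exists in both layers, R3 only in HTC.
auto_layer = {"R1": "auto-R1", "R2": "auto-R2"}
htc_layer = {"R2": "htc-R2", "R3": "htc-R3"}
merged = merge_routes(auto_layer, htc_layer, lambda g: f"buffered({g})")
print(merged)
# {'R1': 'auto-R1', 'R2': 'auto-R2', 'R3': 'buffered(htc-R3)'}
```

R2 keeps its AUTO geometry untouched, while R3, which exists only in HTC, is the one route that gets buffered.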
4 Facilities — load_facilities() and FacilitiesConfig
The facilities layer is one file holding the depot and the landfills together. The depot is identified by a row whose name column matches a configured value — every other row is a landfill:
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FacilitiesConfig:
        name_col: str = "Name"
        depot_name: str = "Miramar Place"
        landfill_buffer_ft: float = 50.0  # tips happen on/just-outside the polygon
The defaults match a real San Diego dataset; override name_col and depot_name for any other agency. Landfills get the configured buffer (so a truck near the tip but not exactly on the polygon still registers as tipping); the depot polygon is kept exactly as drawn. Output is a dict: {"landfill": GeoDataFrame, "depot": GeoDataFrame}.
Why a dict and not a single dataframe with a kind column? Because the consumers (the routing engine, eventually) treat depots and landfills as distinct concepts — trucks start and end at a depot but tip at a landfill. Different semantics, different downstream rules. Keeping them apart in the data type makes the intent obvious.
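The depot-by-name split can be sketched in plain Python. This is an illustration under stated assumptions: lists of dicts stand in for GeoDataFrame rows, buffering is omitted, the function name split_facilities and the error message are hypothetical, and the landfill names and the second agency's column and depot names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FacilitiesConfig:
    name_col: str = "Name"
    depot_name: str = "Miramar Place"
    landfill_buffer_ft: float = 50.0  # unused here; buffering is omitted in this sketch


def split_facilities(
    rows: List[dict], cfg: FacilitiesConfig = FacilitiesConfig()
) -> Dict[str, List[dict]]:
    """Split facility rows by role: the named row is the depot, the rest are landfills."""
    depot = [r for r in rows if r[cfg.name_col] == cfg.depot_name]
    if not depot:
        raise ValueError(f"No row with {cfg.name_col} == {cfg.depot_name!r}")
    landfill = [r for r in rows if r[cfg.name_col] != cfg.depot_name]
    return {"landfill": landfill, "depot": depot}


# Default config: landfill names here are invented placeholders.
rows = [{"Name": "Miramar Place"}, {"Name": "Landfill A"}, {"Name": "Landfill B"}]
facilities = split_facilities(rows)

# Another agency plugs in its own column and depot names (both hypothetical).
other_cfg = FacilitiesConfig(name_col="FAC_NAME", depot_name="Central Yard")
other_rows = [{"FAC_NAME": "Central Yard"}, {"FAC_NAME": "North Tip"}]
other = split_facilities(other_rows, other_cfg)
```

Callers then reach for facilities["depot"] or facilities["landfill"] directly, so the role is chosen by key rather than by filtering a kind column at every use site.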
5 Tests with synthetic data
Twelve tests covering the route bbox math (dissolve, edge buffer, buffer-in-feet), the AUTO + HTC merge logic (HTC-only included with buffer, AUTO wins when both exist), and the facilities split (depot found by name, landfills buffered, custom config for other agencies, missing-depot raises). All synthetic geometries, no real data needed:
pip install -e ".[dev]"
ruff check .
pytest -q
6 Commit and push
git add .
git commit -m "Lesson 3: prep/static_layers — routes + facilities"
git push origin main
The operational world is in. The package now knows where routes run and where trucks tip.
· Package anatomy after this lesson
Where everything lives now. new marks files added this lesson.
· What you built
- Route loading with the AUTO + HTC convention and the buffer rule that compensates for HTC geometry.
- A route bbox index — one row per route in EPSG:4326 — the format the engine’s spatial-temporal join consumes.
- Facilities loading with the depot-by-name split and an optional landfill buffer.
- A parameterized config so the next agency’s column names just plug in.
- Twelve tests over synthetic geometries covering the dissolve, buffer, and split logic.
· Companion resources
Optional, for going deeper.
- shapely.validation.make_valid (docs): the polite way to handle slightly-broken polygons from real GIS exports.
- GeoPandas to_crs performance: reprojecting a large layer once is cheap; doing it inside a loop is expensive. Reproject once, work in the result.
- Dissolve-by-attribute pattern: the routes case (multiple polygons sharing a route_id → one encompassing bbox) generalizes to any “multipart features” problem.
· Next lesson
Lesson 4 — Sites: a deeper pass on customer accounts — the rows that bridge parcels (what gets serviced) to billing (who pays for it). Integration patterns with parcels and routes.