Lesson 1 of 12 · The guided tour

Know the shape before you fill it.

You landed a working package in Lesson 0. Eight subdirectories, one real module, every __init__.py already labeled with what’s coming. This lesson is the guided tour: why each directory exists, what an editable install actually does, and how the daily loop — ruff → pytest → commit → CI green — will run every lesson from here on.

Time: ~40 min
You’ll touch: nothing new; you’ll read
Result: a mental map of the whole course

· Objective

No new code this lesson. By the end of it you should be able to:

  • Walk every subdirectory of opentrash/ and say what it’s for — including the empty ones.
  • Explain what pip install -e ".[dev]" does and why the editable bit matters for everyday development.
  • Describe the dev loop — the cycle every subsequent lesson runs — without looking it up.
  • Know when to reach for [geotab] and [postgres] extras and why they’re optional.

Why a tour lesson? The package layout does real architectural work: engine/ sits where it does because every product consumes from it, and cache/ is a top-level package rather than a subfolder because it’s the substrate. Skipping past the shape would mean writing code into folders you haven’t really seen yet. Forty minutes here saves hours later.

· The package, directory by directory

Open each __init__.py as we go — the docstrings are the canonical “what goes here” reference. Reading them in order is reading the architecture.

1 adapters/gps/ — where data comes from

The package consumes GPS data from more than one vendor (Geotab, Postgres streams). Each vendor gets its own adapter, every adapter emits the same canonical schema, and vendor-specific quirks stay isolated in one file per vendor. The shared interface is a small Protocol in base.py, so swapping vendors is a one-line change in a calling script.

Adapters are the only place vendor terminology appears. Everywhere else in the package you see vehicle_id, dt_local, lat, lon, speed_mph — not Geotab’s VehicleName or Postgres’s DateTime.
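To make the adapter idea concrete, here is a minimal sketch of a structural interface plus one toy adapter. The names GpsAdapter, Ping, FakeVendorAdapter and fetch are hypothetical, not the package’s real base.py; the vendor field names VehicleName and DateTime echo the paragraph above, while Latitude, Longitude and SpeedMph are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Ping:
    """One row of the canonical GPS schema used everywhere outside adapters."""
    vehicle_id: str
    dt_local: datetime
    lat: float
    lon: float
    speed_mph: float


@runtime_checkable
class GpsAdapter(Protocol):
    """Hypothetical interface: any class with a matching fetch() qualifies."""
    def fetch(self, vehicle_id: str, day: date) -> list[Ping]: ...


class FakeVendorAdapter:
    """Toy adapter: the only place vendor field names appear."""

    def __init__(self, raw_rows: list[dict]):
        self._rows = raw_rows

    def fetch(self, vehicle_id: str, day: date) -> list[Ping]:
        # Translate vendor-named fields into the canonical schema.
        return [
            Ping(
                vehicle_id=r["VehicleName"],
                dt_local=r["DateTime"],
                lat=r["Latitude"],
                lon=r["Longitude"],
                speed_mph=r["SpeedMph"],
            )
            for r in self._rows
            if r["VehicleName"] == vehicle_id and r["DateTime"].date() == day
        ]
```

Because Protocol checks structure rather than inheritance, a calling script typed against GpsAdapter can swap FakeVendorAdapter for any other class with the same fetch() signature without touching the rest of its code.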

2 core/ — foundational primitives

The tiny shared pieces every other module imports: coordinate-reference-system constants (EPSG:2230 for working in feet, EPSG:4326 for web), a DuckDB session builder configured for the workload, and vehicle-ID parsing rules. If something feels like “a primitive every module needs,” it belongs here.
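A rough sketch of what two of those primitives could look like. The constant names CRS_FEET and CRS_WEB, the helpers session_settings and connect, and the default values (4 threads, 4GB) are all assumptions; Lesson 2 builds the real versions. The SET statements are standard DuckDB settings.

```python
from typing import Final

# Projected CRS in US survey feet (NAD83 / California zone 6): distances, buffers, areas.
CRS_FEET: Final = "EPSG:2230"
# Geographic WGS 84 longitude/latitude: what web maps and GeoJSON expect.
CRS_WEB: Final = "EPSG:4326"


def session_settings(threads: int = 4, memory_limit: str = "4GB") -> list[str]:
    """SQL a session builder could run on each new connection (placeholder values)."""
    return [f"SET threads TO {int(threads)}", f"SET memory_limit = '{memory_limit}'"]


def connect(database: str = ":memory:", threads: int = 4, memory_limit: str = "4GB"):
    """Hypothetical session builder: one place that configures every DuckDB connection."""
    # Imported inside the function only so this sketch runs without DuckDB installed.
    import duckdb

    con = duckdb.connect(database)
    for stmt in session_settings(threads, memory_limit):
        con.execute(stmt)
    return con
```

The point of centralizing this is that every module gets an identically configured connection instead of each one tuning DuckDB on its own.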

3 prep/ — one-time preparation of static data

The data that doesn’t stream — parcels, sites (customer accounts), route polygons, facilities (landfills, depots, other operational sites). Prep turns raw municipal inputs into compact, validated parquet layers the rest of the package consumes. The one module that exists today — prep/sites.py — sets the pattern: progressive cleaning, business rules encoded as data, nullable dtypes throughout.

4 cache/ — the substrate

One parquet per vehicle per local day, lookup indexes on top, and the master spatial-temporal lookup that joins vehicle-day bboxes to route bboxes. This is what makes “which GPS files touch route X on day Y” a millisecond query instead of a directory scan.

Cache is a top-level module, not a subdirectory of routeview or patterns. That placement is deliberate — the substrate feeds everything, so it sits at the top with everything else that’s foundational.
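The lookup idea above can be sketched with axis-aligned bounding boxes: precompute which vehicle-day files overlap which routes, and “which files touch route X on day Y” becomes a dictionary lookup. The path layout gps/{day}/{vehicle_id}.parquet and the names BBox, vehicle_day_path and build_lookup are assumptions; the real cache presumably stores this as a parquet table queried through DuckDB.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BBox:
    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return (
            self.minx <= other.maxx
            and other.minx <= self.maxx
            and self.miny <= other.maxy
            and other.miny <= self.maxy
        )


def vehicle_day_path(vehicle_id: str, day: date) -> str:
    # Hypothetical layout: one parquet per vehicle per local day.
    return f"gps/{day.isoformat()}/{vehicle_id}.parquet"


def build_lookup(
    vehicle_days: list[tuple[str, date, BBox]],
    routes: dict[str, BBox],
) -> dict[tuple[str, date], list[str]]:
    # Done once at cache-build time: every (route, day) -> overlapping files.
    lookup: dict[tuple[str, date], list[str]] = {}
    for vehicle_id, day, vbox in vehicle_days:
        for route_id, rbox in routes.items():
            if vbox.intersects(rbox):
                lookup.setdefault((route_id, day), []).append(vehicle_day_path(vehicle_id, day))
    return lookup
```

Bounding boxes are a coarse filter: a hit means the file might touch the route, and the precise spatial join happens later in the engine.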

5 tonnage/ — ingestion of legacy tonnage data

Tonnage comes from a legacy system as messy Excel files, daily-ish. The tonnage modules turn that into partitioned parquet with a hash-based deduplication scheme, vehicle-ID parsing, and a lookup interface for “what tipped on day D from vehicle V.” Idempotent re-runs are mandatory — the same Excel file processed twice produces zero new rows.
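A minimal sketch of hash-based deduplication that makes re-runs idempotent: hash each row’s content canonically, and skip any hash already stored. The column names and the helpers row_hash and ingest are hypothetical; in the real pipeline the seen set would come from the existing partitioned parquet rather than an in-memory set.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ingest(rows: list[dict], seen: set[str]) -> list[dict]:
    """Return only rows not ingested before, tagging each with its hash."""
    new_rows = []
    for row in rows:
        h = row_hash(row)
        if h not in seen:
            seen.add(h)
            new_rows.append({**row, "row_hash": h})
    return new_rows
```

One design consequence worth noticing: two genuinely separate but byte-identical rows would also collapse into one, so the hashed columns need to include whatever distinguishes real events.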

6 engine/ — THE ROUTING ENGINE

The architectural heart of the package. Every GPS ping is joined to every applicable GIS layer (routes, parcels, facilities) once, producing enriched pings: a canonical stream every product consumes. engine/segments.py takes that stream and runs timeline analysis on top — service, travel, landfill, load, shift — using sequence and dwell rules.

The principle: spatial joins are infrastructure, products are calculations. The engine does the joins once; pattern detection, RouteView, and any future product is a pure aggregation over engine output. Products never do their own spatial joins. This is the single most important architectural rule in the package.

7 patterns/ — pattern detection (product)

Per-parcel service signatures: weekly1, weekly2, and biweekly — with vehicle, day-of-week, hour, and a composite biweekly score. A DuckDB CTAS pipeline that reads engine output and writes a single wide patterns table. No spatial joins of its own.

8 routeview/ — the interactive map (product)

One route, one day, an interactive MapLibre HTML. Built from engine output: ranked vehicles, the route-clipped trail, parcels served vs missed, tonnage tips. A pure consumer — rank is an aggregation, trail is a filter, render is template substitution.
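Here is a small sketch of the “rank is an aggregation, trail is a filter, render is template substitution” split, working on enriched rows shaped like {"vehicle_id", "route", "lat", "lon"}. The one-line HTML template is a hypothetical stand-in for the real MapLibre page, and rank_vehicles, trail and render are illustrative names.

```python
from collections import Counter
from string import Template

# Hypothetical stand-in for the MapLibre HTML template.
PAGE = Template("<h1>Route $route_id, $day</h1><p>Top vehicle: $top_vehicle ($n_points trail points)</p>")


def rank_vehicles(enriched: list[dict], route_id: str) -> list[str]:
    # Rank is an aggregation: vehicles with the most pings on the route first.
    counts = Counter(r["vehicle_id"] for r in enriched if r["route"] == route_id)
    return [vehicle for vehicle, _ in counts.most_common()]


def trail(enriched: list[dict], route_id: str, vehicle_id: str) -> list[tuple[float, float]]:
    # Trail is a filter; (lon, lat) order matches what GeoJSON-based maps expect.
    return [
        (r["lon"], r["lat"])
        for r in enriched
        if r["route"] == route_id and r["vehicle_id"] == vehicle_id
    ]


def render(route_id: str, day: str, enriched: list[dict]) -> str:
    # Render is template substitution: no geometry work happens here.
    ranked = rank_vehicles(enriched, route_id)
    top = ranked[0] if ranked else "none"
    return PAGE.substitute(
        route_id=route_id, day=day, top_vehicle=top,
        n_points=len(trail(enriched, route_id, top)),
    )
```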

· The dev loop

Every lesson from here on follows the same cycle. Internalize this once and the rest of the course is muscle memory.

1 Editable install — pip install -e ".[dev]"

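The -e flag (“editable”) is the unsung hero of Python development. Instead of copying your package files into site-packages, it installs a pointer back to your source tree (a .pth file or an import hook, depending on the build backend), so when you edit opentrash/prep/sites.py, the next python -c "from opentrash import ..." sees your change immediately. No reinstall, no rebuild. The exception is pyproject.toml itself: after changing dependencies or extras, run the install again.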

The ".[dev]" part installs the package plus the optional dev dependencies declared in pyproject.toml — ruff and pytest. Add [geotab] or [postgres] too when you start working with those adapters.

2 Lint — ruff check .

Ruff is a fast Python linter that catches a lot of small problems before tests do. The project’s config in pyproject.toml enables a tight, useful rule set (errors, undefined names, import sorting, modernization, bug patterns) without going overboard. Fix or auto-fix (ruff check . --fix) until you see All checks passed!

3 Test — pytest -q

pytest discovers everything under tests/ matching test_*.py and runs it. The -q flag keeps output quiet unless something fails. Today this runs the 9 sites tests from Lesson 0; by the end of the course it will run a hundred or so.
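For reference, this is the shape pytest looks for: a file named test_*.py containing functions whose names start with test_, using plain assert. The file name and the normalize_vehicle_id helper are hypothetical; in a real test the helper would be imported from the package rather than defined inline.

```python
# tests/test_example.py (hypothetical file; collected because it matches test_*.py)


def normalize_vehicle_id(raw: str) -> str:
    # Hypothetical function under test: trim whitespace and upper-case.
    return raw.strip().upper()


def test_normalize_strips_and_uppercases():
    assert normalize_vehicle_id("  trk-12 ") == "TRK-12"


def test_normalize_is_idempotent():
    once = normalize_vehicle_id(" a1 ")
    assert normalize_vehicle_id(once) == once
```

Running pytest -q from the repo root would collect and run both test_ functions.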

Tests live in tests/ at the repo top level — the standard Python convention. tests/ is not inside .github/; .github/workflows/ci.yml is a GitHub Actions config that runs the tests, not where they live.

4 Commit — git commit -m "Lesson N: ..."

One lesson, one commit (or a small handful). The commit message names the lesson so the repo history reads like a course outline. On push, the CI workflow runs ruff and pytest on a fresh Linux machine and shows a green check on GitHub when both pass. That closes the “works on my machine” gap: the code has to pass somewhere other than your laptop.

· Optional extras — vendor isolation

Heavy or vendor-specific dependencies live in optional extras, not in the base install. Why: the package should install cleanly for someone who only wants to read a parquet, without dragging in a Postgres driver or a Geotab API client they’ll never use.

  • pip install -e ".[geotab]" — adds mygeotab and pytz for the Geotab API adapter.
  • pip install -e ".[postgres]" — adds psycopg2-binary and sqlalchemy for the streaming Postgres adapter.
  • pip install -e ".[dev,geotab,postgres]" — everything, for full local development.

Inside the adapter modules, the vendor library is imported inside the function that uses it — not at module top. That way someone who pip-installs the package without the [geotab] extra can still import opentrash without crashes; the Geotab adapter only fails if they actually try to use it.
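A minimal sketch of that function-level import pattern, with a friendlier error that names the missing extra. require_extra and geotab_client are hypothetical helpers, not the package’s actual adapter code; mygeotab.API and authenticate() are the library’s own entry points.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency at call time, pointing at the extra if it's missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; "
            f'install it with: pip install -e ".[{extra}]"'
        ) from exc


def geotab_client(username: str, password: str, database: str):
    # The vendor import happens here, not at module top, so importing this
    # module never fails just because the [geotab] extra is absent.
    mygeotab = require_extra("mygeotab", "geotab")
    api = mygeotab.API(username=username, password=password, database=database)
    api.authenticate()
    return api
```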

· Package anatomy after this lesson

Unchanged from Lesson 0 — this lesson is the orientation tour, not new code. The shape we read together:

opentrash/
├── pyproject.toml
├── README.md
├── .github/workflows/ci.yml
├── docs/architecture.md
├── opentrash/
│   ├── __init__.py
│   ├── adapters/gps/      # where data comes from
│   ├── core/              # foundational primitives
│   ├── prep/
│   │   └── sites.py       # the one real module today
│   ├── cache/             # the substrate
│   ├── tonnage/           # legacy-data ingestion
│   ├── engine/            # THE ROUTING ENGINE
│   ├── patterns/          # product: pattern detection
│   └── routeview/         # product: interactive map
└── tests/
    └── test_sites.py

· What you should walk away with

  • A mental map of where each module lives and why.
  • The routing-engine rule — spatial joins are infrastructure, products are calculations.
  • The dev loop — editable install, ruff, pytest, commit, CI — you’ll run a version of this every lesson.
  • Why optional extras exist and when to opt into them.
  • Confidence that the empty directories aren’t empty by accident — each one is waiting for a specific lesson’s work.

The tour is done. The next lesson lands real code: the foundational primitives that everything else imports.

· Companion resources

Optional, for going deeper.

  • Editable installs: pip docs on local installs — what -e actually does under the hood.
  • Optional dependencies: pyproject.toml spec — the standard for [project.optional-dependencies].
  • typing.Protocol: Python docs — the structural-typing idea behind the adapter pattern we’ll use in Lesson 6.

· Next lesson

Lesson 2 — The foundation: CRS constants, the DuckDB session builder, and the parcels prep module. Three small, foundational pieces that every later lesson imports.