Lesson 0 of 12 · The on-ramp

One notebook function, a real package, and the whole map.

Most "build a Python package" tutorials stop at setup.py. We're going to go a step further today: build a package whose shape reflects what it will eventually do. Eight subdirectories, one real module, every __init__.py labeled with what's coming. By the end of this lesson, anyone who opens the repo can see the entire course at a glance.

Time: ~60 min
You'll touch: pyproject.toml · prep/sites · tests/ · CI
Result: a working installable package with the full architecture mapped

· Objective

Stand up the opentrash package skeleton, land one real working module (prep/sites.py) translated from a notebook, and put it under CI — all in one sitting.

  • Scaffold the package with the destination architecture. Every subdirectory has a docstring listing what's planned. The course is the map.
  • Translate the notebook’s sites-cleaning logic into prep/sites.py with proper typed functions, docstrings, and tests.
  • Set up pyproject.toml, ruff, pytest, and a GitHub Actions CI workflow so the package is real from commit one.

The architectural through-line. opentrash has one structural rule: the routing engine joins GPS pings to GIS layers once, and every product (patterns, RouteView, future tools) consumes engine output. We're not building that today, but the folder for it (engine/) is already there, with a docstring telling you what’s coming. Lesson 0’s job is to land the shape.

· Before you begin

  • Python 3.11 or 3.12.
  • A GitHub account, with git on your machine.
  • You don’t need any data files yet — the tests use synthetic rows.

· Build it, step by step

1 Scaffold the package

Create the directory layout and an __init__.py in each subpackage. Every subdirectory gets a docstring that lists what’s coming — this is the part most tutorials skip and it’s the part that makes the package feel coherent from day one.

opentrash/
├── adapters/gps/   vendor-specific GPS sources (Geotab, Postgres)
├── core/           CRS, DuckDB session, vehicle IDs
├── prep/           parcels, sites, static layers (routes/facilities)
├── cache/          GPS cache + spatial-temporal indexes
├── tonnage/        Excel → parquet ingestion + lookup
├── engine/         THE ROUTING ENGINE
├── patterns/       pattern detection (consumes engine output)
└── routeview/      single-route/single-day map (consumes engine output)

Look at opentrash/engine/__init__.py right now. It’s a docstring that says “here’s what the routing engine will do.” The audience for your code — future you, contributors, learners — will read that and instantly understand the architecture. That’s free engineering documentation, and it costs about 30 seconds to write.
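
The layout above can be generated instead of typed by hand. The following is a minimal sketch, not the course's own tooling: scaffold() and write_init() are hypothetical helpers, and the docstring wording ("Planned: ...") is a placeholder you would replace with your own description of each subpackage.

```python
from pathlib import Path

# Subpackage path -> one-line summary, copied from the layout above.
SUBPACKAGES = {
    "adapters/gps": "vendor-specific GPS sources (Geotab, Postgres)",
    "core": "CRS, DuckDB session, vehicle IDs",
    "prep": "parcels, sites, static layers (routes/facilities)",
    "cache": "GPS cache + spatial-temporal indexes",
    "tonnage": "Excel → parquet ingestion + lookup",
    "engine": "THE ROUTING ENGINE",
    "patterns": "pattern detection (consumes engine output)",
    "routeview": "single-route/single-day map (consumes engine output)",
}


def write_init(directory: Path, docstring: str) -> Path:
    """Create directory and a docstring-only __init__.py inside it."""
    directory.mkdir(parents=True, exist_ok=True)
    init = directory / "__init__.py"
    if not init.exists():  # never overwrite a file that already has code
        init.write_text(f'"""{docstring}"""\n', encoding="utf-8")
    return init


def scaffold(root: Path) -> list[Path]:
    """Build the opentrash/ package tree under root (hypothetical helper)."""
    pkg = root / "opentrash"
    created = [write_init(pkg, "opentrash: package skeleton.")]
    for rel, summary in SUBPACKAGES.items():
        parts = Path(rel).parts
        # Intermediate packages such as adapters/ need an __init__.py too.
        for depth in range(1, len(parts)):
            parent = pkg.joinpath(*parts[:depth])
            created.append(write_init(parent, f"Subpackages of {parts[depth - 1]}."))
        created.append(write_init(pkg.joinpath(*parts), f"Planned: {summary}."))
    return created


if __name__ == "__main__":
    for path in scaffold(Path(".")):
        print(path)
```

Running the script from the repository root creates any missing directories and leaves existing __init__.py files untouched, so it is safe to rerun as the package fills in.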

2 Land the first real module — prep/sites.py

A site is a serviced address. The city’s export carries ~29 columns of mixed quality; this module turns it into a clean validated table with canonical route IDs for each commodity (refuse, recycling, organics). The route IDs follow a real schema:

  • Digits 1-2 — supervisor / commodity zone (21-25, 11-12 for refuse+recycling; 31-35 for organics; 11/12 mark manual collection).
  • Digits 3-4 — sequence number (also disambiguates commodity when one supervisor handles multiple).
  • Optional suffix: O for organics, B for biweekly, blank for weekly.
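
The schema above can be encoded as data plus one small parser. This is a sketch under stated assumptions, not the actual contents of prep/sites.py: the names RouteID, parse_route_id, and the zone constants are hypothetical, and the parser assumes the ID has already been padded to four digits.

```python
import re
from dataclasses import dataclass

# Zone sets taken from the schema above (names are hypothetical).
REFUSE_RECYCLING_ZONES = frozenset(range(21, 26)) | {11, 12}
ORGANICS_ZONES = frozenset(range(31, 36))
MANUAL_ZONES = frozenset({11, 12})

# Two zone digits, two sequence digits, optional O/B suffix.
ROUTE_RE = re.compile(r"^(?P<zone>\d{2})(?P<seq>\d{2})(?P<suffix>[OB]?)$")


@dataclass(frozen=True)
class RouteID:
    zone: str      # digits 1-2
    sequence: str  # digits 3-4
    suffix: str    # "O" organics, "B" biweekly, "" weekly

    @property
    def is_manual(self) -> bool:
        return int(self.zone) in MANUAL_ZONES

    @property
    def in_organics_zone(self) -> bool:
        return int(self.zone) in ORGANICS_ZONES


def parse_route_id(raw: str) -> RouteID:
    """Split a route ID into zone, sequence, and suffix, or raise ValueError."""
    text = str(raw).strip().upper()
    match = ROUTE_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid route ID: {raw!r}")
    return RouteID(match["zone"], match["seq"], match["suffix"])
```

Keeping the zone ranges in named constants means a test can assert against them directly, and a change to the city's zoning touches one line instead of several comments.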

The module exposes pure functions (cleaners, validators) and one top-level load_sites() that runs the full pipeline. Heavy imports (pandas, geopandas) are lazy: they are imported inside the functions that use them rather than at module level, and from __future__ import annotations at the top of the file keeps type hints that mention those libraries from being evaluated at import time. The package therefore imports fast even when only a few modules are needed.
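
One common way to write the lazy-import pattern looks like the sketch below. The load_sites() signature and the use of pandas.read_csv are assumptions made for illustration; the source does not say what file format the city export uses or what arguments the real function takes.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; this branch never runs at import time.
    import pandas as pd


def load_sites(path: str) -> pd.DataFrame:
    """Hypothetical loader: read the export, then hand off to the cleaners."""
    # Deferred import: the cost of loading pandas is paid on the first call,
    # not when the opentrash package itself is imported.
    import pandas as pd

    # Assumption: the export is a CSV; read everything as text so that
    # leading zeros in route IDs survive.
    return pd.read_csv(path, dtype=str)
```

The future import stores the return annotation as the plain string "pd.DataFrame", which is why the module loads cleanly even though pd is not bound at module level.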

3 Tests with synthetic data

No real city data is needed to test the cleaners. The test file builds rows in memory and asserts the route IDs come out clean. Run them with:

pytest -q

Tests are the primary documentation of what a function does. When the test is named test_clean_route4_pads_to_four_digits and synthesizes a row with "234" and asserts the result is "0234", anyone reading the code knows the behavior in 10 seconds.
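
A test of that shape might look like the sketch below. To keep the example self-contained, clean_route4 here is a stand-in defined inline; in the real tests/test_sites.py it would be imported from opentrash.prep.sites, and its actual behavior may cover more cases than zero-padding.

```python
def clean_route4(value) -> str:
    """Stand-in cleaner (assumption): trim whitespace, left-pad to 4 digits."""
    return str(value).strip().zfill(4)


def test_clean_route4_pads_to_four_digits():
    # Synthetic row: no city data required.
    row = {"route": "234"}
    assert clean_route4(row["route"]) == "0234"


def test_clean_route4_leaves_four_digits_alone():
    assert clean_route4("2104") == "2104"
```

pytest collects any function whose name starts with test_, so the function name doubles as the one-line description that shows up in the test report.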

4 pyproject.toml, ruff, pytest, CI

Build configuration that’s lean today and extensible tomorrow. The base dependencies are what every lesson onward expects (pandas, duckdb, geopandas, shapely, pyproj). Optional [project.optional-dependencies] sections (geotab, postgres, dev) get added as lessons need them.
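
As a rough illustration, a pyproject.toml along these lines would satisfy the description above. The build backend (setuptools), version number, description text, ruff line length, and the contents of the dev extra are all assumptions; only the package name, the Python floor, and the five base dependencies come from this lesson.

```toml
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "opentrash"
version = "0.0.1"                      # placeholder
description = "Course package skeleton"  # placeholder
requires-python = ">=3.11"
dependencies = ["pandas", "duckdb", "geopandas", "shapely", "pyproj"]

[project.optional-dependencies]
dev = ["pytest", "ruff"]               # geotab and postgres extras come later

# Flat layout: tell setuptools explicitly which top-level package to ship,
# so docs/ and tests/ are never picked up by accident.
[tool.setuptools.packages.find]
include = ["opentrash*"]

[tool.ruff]
line-length = 100

[tool.pytest.ini_options]
testpaths = ["tests"]
```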

pip install -e ".[dev]"
ruff check .
pytest -q

Add a small GitHub Actions workflow at .github/workflows/ci.yml that runs ruff + pytest on every push. The green badge is the difference between a package that works and a package that only looks like it does.
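
A workflow of roughly this shape would do the job. Treat it as a sketch: the workflow name, job name, runner image, and the choice to test both supported Python versions in a matrix are assumptions, while the three commands are the ones shown above.

```yaml
name: CI
on: push

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[dev]"
      - run: ruff check .
      - run: pytest -q
```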

5 Commit and push

git init
git add .
git commit -m "Lesson 0: package skeleton + prep/sites"
gh repo create opentrash --public --source=. --push

By the time you push, you have: a real installable package, one working module with tests, ruff clean, CI green, and a folder layout that tells the whole course story without a word of marketing.

· Package anatomy after this lesson

Where everything lives now. [new] marks files added this lesson. All other directories are scaffolded with planned-module docstrings.

opentrash/
├── pyproject.toml                # [new]
├── README.md                     # [new]
├── .github/workflows/ci.yml      # [new]
├── docs/architecture.md          # [new]
├── opentrash/
│   ├── __init__.py               # [new]
│   ├── adapters/gps/             # scaffolded; planned: base, geotab, postgres
│   ├── core/                     # scaffolded; planned: crs, duckdb_session, vehicle_ids
│   ├── prep/
│   │   └── sites.py              # [new] route-ID parsing + cleaners
│   ├── cache/                    # scaffolded; planned: gps_cache, gps_indexes, master_index
│   ├── tonnage/                  # scaffolded; planned: registry, cleaners, keys, upsert, pipeline, lookup
│   ├── engine/                   # scaffolded; THE ROUTING ENGINE: enrichment + segments
│   ├── patterns/                 # scaffolded; planned: config, detector, validator
│   └── routeview/                # scaffolded; planned: rank, trail, parcel_eval, render, runner
└── tests/
    └── test_sites.py             # [new]

· What you built

  • An installable package with a real pyproject.toml and lazy imports done right.
  • One working module — prep/sites.py — with route-ID anatomy encoded as data, not as comments.
  • Tests with synthetic rows that document the behavior precisely.
  • Ruff + pytest + GitHub Actions CI, green from the first push.
  • A folder layout that maps the whole course — including the routing engine that doesn’t exist yet but already has its place ready.

The package is real. The architecture is on screen. The next lesson walks through how the repo is organized and prepares the environment.

· Companion resources

Optional, for going deeper.

  • PEP 621 / pyproject.toml: official spec — the modern way to declare a Python package.
  • src vs flat layout: when to choose which. We use flat (one less directory level) and it works fine for projects of this size.
  • Lazy imports + from __future__ import annotations: the pattern that lets your package import fast even with heavy deps.

· Next lesson

Lesson 1 — Orientation: a walkthrough of the repo, the environment, and the package layout map. You’ll understand why each directory exists before we start filling them.