One notebook function, a real package, and the whole map.
Most "build a Python package" tutorials stop at setup.py.
We're going to go a step further today: build a package whose
shape reflects what it will eventually do. Eight subdirectories,
one real module, every __init__.py labeled with what's
coming. By the end of this lesson, anyone who opens the repo can see
the entire course at a glance.
· Objective
Stand up the opentrash package skeleton, land one real working module
(prep/sites.py) translated from a notebook, and put it
under CI — all in one sitting.
- Scaffold the package with the destination architecture. Every subdirectory has a docstring listing what's planned. The course is the map.
- Translate the notebook’s sites-cleaning logic into prep/sites.py with proper typed functions, docstrings, and tests.
- Set up pyproject.toml, ruff, pytest, and a GitHub Actions CI workflow so the package is real from commit one.
Even the routing engine’s directory (engine/) is already there, with a docstring telling you what’s coming. Lesson 0’s job is to land the shape.
· Before you begin
- Python 3.11 or 3.12.
- A GitHub account, with git on your machine. Step 5 also uses the GitHub CLI (gh).
- You don’t need any data files yet: the tests use synthetic rows.
· Build it, step by step
1 Scaffold the package
Create the directory layout and an __init__.py in each subpackage. Every subdirectory gets a docstring that lists what’s coming — this is the part most tutorials skip and it’s the part that makes the package feel coherent from day one.
opentrash/
├── adapters/gps/ vendor-specific GPS sources (Geotab, Postgres)
├── core/ CRS, DuckDB session, vehicle IDs
├── prep/ parcels, sites, static layers (routes/facilities)
├── cache/ GPS cache + spatial-temporal indexes
├── tonnage/ Excel → parquet ingestion + lookup
├── engine/ THE ROUTING ENGINE
├── patterns/ pattern detection (consumes engine output)
└── routeview/ single-route/single-day map (consumes engine output)
Look at opentrash/engine/__init__.py right now. It’s a docstring that says “here’s what the routing engine will do.” The audience for your code — future you, contributors, learners — will read that and instantly understand the architecture. That’s free engineering documentation, and it costs about 30 seconds to write.
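If you would rather script the scaffold than create the files by hand, here is a minimal sketch. The PLANNED mapping is copied from the layout above; the helper names (scaffold, _ensure_init) and the "Planned:" docstring wording are illustrative assumptions, not the course's actual files.

```python
from pathlib import Path

# Subpackage -> one-line plan, copied from the directory layout above.
PLANNED = {
    "adapters/gps": "Vendor-specific GPS sources (Geotab, Postgres).",
    "core": "CRS, DuckDB session, vehicle IDs.",
    "prep": "Parcels, sites, static layers (routes/facilities).",
    "cache": "GPS cache + spatial-temporal indexes.",
    "tonnage": "Excel -> parquet ingestion + lookup.",
    "engine": "The routing engine.",
    "patterns": "Pattern detection (consumes engine output).",
    "routeview": "Single-route/single-day map (consumes engine output).",
}


def _ensure_init(directory: Path, docstring: str) -> Path:
    """Write a docstring-only __init__.py unless one already exists."""
    init = directory / "__init__.py"
    if not init.exists():
        init.write_text(f'"""{docstring}"""\n', encoding="utf-8")
    return init


def scaffold(root: Path) -> list[Path]:
    """Create opentrash/ under root and return every __init__.py path."""
    pkg = root / "opentrash"
    pkg.mkdir(parents=True, exist_ok=True)
    inits = [_ensure_init(pkg, "opentrash: top-level package.")]
    for rel, plan in PLANNED.items():
        current = pkg
        parts = Path(rel).parts
        for depth, part in enumerate(parts, start=1):
            current = current / part
            current.mkdir(exist_ok=True)
            # Intermediate directories (adapters/) need an __init__.py too.
            is_leaf = depth == len(parts)
            doc = f"Planned: {plan}" if is_leaf else f"Namespace for {part} subpackages."
            inits.append(_ensure_init(current, doc))
    return inits
```

Calling scaffold(Path(".")) from an empty repo root would produce the tree above. Existing __init__.py files are left untouched, so the script is safe to re-run.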
2 Land the first real module — prep/sites.py
A site is a serviced address. The city’s export carries ~29 columns of mixed quality; this module turns it into a clean validated table with canonical route IDs for each commodity (refuse, recycling, organics). The route IDs follow a real schema:
- Digits 1-2 — supervisor / commodity zone (21-25, 11-12 for refuse+recycling; 31-35 for organics; 11/12 mark manual collection).
- Digits 3-4 — sequence number (also disambiguates commodity when one supervisor handles multiple).
- Optional suffix: O for organics, B for biweekly, blank for weekly.
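The schema above can be written down as data rather than prose. The following is a sketch under stated assumptions: RouteId, parse_route_id, and the zone constants are hypothetical names, and the real prep/sites.py may structure this differently. Unknown zone prefixes are reported rather than rejected, since the lesson does not say how the module treats them.

```python
import re
from dataclasses import dataclass

# Zone prefixes exactly as listed in the schema above.
REFUSE_RECYCLING_ZONES = frozenset({11, 12, 21, 22, 23, 24, 25})
ORGANICS_ZONES = frozenset(range(31, 36))  # 31-35
MANUAL_ZONES = frozenset({11, 12})
SUFFIX_MEANING = {"": "weekly", "O": "organics", "B": "biweekly"}

# Two zone digits, two sequence digits, optional O/B suffix.
_ROUTE_RE = re.compile(r"(\d{2})(\d{2})([OB]?)")


@dataclass(frozen=True)
class RouteId:
    zone: int      # digits 1-2
    sequence: int  # digits 3-4
    suffix: str    # "", "O", or "B"

    @property
    def zone_kind(self) -> str:
        if self.zone in REFUSE_RECYCLING_ZONES:
            return "refuse+recycling"
        if self.zone in ORGANICS_ZONES:
            return "organics"
        return "unknown"

    @property
    def manual(self) -> bool:
        return self.zone in MANUAL_ZONES


def parse_route_id(raw: str) -> RouteId:
    """Split a canonical route ID into zone, sequence, and suffix."""
    match = _ROUTE_RE.fullmatch(raw.strip().upper())
    if match is None:
        raise ValueError(f"not a canonical route ID: {raw!r}")
    zone, sequence, suffix = match.groups()
    return RouteId(int(zone), int(sequence), suffix)
```

For example, parse_route_id("1101B") yields zone 11, sequence 1, suffix "B", with manual set to True.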
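The module exposes pure functions (cleaners, validators) and one top-level load_sites() that runs the full pipeline. Heavy imports (pandas, geopandas) are lazy: each is imported inside the function that needs it rather than at module top level. Putting from __future__ import annotations at the top of the file keeps type hints as unevaluated strings, so annotations can name pandas types without triggering the import. The result is a package that imports fast even when only a few modules are needed.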
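A minimal sketch of that lazy-import pattern follows. The function name load_sites_sketch and the CSV input are assumptions for illustration; the lesson does not specify the export format or the real signature of load_sites().

```python
from __future__ import annotations  # annotations stay as strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime.
    import pandas as pd


def load_sites_sketch(path: str) -> pd.DataFrame:
    """Hypothetical stand-in for load_sites(): read a CSV export as strings."""
    import pandas as pd  # deferred: paid on first call, not at package import

    return pd.read_csv(path, dtype=str)
```

Importing this module does not import pandas. The cost is paid the first time load_sites_sketch() is called, and Python caches the module after that, so later calls are cheap.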
3 Tests with synthetic data
No real city data is needed to test the cleaners. The test file builds rows in memory and asserts the route IDs come out clean. Run them with:
pytest -q
Tests are the primary documentation of what a function does. When the test is named test_clean_route4_pads_to_four_digits and synthesizes a row with "234" and asserts the result is "0234", anyone reading the code knows the behavior in 10 seconds.
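Here is roughly what such a test could look like. The body of clean_route4 is a hypothetical stand-in written only so the example runs on its own, and the column name route_refuse is likewise an assumption. In the real package the test would import the cleaner from opentrash.prep.sites.

```python
def clean_route4(raw: str) -> str:
    """Hypothetical stand-in: strip whitespace and left-pad to four digits."""
    text = str(raw).strip()
    if not text.isdigit() or len(text) > 4:
        raise ValueError(f"cannot normalise route value: {raw!r}")
    return text.zfill(4)


def test_clean_route4_pads_to_four_digits():
    # Synthetic row built in memory; no city data required.
    row = {"route_refuse": "234"}
    assert clean_route4(row["route_refuse"]) == "0234"
```

pytest collects any function whose name starts with test_, so dropping this into a file under tests/ is enough for pytest -q to pick it up.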
4 pyproject.toml, ruff, pytest, CI
Build configuration that’s lean today and extensible tomorrow. The base dependencies are what every lesson onward expects (pandas, duckdb, geopandas, shapely, pyproj). Optional [project.optional-dependencies] sections (geotab, postgres, dev) get added as lessons need them.
pip install -e ".[dev]"
ruff check .
pytest -q
Add a small GitHub Actions workflow at .github/workflows/ci.yml that runs ruff + pytest on every push. A green badge is the difference between a package that works and one that only looks like it does.
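A workflow of that shape could look like the sketch below. The job name, the Python version matrix, and the action versions are assumptions chosen to match the prerequisites above; the three run steps mirror the local commands from this step.

```yaml
# .github/workflows/ci.yml (a minimal sketch, not the course's exact file)
name: CI
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      # Same commands you ran locally.
      - run: pip install -e ".[dev]"
      - run: ruff check .
      - run: pytest -q
```

Keeping the CI steps identical to the local commands means a green local run is a good predictor of a green push.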
5 Commit and push
git init
git add .
git commit -m "Lesson 0: package skeleton + prep/sites"
gh repo create opentrash --public --source=. --push
By the time you push, you have: a real installable package, one working module with tests, ruff clean, CI green, and a folder layout that tells the whole course story without a word of marketing.
· Package anatomy after this lesson
Where everything lives now. “new” marks files added this lesson. All directories are scaffolded with planned-module docstrings.
· What you built
- An installable package with a real pyproject.toml and lazy imports done right.
- One working module, prep/sites.py, with route-ID anatomy encoded as data, not as comments.
- Tests with synthetic rows that document the behavior precisely.
- Ruff + pytest + GitHub Actions CI, green from the first push.
- A folder layout that maps the whole course — including the routing engine that doesn’t exist yet but already has its place ready.
· Companion resources
Optional, for going deeper.
- PEP 621 / pyproject.toml: official spec — the modern way to declare a Python package.
- src vs flat layout: when to choose which. We use flat (one less directory level) and it works fine for projects of this size.
- Lazy imports + from __future__ import annotations: the pattern that lets your package import fast even with heavy deps.
· Next lesson
Lesson 1 — Orientation: a walkthrough of the repo, the environment, and the package layout map. You’ll understand why each directory exists before we start filling them.