One Protocol, two vendors, one cache that never re-fetches.
GPS data is the stream the whole engine sits on. This lesson builds
the three pieces that make the stream consumable: a tiny
GPSAdapter Protocol that defines the canonical
input shape, two concrete adapters (Geotab API and streaming
Postgres) that produce it from real vendor data, and a cache-first
read layer that lands one parquet per vehicle per local day.
· Objective
Build the data ingress for the rest of the package.
- adapters/gps/base.py: the GPSAdapter Protocol and the canonical 6-column output schema. Anything that supplies GPS must produce this shape.
- adapters/gps/geotab.py: Geotab MyGeotab API adapter. Pulls one vehicle’s pings for a local-day window via the LogRecord API.
- adapters/gps/postgres.py: Streaming Postgres adapter. The production data path: a database where pings stream in every ~10 seconds.
- cache/gps_cache.py: cache-first reads. Ask once, write to disk, and subsequent reads hit disk. One parquet per vehicle per local day in cache/YYYY-MM-DD/<vehicle>.parquet.
· Build it, step by step
1 The Protocol — adapters/gps/base.py
A Python typing.Protocol defines an interface by shape, not inheritance. Anything with a matching fetch() method satisfies the protocol — no base-class wiring required. That’s the right tool for the “swap vendors freely” design:
from typing import Protocol
import pandas as pd

class GPSAdapter(Protocol):
    def fetch(self, vehicle: str, start_date, end_date) -> pd.DataFrame:
        """Return canonical GPS rows for one vehicle over a local-day range."""
        ...
The canonical 6-column schema (GPS_COLUMNS) is also defined here:
GPS_COLUMNS = ("vehicle_id", "dt_utc", "dt_local", "lat", "lon", "speed_mph")
Every downstream module imports those names. Adapter quirks — Geotab’s VehicleName, Postgres’s DateTime, mph vs km/h, naive vs aware datetimes — all get normalized to this shape inside each adapter.
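To make the normalization step concrete, here is a minimal sketch rather than the package’s actual code. Only GPS_COLUMNS comes from the lesson. The vendor column names (VehicleName, DateTime, Latitude, Longitude, Speed), the helper name to_canonical, the America/Los_Angeles zone, and the sample row are assumptions made for illustration.

```python
import pandas as pd

GPS_COLUMNS = ("vehicle_id", "dt_utc", "dt_local", "lat", "lon", "speed_mph")

KMH_TO_MPH = 0.621371  # 1 km/h expressed in mph

# Hypothetical vendor column names; a real adapter would map its own vendor's fields.
VENDOR_RENAMES = {"VehicleName": "vehicle_id", "Latitude": "lat", "Longitude": "lon"}


def to_canonical(raw: pd.DataFrame, local_tz: str = "America/Los_Angeles") -> pd.DataFrame:
    """Map a vendor-shaped frame onto the canonical six columns."""
    out = raw.rename(columns=VENDOR_RENAMES)  # returns a copy; `raw` is untouched
    # Assumption: vendor timestamps are naive UTC. Attach UTC, then derive local time.
    out["dt_utc"] = pd.to_datetime(out["DateTime"]).dt.tz_localize("UTC")
    out["dt_local"] = out["dt_utc"].dt.tz_convert(local_tz)
    # Assumption: vendor speed is reported in km/h.
    out["speed_mph"] = out["Speed"] * KMH_TO_MPH
    # Selecting by GPS_COLUMNS drops vendor-only columns and fixes the column order.
    return out[list(GPS_COLUMNS)]


# Made-up sample row, for illustration only.
raw = pd.DataFrame(
    {
        "VehicleName": ["815001"],
        "DateTime": ["2026-01-18 16:00:00"],
        "Latitude": [34.05],
        "Longitude": [-118.25],
        "Speed": [100.0],
    }
)
canonical = to_canonical(raw)
```

Keeping this mapping inside each adapter means downstream code can select columns by the GPS_COLUMNS names without checking which vendor produced the frame.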
2 Geotab adapter — adapters/gps/geotab.py
Geotab is the legacy / on-demand path: a REST API you authenticate against and call to pull one vehicle’s LogRecord rows. The adapter:
- Takes username, password, database as constructor args, with env-var fallback (GEOTAB_USERNAME etc.). No values committed.
- Lazily imports mygeotab: the package installs cleanly without the [geotab] extra; the adapter only fails if you actually try to use it.
- Looks up the device by vehicle name, queries LogRecord for the local-day UTC window, normalizes km/h speeds to mph and naive UTC datetimes to (UTC, Pacific) aware ones.
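The three behaviors in that list can be sketched with the standard library alone. This is an illustration under stated assumptions, not the adapter’s real code: the helper names resolve_credential, load_optional, and local_day_utc_window are hypothetical, “Pacific” is assumed to mean the America/Los_Angeles zone, and the actual mygeotab calls are deliberately left out.

```python
import importlib
import os
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def resolve_credential(value, env_name):
    """Prefer the explicit constructor argument, then fall back to the environment."""
    resolved = value or os.environ.get(env_name)
    if not resolved:
        raise ValueError(f"pass the value explicitly or set {env_name}")
    return resolved


def load_optional(module_name, extra):
    """Import an optional dependency only at the moment it is first needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; install the [{extra}] extra to use this adapter"
        ) from exc


def local_day_utc_window(day, tz_name="America/Los_Angeles"):
    """Return the [start, end) UTC instants that bound one local calendar day."""
    tz = ZoneInfo(tz_name)
    # Build both local midnights directly so DST transition days get the right length.
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


start_utc, end_utc = local_day_utc_window(date(2026, 1, 18))
```

Because the window is built from two local midnights instead of “start plus 24 hours”, a spring-forward day such as 2026-03-08 yields a 23-hour UTC window, so no pings from the neighboring local day leak in.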
3 Postgres adapter — adapters/gps/postgres.py
The production path: a Postgres database where a separate streaming process writes pings every ~10 seconds. Reads are cheap and incremental, and you can do range queries efficiently with SQL.
- Takes a SQLAlchemy URL (postgresql+psycopg2://...) as a constructor arg, with env-var fallback (OPENTRASH_PG_URL). Same no-creds-in-code rule.
- Default queries match a Geotab-style schema (LogRecords2, Devices2); override devices_sql / pings_sql on the constructor if your shape differs.
- Server-side chunked fetch (execution_options(stream_results=True) + fetchmany(chunksize)) so multi-year pulls don’t pin memory. The default 250k-row chunk handles a year of pings comfortably.
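The fetchmany loop behind that chunked fetch looks roughly like the sketch below. So that it runs anywhere, it is shown against the standard library’s sqlite3 with a made-up pings table. The helper name iter_chunks is hypothetical. In the real adapter the result object would presumably come from a SQLAlchemy connection opened with execution_options(stream_results=True), whose results also expose fetchmany.

```python
import sqlite3
from typing import Iterator


def iter_chunks(result, chunksize: int = 250_000) -> Iterator[list]:
    """Yield lists of at most `chunksize` rows until the result is exhausted."""
    while True:
        rows = result.fetchmany(chunksize)
        if not rows:  # an empty batch means there is nothing left to read
            break
        yield rows


# Stand-in data source: an in-memory SQLite table with an invented schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pings (vehicle TEXT, seq INTEGER)")
conn.executemany("INSERT INTO pings VALUES (?, ?)", [("815001", i) for i in range(10)])

cursor = conn.execute("SELECT vehicle, seq FROM pings ORDER BY seq")
chunk_sizes = [len(chunk) for chunk in iter_chunks(cursor, chunksize=4)]
conn.close()
```

Only one chunk is held in memory at a time, which is the property that matters for multi-year pulls; the caller decides whether to process each chunk immediately or accumulate them.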
Both adapters implement the same fetch(vehicle, start_date, end_date) signature. Downstream code — cache/gps_cache.py below, the engine, RouteView — never knows or cares which vendor is behind the call. Swap one for the other and nothing else changes.
4 Cache-first reads — cache/gps_cache.py
Calling a vendor every time we want pings is wasteful and slow. The cache layer fixes that:
get_gps_day(adapter, "815001", "2026-01-18", cache_dir="cache/")
# First call: adapter.fetch(...) -> writes cache/2026-01-18/815001.parquet
# Second call: reads cache/2026-01-18/815001.parquet directly
- Layout: one file per vehicle per local day. The YYYY-MM-DD/ directory naming sorts chronologically and matches what the Postgres extractor writes natively.
- Refresh: pass refresh=True to force a re-fetch (useful when a vendor is back-filling history).
- Fleet helper: get_gps_fleet_day(adapter, vehicles, day) loops through a list, calls get_gps_day for each, returns one concatenated DataFrame.
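A cache layer with the behavior described above could be sketched as follows. This is one plausible shape, not the lesson’s actual cache/gps_cache.py. The helper cache_path, the cache_dir keyword on the fleet helper, and the choice to pass the same day as both start_date and end_date are assumptions. Writing parquet with pandas also needs a parquet engine such as pyarrow installed.

```python
from pathlib import Path

import pandas as pd


def cache_path(cache_dir, vehicle: str, day: str) -> Path:
    """Build cache_dir/YYYY-MM-DD/<vehicle>.parquet."""
    return Path(cache_dir) / day / f"{vehicle}.parquet"


def get_gps_day(adapter, vehicle: str, day: str, cache_dir="cache/", refresh: bool = False) -> pd.DataFrame:
    path = cache_path(cache_dir, vehicle, day)
    if path.exists() and not refresh:
        return pd.read_parquet(path)  # cache hit: the adapter is never called
    # Cache miss or forced refresh. Assumption: one local day means start == end.
    df = adapter.fetch(vehicle, day, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(path, index=False)
    return df


def get_gps_fleet_day(adapter, vehicles, day: str, cache_dir="cache/") -> pd.DataFrame:
    # Note: pd.concat raises ValueError if `vehicles` is empty.
    frames = [get_gps_day(adapter, v, day, cache_dir=cache_dir) for v in vehicles]
    return pd.concat(frames, ignore_index=True)
```

With this shape, refresh=True simply skips the existence check and overwrites the file, so a vendor back-fill replaces the stale parquet instead of sitting beside it.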
5 Tests with stub adapters
Real GPS APIs need real credentials. For tests, we use a StubAdapter — a tiny class that satisfies the Protocol by returning canned pings and counting how many times it’s called. That lets us prove cache-miss writes, cache-hit reads (no second call), refresh behavior, fleet concatenation, and all the error-path conditions — without network.
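A stub along those lines might look like the sketch below. It is illustrative only: the lesson does not show the real StubAdapter, the canned ping values are invented, and the @runtime_checkable decorator is added here purely so the structural match can be asserted with isinstance.

```python
from typing import Protocol, runtime_checkable

import pandas as pd

GPS_COLUMNS = ("vehicle_id", "dt_utc", "dt_local", "lat", "lon", "speed_mph")


@runtime_checkable  # added for this sketch so isinstance() can check the shape
class GPSAdapter(Protocol):
    def fetch(self, vehicle: str, start_date, end_date) -> pd.DataFrame: ...


class StubAdapter:
    """Returns one canned ping per call and records how often it was asked."""

    def __init__(self):
        self.calls = 0

    def fetch(self, vehicle, start_date, end_date):
        self.calls += 1
        dt_utc = pd.Timestamp("2026-01-18 16:00:00", tz="UTC")
        row = {
            "vehicle_id": vehicle,
            "dt_utc": dt_utc,
            "dt_local": dt_utc.tz_convert("America/Los_Angeles"),
            "lat": 34.05,  # made-up coordinates
            "lon": -118.25,
            "speed_mph": 12.0,
        }
        return pd.DataFrame([row], columns=list(GPS_COLUMNS))


def test_stub_satisfies_protocol_and_counts_calls():
    stub = StubAdapter()
    assert isinstance(stub, GPSAdapter)  # structural match, no inheritance
    df = stub.fetch("815001", "2026-01-18", "2026-01-18")
    assert tuple(df.columns) == GPS_COLUMNS
    assert stub.calls == 1
```

The calls counter is what makes the cache assertions possible: a cache-hit test requests the same vehicle-day twice and asserts the counter is still 1, while a refresh test asserts it has moved to 2.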
pip install -e ".[dev]" # base + dev tools
pip install -e ".[postgres]" # if you'll use the Postgres adapter
ruff check .
pytest -q
6 Commit and push
git add .
git commit -m "Lesson 6: GPS adapters (Geotab + Postgres) + cache-first reads"
git push origin main
· Package anatomy after this lesson
Where everything lives now. “new” marks files added this lesson.
· What you built
- A Protocol that lets the package treat any GPS vendor identically — the foundational decoupling that keeps engine code vendor-neutral.
- Two concrete adapters — Geotab (REST, on-demand) and Postgres (streaming, server-side chunked) — both producing the canonical 6-column shape.
- A cache-first read layer that turns repeated “give me this vehicle-day” calls into disk reads after the first fetch.
- Credentials via args or env, never in code — the package is safe to fork and commit without leaking secrets.
- Tests using stub adapters — no network, no real credentials, full behavior coverage.
· Companion resources
Optional, for going deeper.
- typing.Protocol: official docs on structural typing in Python.
- SQLAlchemy streaming results: stream_results docs on how to fetch huge result sets without holding them in memory.
- The 12-factor app: factor III, “Config” — the canonical argument for environment-variable configuration.
· Next lesson
Lesson 7 — The substrate: parcels with WKB geometry, per-day vehicle-day indexes, and the master GPS index that joins them. The lookup tables that make the routing engine fast.