Ovidiu U 5ad89e977d Add fuel API ingestion and historic storage design spec
Includes verified API authentication flow, correct base URL, all DB
table schemas for stations, current prices, history, and archive.
Fuel types corrected to match live API (B7_STANDARD, B7_PREMIUM).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:09:50 +01:00


# Fuel API Ingestion & Historic Storage Design
**Date:** 2026-04-03

**Scope:** UK Fuel Finder API integration, database schema for station metadata and historic price storage.

---
## Context
The app polls the gov.uk Fuel Finder API to collect fuel prices for petrol stations across the UK (~14,500 stations). The scoring engine uses these prices to produce fill-up recommendations for users. Historic data is retained indefinitely: a hot table covers the last 12 months for scoring queries, and an archive table holds everything older for graphs and comparisons.

---
## API
**Base URL:** `https://www.fuel-finder.service.gov.uk/api/v1`
### Authentication
OAuth 2.0 via JSON POST (not form-encoded).
- **Get token:** `POST /oauth/generate_access_token` `{"client_id": "...", "client_secret": "..."}`
- **Refresh token:** `POST /oauth/regenerate_access_token` same payload
- Response includes `access_token` (Bearer), `expires_in: 3600`, `refresh_token`
- Cache token at key `fuel_finder_access_token` with TTL = `expires_in - 60` (3540s)
- On cache miss: fetch new token, store, return
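
A minimal sketch of the cache-miss flow above, written in Python for illustration only. `TokenCache` is an in-memory stand-in for whatever key/TTL cache the app actually uses, and `fetch_token` is a hypothetical callable that performs the JSON POST to `/oauth/generate_access_token`. The refresh-token path is not shown.

```python
import time
from typing import Callable, Optional

CACHE_KEY = "fuel_finder_access_token"
EXPIRY_MARGIN_S = 60  # expire the cached token one minute early


class TokenCache:
    """In-memory stand-in for the app cache (assumption: any key/TTL store works)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None or self._clock() >= entry[1]:
            return None  # missing or expired
        return entry[0]

    def put(self, key: str, value: str, ttl_s: float) -> None:
        self._store[key] = (value, self._clock() + ttl_s)


def get_access_token(cache: TokenCache, fetch_token: Callable[[], dict]) -> str:
    """Return the cached Bearer token, fetching and caching a new one on a miss.

    `fetch_token` (hypothetical) POSTs {"client_id": ..., "client_secret": ...}
    as JSON and returns the decoded response body.
    """
    token = cache.get(CACHE_KEY)
    if token is not None:
        return token
    response = fetch_token()
    ttl = response["expires_in"] - EXPIRY_MARGIN_S  # 3600 - 60 = 3540 s
    cache.put(CACHE_KEY, response["access_token"], ttl)
    return response["access_token"]
```

Injecting the clock keeps the TTL logic testable without waiting an hour.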
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/pfs?batch-number={n}` | Station metadata, 500 per batch |
| GET | `/pfs/fuel-prices?batch-number={n}` | All station prices, 500 per batch |
| GET | `/pfs/fuel-prices` | Incremental prices (recently changed only) |
- `node_id` is the station identifier — consistent across both endpoints (verified against live API)
- Both endpoints return a flat JSON array (no pagination wrapper)
- Total stations: ~14,500 across ~30 batches
- Fuel types in production: `E10`, `E5`, `B7_STANDARD`, `B7_PREMIUM`, `HVO`, `B10`
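
A sketch of walking the batched endpoints (`/pfs` or `/pfs/fuel-prices`) during the full refresh. The spec does not say how the last batch is signalled, so this assumes that batch numbers start at 1 and that a batch shorter than 500 records (including an empty one) is the last. `get_json` is a hypothetical authenticated GET that returns the flat JSON array.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://www.fuel-finder.service.gov.uk/api/v1"
BATCH_SIZE = 500


def batch_url(path: str, batch_number: int) -> str:
    """Build e.g. .../pfs?batch-number=3."""
    return f"{BASE_URL}{path}?{urlencode({'batch-number': batch_number})}"


def iter_batches(path: str, get_json: Callable[[str], list]) -> Iterator[dict]:
    """Yield every record from a batched endpoint, one batch at a time.

    Assumptions: numbering starts at 1; a short batch ends the iteration.
    """
    batch_number = 1
    while True:
        records = get_json(batch_url(path, batch_number))
        yield from records
        if len(records) < BATCH_SIZE:
            break
        batch_number += 1
```

If the station count is an exact multiple of 500, this makes one extra request that returns an empty array, which is harmless.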
### Polling strategy
- **Every 15 minutes:** call `/pfs/fuel-prices` (no batch-number) — returns only recently changed prices
- **Once daily (3am):** full refresh — iterate all batches of both `/pfs` and `/pfs/fuel-prices` to catch any drift
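
The two schedules can be expressed as a per-minute check. This is a sketch assuming a scheduler that ticks once a minute and that "3am" means 03:00 in the server's local time (the spec names no timezone). `jobs_due` and the job labels are hypothetical.

```python
from datetime import datetime


def jobs_due(now: datetime) -> list[str]:
    """Return the polling jobs that should start at this minute."""
    jobs = []
    if now.minute % 15 == 0:
        jobs.append("incremental")   # GET /pfs/fuel-prices, no batch-number
    if now.hour == 3 and now.minute == 0:
        jobs.append("full_refresh")  # every batch of /pfs and /pfs/fuel-prices
    return jobs
```

At 03:00 both jobs are due; whether the incremental poll should be skipped while the full refresh runs is left open here.
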
---
## Database Schema
### `stations`
One row per petrol filling station. Upserted on full daily refresh and when an incremental poll encounters a new `node_id`.
```
node_id VARCHAR(64) PRIMARY KEY
trading_name VARCHAR(128)
brand_name VARCHAR(64) NULLABLE
is_same_trading_and_brand TINYINT(1)
is_supermarket TINYINT(1) DEFAULT 0 — set by StationTaggingService
is_motorway_service_station TINYINT(1) DEFAULT 0
is_supermarket_service_station TINYINT(1) DEFAULT 0
temporary_closure TINYINT(1) DEFAULT 0
permanent_closure TINYINT(1) DEFAULT 0
permanent_closure_date DATE NULLABLE
public_phone_number VARCHAR(20) NULLABLE
address_line_1 VARCHAR(255) NULLABLE
address_line_2 VARCHAR(255) NULLABLE
city VARCHAR(100) NULLABLE
county VARCHAR(100) NULLABLE
country VARCHAR(64) NULLABLE
postcode VARCHAR(10)
lat DECIMAL(10,7)
lng DECIMAL(10,7)
amenities JSON NULLABLE
opening_times JSON NULLABLE
fuel_types JSON NULLABLE — array of supported fuel type strings
last_seen_at DATETIME
```
### `station_prices_current`
One row per `(station_id, fuel_type)`. Upserted on every price change. Used by the scoring engine for current-price lookups, so scoring never needs to touch the history table.
```
station_id VARCHAR(64) FK → stations.node_id
fuel_type ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence SMALLINT UNSIGNED — price in pence × 100 (e.g. 15990 = 159.90p; max 65535 = 655.35p; never float)
price_effective_at DATETIME — price_change_effective_timestamp from API
price_reported_at DATETIME — price_last_updated from API
recorded_at DATETIME — when this row was last upserted
PRIMARY KEY (station_id, fuel_type)
```
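
A sketch of turning one API price into a `station_prices_current` row. Assumptions: the API reports prices in pence as a decimal string such as `"159.9"`, and fuel types arrive uppercase (`B7_STANDARD`) and are lowercased to match the ENUM. The helper names are hypothetical; the API's exact JSON field layout is not modelled.

```python
from datetime import datetime
from decimal import ROUND_HALF_UP, Decimal

FUEL_TYPES = {"e10", "e5", "b7_standard", "b7_premium", "b10", "hvo"}  # ENUM values
SMALLINT_UNSIGNED_MAX = 65535


def to_price_pence(price: str) -> int:
    """Convert a pence price such as "159.9" into hundredths of a penny (15990).

    Parsing the string with Decimal avoids binary float rounding.
    """
    value = (Decimal(price) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    if not 0 <= value <= SMALLINT_UNSIGNED_MAX:
        raise ValueError(f"price {price!r} does not fit SMALLINT UNSIGNED")
    return int(value)


def to_current_row(node_id: str, api_fuel_type: str, price: str,
                   effective_at: datetime, reported_at: datetime) -> dict:
    fuel_type = api_fuel_type.lower()  # "B7_STANDARD" -> 'b7_standard'
    if fuel_type not in FUEL_TYPES:
        raise ValueError(f"unknown fuel type {api_fuel_type!r}; extend the ENUM")
    return {
        "station_id": node_id,
        "fuel_type": fuel_type,
        "price_pence": to_price_pence(price),
        "price_effective_at": effective_at,  # price_change_effective_timestamp
        "price_reported_at": reported_at,    # price_last_updated
        "recorded_at": datetime.now(),
    }
```

Rejecting unknown fuel types, rather than silently dropping them, surfaces the "extend ENUM in migration" case early.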
### `station_prices`
Append-only price history. One row per price change per station+fuel. Partitioned monthly on `price_effective_at`. Covers the last 12 months (hot table).
```
id BIGINT UNSIGNED AUTO_INCREMENT
station_id VARCHAR(64) → stations.node_id (logical link only: InnoDB partitioned tables cannot declare foreign keys)
fuel_type ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence SMALLINT UNSIGNED
price_effective_at DATETIME
price_reported_at DATETIME
recorded_at DATETIME
PRIMARY KEY (id, price_effective_at)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
PARTITION BY RANGE (YEAR(price_effective_at) * 100 + MONTH(price_effective_at))
```
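
One possible way to pre-create monthly partitions for the `YEAR(...) * 100 + MONTH(...)` expression, sketched in Python that emits MySQL DDL. Each partition holds one month and is bounded by the next month's key, so December rolls over to the next January. The `pYYYYMM` naming and the helper names are assumptions; how many months ahead to create stays an open question.

```python
from datetime import date


def month_key(year: int, month: int) -> int:
    """Value of YEAR(d) * 100 + MONTH(d) for any date d in that month."""
    return year * 100 + month


def next_month(year: int, month: int) -> tuple[int, int]:
    return (year + 1, 1) if month == 12 else (year, month + 1)


def partitions_ahead(start: date, months: int) -> list[str]:
    """Partition definitions for `months` months, starting at start's month."""
    defs = []
    year, month = start.year, start.month
    for _ in range(months):
        ny, nm = next_month(year, month)
        defs.append(
            f"PARTITION p{month_key(year, month)} "
            f"VALUES LESS THAN ({month_key(ny, nm)})"
        )
        year, month = ny, nm
    return defs


def add_partitions_sql(start: date, months: int) -> str:
    # ADD PARTITION only works while the table has no MAXVALUE catch-all partition.
    return (
        "ALTER TABLE station_prices ADD PARTITION (\n  "
        + ",\n  ".join(partitions_ahead(start, months))
        + "\n)"
    )
```
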
**Deduplication:** only insert a new row if `price_pence` differs from the most recent stored value for that `(station_id, fuel_type)`. This prevents duplicates on full refreshes when prices haven't changed.
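
A minimal sketch of the deduplication rule, using an in-memory SQLite database so it runs anywhere. Assumptions: the previous value is read from `station_prices_current`, which by definition holds the most recent stored price for the pair, rather than by querying the history table; columns are simplified (no ENUM, no partitioning, `price_reported_at` omitted). `record_price` is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE station_prices_current (
    station_id TEXT NOT NULL, fuel_type TEXT NOT NULL,
    price_pence INTEGER NOT NULL, price_effective_at TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    PRIMARY KEY (station_id, fuel_type)
);
CREATE TABLE station_prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    station_id TEXT NOT NULL, fuel_type TEXT NOT NULL,
    price_pence INTEGER NOT NULL, price_effective_at TEXT NOT NULL,
    recorded_at TEXT NOT NULL
);
"""

COLUMNS = "(station_id, fuel_type, price_pence, price_effective_at, recorded_at)"


def record_price(conn, station_id, fuel_type, price_pence, effective_at, now):
    """Append to history only if the price changed; always upsert the current row."""
    row = conn.execute(
        "SELECT price_pence FROM station_prices_current "
        "WHERE station_id = ? AND fuel_type = ?",
        (station_id, fuel_type),
    ).fetchone()
    params = (station_id, fuel_type, price_pence, effective_at, now)
    if row is None or row[0] != price_pence:
        conn.execute(f"INSERT INTO station_prices {COLUMNS} VALUES (?, ?, ?, ?, ?)", params)
    conn.execute(
        f"INSERT INTO station_prices_current {COLUMNS} VALUES (?, ?, ?, ?, ?) "
        "ON CONFLICT (station_id, fuel_type) DO UPDATE SET "
        "price_pence = excluded.price_pence, "
        "price_effective_at = excluded.price_effective_at, "
        "recorded_at = excluded.recorded_at",
        params,
    )
```

In MySQL the upsert would be written as `INSERT ... ON DUPLICATE KEY UPDATE`. Wrapping each call in one transaction (for example `with conn:`) keeps the history and current rows consistent if a poll fails midway.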
### `station_prices_archive`
Identical schema to `station_prices` but no partitioning. Rows older than 12 months are moved here by a monthly scheduled command. Used only for trend graphs and historical comparisons — never queried by the scoring engine.
```
(same columns as station_prices — no partition)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
```
---
## Relationships
```
stations.node_id ←── station_prices_current.station_id
←── station_prices.station_id
←── station_prices_archive.station_id
```
---
## Service responsibilities
**`FuelPriceService`**
1. Fetch/cache OAuth token
2. Incremental poll every 15 min: GET `/pfs/fuel-prices`, upsert `station_prices_current`, insert into `station_prices` where price changed
3. Full refresh daily: iterate all batches of `/pfs` (upsert `stations`) and `/pfs/fuel-prices` (same price logic)
4. Call `StationTaggingService` to set `is_supermarket` and normalise `brand_name`
5. Dispatch `PricesUpdatedEvent` after each poll

**`StationTaggingService`**
- Matches `trading_name` against known supermarket brands (case-insensitive)
- Sets `is_supermarket = 1` and normalises `brand_name`
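
A sketch of the tagging rule as case-insensitive whole-word matching on `trading_name`. The brand list and canonical spellings below are illustrative only, since the spec does not define them; `tag_station` is a hypothetical name. A `None` brand means "leave `brand_name` as the API supplied it".

```python
import re
from typing import Optional

# Illustrative list only: the real brands and canonical names are not in the spec.
SUPERMARKET_BRANDS = {
    "tesco": "Tesco",
    "sainsbury": "Sainsbury's",
    "asda": "Asda",
    "morrisons": "Morrisons",
}


def tag_station(trading_name: str) -> tuple[bool, Optional[str]]:
    """Return (is_supermarket, normalised brand_name or None)."""
    words = re.findall(r"[a-z]+", trading_name.lower())  # "Sainsbury's" -> ["sainsbury", "s"]
    for key, brand in SUPERMARKET_BRANDS.items():
        if key in words:
            return True, brand
    return False, None
```

Matching whole words rather than substrings avoids tagging names that merely contain a brand string inside another word.
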

**Scheduled archive command**
- Runs monthly
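- Archives whole months only: copies rows from `station_prices` with `price_effective_at` before the first day of the month 12 months ago into `station_prices_archive`
- Then drops the partitions for those months from `station_prices`; because the cutoff sits on a partition boundary, no unarchived rows are dropped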
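
A sketch of the copy-then-drop step with a month-aligned cutoff, emitting MySQL statements with `%s` placeholders (the parameter style used by common Python MySQL drivers such as PyMySQL). The `pYYYYMM` partition naming and the helper names are assumptions. `DROP PARTITION` deletes rows outright, so the copy should be confirmed before the drop runs.

```python
from datetime import date


def archive_cutoff(today: date) -> date:
    """First day of the month twelve months before `today`; archive rows before it."""
    return date(today.year - 1, today.month, 1)


def partition_key(d: date) -> int:
    return d.year * 100 + d.month  # matches YEAR(...) * 100 + MONTH(...)


def partitions_to_drop(partition_keys: list[int], today: date) -> list[int]:
    """Monthly partitions whose whole month lies before the cutoff."""
    cutoff_key = partition_key(archive_cutoff(today))
    return sorted(k for k in partition_keys if k < cutoff_key)


def archive_statements(partition_keys: list[int], today: date) -> list[tuple[str, tuple]]:
    """Return [(copy_sql, params), (drop_sql, ())], or [] if nothing is old enough."""
    to_drop = partitions_to_drop(partition_keys, today)
    if not to_drop:
        return []
    copy = (
        "INSERT INTO station_prices_archive "
        "SELECT * FROM station_prices WHERE price_effective_at < %s",
        (archive_cutoff(today),),
    )
    drop = (
        "ALTER TABLE station_prices DROP PARTITION "
        + ", ".join(f"p{k}" for k in to_drop),
        (),
    )
    return [copy, drop]
```

`SELECT *` relies on the archive having the identical column order, as the spec states.
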
---
## Open questions / adjustable later
- Exact partition pre-creation strategy (how many months ahead to create partitions)
- Whether `station_prices_archive` needs its own partitioning if it grows very large
- Additional fuel types if the API introduces new ones (extend ENUM in migration)