Add fuel API ingestion and historic storage design spec
Includes verified API authentication flow, correct base URL, and all DB table schemas for stations, current prices, history, and archive. Fuel types corrected to match live API (B7_STANDARD, B7_PREMIUM).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
162
docs/superpowers/specs/2026-04-03-fuel-api-ingestion-design.md
Normal file
@@ -0,0 +1,162 @@
# Fuel API Ingestion & Historic Storage Design

**Date:** 2026-04-03
**Scope:** UK Fuel Finder API integration, database schema for station metadata and historic price storage.

---

## Context

The app polls the UK gov.uk Fuel Finder API to collect petrol station prices across the UK (~14,500 stations). The scoring engine uses these prices to produce fill-up recommendations for users. Historic data is retained indefinitely: a hot table covers the last 12 months for scoring queries, and an archive table holds everything older for graphs and comparisons.

---

## API

**Base URL:** `https://www.fuel-finder.service.gov.uk/api/v1`

### Authentication

OAuth 2.0 via JSON POST (not form-encoded).

- **Get token:** `POST /oauth/generate_access_token` with body `{"client_id": "...", "client_secret": "..."}`
- **Refresh token:** `POST /oauth/regenerate_access_token` with the same payload
- Response includes `access_token` (Bearer), `expires_in: 3600`, `refresh_token`
- Cache the token at key `fuel_finder_access_token` with TTL = `expires_in - 60` (3540s), so the cached copy lapses before the token itself does
- On cache miss: fetch a new token, store it, return it
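
The caching rule above can be sketched as follows. This is a minimal illustration, not the service's actual code: `TokenCache` is a hypothetical in-memory stand-in for the application's cache store, and `fetch_token` is a hypothetical callable that performs the JSON POST and returns the decoded response body.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_KEY = "fuel_finder_access_token"
SAFETY_MARGIN_SECONDS = 60  # TTL = expires_in - 60, per the spec


class TokenCache:
    """Hypothetical in-memory stand-in for the application's cache store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has outlived its TTL: treat it as a cache miss.
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: str, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)


def get_access_token(cache: TokenCache, fetch_token: Callable[[], dict]) -> str:
    """Return the cached token, or fetch and cache a new one on a miss."""
    cached = cache.get(CACHE_KEY)
    if cached is not None:
        return cached
    response = fetch_token()  # assumed to return the decoded JSON body
    ttl = max(int(response["expires_in"]) - SAFETY_MARGIN_SECONDS, 0)
    cache.put(CACHE_KEY, response["access_token"], ttl)
    return response["access_token"]
```

Injecting the clock keeps the expiry behaviour testable without sleeping; in the real app the framework's cache would replace `TokenCache` entirely.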

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/pfs?batch-number={n}` | Station metadata, 500 per batch |
| GET | `/pfs/fuel-prices?batch-number={n}` | All station prices, 500 per batch |
| GET | `/pfs/fuel-prices` | Incremental prices (recently changed only) |

- `node_id` is the station identifier and is consistent across both endpoints (verified against live API)
- Both endpoints return a flat JSON array (no pagination wrapper)
- Total stations: ~14,500 across ~30 batches
- Fuel types in production: `E10`, `E5`, `B7_STANDARD`, `B7_PREMIUM`, `HVO`, `B10`
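
Because the batched endpoints return a flat array with no pagination wrapper, the client has to decide for itself when to stop. The sketch below shows one way to do that. It rests on two assumptions the spec does not confirm: that batch numbers start at 1, and that the last batch is recognisable because it holds fewer than 500 rows (or none). `fetch_batch` is a hypothetical callable wrapping the authenticated GET.

```python
from typing import Callable, Iterator, List

BATCH_SIZE = 500  # rows per batch, per the endpoint table


def iter_all_batches(
    fetch_batch: Callable[[int], List[dict]],
    first_batch: int = 1,   # assumption: numbering starts at 1
    max_batches: int = 100,  # safety cap, well above the ~30 expected
) -> Iterator[dict]:
    """Yield every row from a batched endpoint, stopping at a short batch."""
    for batch_number in range(first_batch, first_batch + max_batches):
        rows = fetch_batch(batch_number)
        yield from rows
        if len(rows) < BATCH_SIZE:
            # Assumption: a short or empty batch marks the end of the data.
            return
    raise RuntimeError("batch limit reached without seeing a final batch")
```

If the live API signals the end differently (for example with an error status), only the stop condition needs to change.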

### Polling strategy

- **Every 15 minutes:** call `/pfs/fuel-prices` (no `batch-number`), which returns only recently changed prices
- **Once daily (3am):** full refresh that iterates all batches of both `/pfs` and `/pfs/fuel-prices` to catch any drift

---

## Database Schema

### `stations`

One row per petrol filling station. Upserted on full daily refresh and when an incremental poll encounters a new `node_id`.

```
node_id                         VARCHAR(64)    PRIMARY KEY
trading_name                    VARCHAR(128)
brand_name                      VARCHAR(64)    NULLABLE
is_same_trading_and_brand       TINYINT(1)
is_supermarket                  TINYINT(1)     DEFAULT 0   -- set by StationTaggingService
is_motorway_service_station     TINYINT(1)     DEFAULT 0
is_supermarket_service_station  TINYINT(1)     DEFAULT 0
temporary_closure               TINYINT(1)     DEFAULT 0
permanent_closure               TINYINT(1)     DEFAULT 0
permanent_closure_date          DATE           NULLABLE
public_phone_number             VARCHAR(20)    NULLABLE
address_line_1                  VARCHAR(255)   NULLABLE
address_line_2                  VARCHAR(255)   NULLABLE
city                            VARCHAR(100)   NULLABLE
county                          VARCHAR(100)   NULLABLE
country                         VARCHAR(64)    NULLABLE
postcode                        VARCHAR(10)
lat                             DECIMAL(10,7)
lng                             DECIMAL(10,7)
amenities                       JSON           NULLABLE
opening_times                   JSON           NULLABLE
fuel_types                      JSON           NULLABLE    -- array of supported fuel type strings
last_seen_at                    DATETIME
```

### `station_prices_current`

One row per `(station_id, fuel_type)`. Upserted on every price change. The scoring engine reads current prices from this table, so it never needs to touch the history table.

```
station_id          VARCHAR(64)        FK → stations.node_id
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED  -- price in pence × 100 (e.g. 15990 = 159.90p); stored as an integer, never a float
price_effective_at  DATETIME           -- price_change_effective_timestamp from API
price_reported_at   DATETIME           -- price_last_updated from API
recorded_at         DATETIME           -- when this row was last upserted

PRIMARY KEY (station_id, fuel_type)
```
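
The "price × 100, never float" rule is easiest to keep if the conversion goes through decimal arithmetic. The sketch below assumes the API reports the price in pence as a decimal number or string (for example `159.9`); the spec does not show the raw payload, so treat the input shape and the helper name `to_price_pence` as hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

SMALLINT_UNSIGNED_MAX = 65535  # upper bound of the price_pence column


def to_price_pence(api_price) -> int:
    """Convert a price in pence (e.g. "159.9") to the stored integer (15990)."""
    # Going through str() avoids inheriting binary float error from the input.
    pence = Decimal(str(api_price))
    scaled = (pence * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    value = int(scaled)
    if not 0 <= value <= SMALLINT_UNSIGNED_MAX:
        raise ValueError(f"price {api_price!r} does not fit in SMALLINT UNSIGNED")
    return value
```

The range check matters because `SMALLINT UNSIGNED` tops out at 65535, which corresponds to 655.35p; anything above that would be a data error worth surfacing rather than truncating.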

### `station_prices`

Append-only price history. One row per price change per station+fuel. Partitioned monthly on `price_effective_at`. Covers the last 12 months (hot table).

```
id                  BIGINT UNSIGNED    AUTO_INCREMENT
station_id          VARCHAR(64)        FK → stations.node_id
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED
price_effective_at  DATETIME
price_reported_at   DATETIME
recorded_at         DATETIME

PRIMARY KEY (id, price_effective_at)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
PARTITION BY RANGE (YEAR(price_effective_at) * 100 + MONTH(price_effective_at))
```

**Foreign key:** MySQL and MariaDB do not support foreign keys on partitioned tables. If the target is one of those engines (as the column types suggest), `station_id` on this table is a logical reference to `stations.node_id` rather than an enforced constraint.

**Deduplication:** only insert a new row if `price_pence` differs from the most recent stored value for that `(station_id, fuel_type)`. This prevents duplicates on full refreshes when prices haven't changed.
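
The deduplication rule can be sketched against an in-memory SQLite database. This is an illustration only: the tables are cut down to the columns the rule needs, SQLite stands in for the real database, and `record_price` is a hypothetical helper. On MySQL the upsert would typically use `INSERT ... ON DUPLICATE KEY UPDATE` instead of `INSERT OR REPLACE`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE station_prices_current (
    station_id TEXT NOT NULL,
    fuel_type TEXT NOT NULL,
    price_pence INTEGER NOT NULL,
    price_effective_at TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    PRIMARY KEY (station_id, fuel_type)
);
CREATE TABLE station_prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    station_id TEXT NOT NULL,
    fuel_type TEXT NOT NULL,
    price_pence INTEGER NOT NULL,
    price_effective_at TEXT NOT NULL,
    recorded_at TEXT NOT NULL
);
"""


def record_price(conn, station_id, fuel_type, price_pence,
                 effective_at, recorded_at) -> bool:
    """Upsert the current price; append to history only if the price changed.

    Returns True when a history row was inserted.
    """
    latest = conn.execute(
        "SELECT price_pence FROM station_prices "
        "WHERE station_id = ? AND fuel_type = ? "
        "ORDER BY price_effective_at DESC, id DESC LIMIT 1",
        (station_id, fuel_type),
    ).fetchone()
    changed = latest is None or latest[0] != price_pence
    if not changed:
        return False
    row = (station_id, fuel_type, price_pence, effective_at, recorded_at)
    with conn:  # commits both writes together, or rolls both back
        conn.execute(
            "INSERT OR REPLACE INTO station_prices_current "
            "(station_id, fuel_type, price_pence, price_effective_at, recorded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            row,
        )
        conn.execute(
            "INSERT INTO station_prices "
            "(station_id, fuel_type, price_pence, price_effective_at, recorded_at) "
            "VALUES (?, ?, ?, ?, ?)",
            row,
        )
    return True
```

Running the same unchanged price through twice, as a daily full refresh would, leaves the history table untouched on the second call.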

### `station_prices_archive`

Identical schema to `station_prices` but no partitioning. Rows older than 12 months are moved here by a monthly scheduled command. Used only for trend graphs and historical comparisons; never queried by the scoring engine.

```
(same columns as station_prices, no partition)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
```

---

## Relationships

```
stations.node_id ←── station_prices_current.station_id
                 ←── station_prices.station_id
                 ←── station_prices_archive.station_id
```

---

## Service responsibilities

**`FuelPriceService`**
1. Fetch/cache OAuth token
2. Incremental poll every 15 min: GET `/pfs/fuel-prices`, upsert `station_prices_current`, insert into `station_prices` where price changed
3. Full refresh daily: iterate all batches of `/pfs` (upsert `stations`) and `/pfs/fuel-prices` (same price logic)
4. Call `StationTaggingService` to set `is_supermarket` and normalise `brand_name`
5. Dispatch `PricesUpdatedEvent` after each poll

**`StationTaggingService`**
- Matches `trading_name` against known supermarket brands (case-insensitive)
- Sets `is_supermarket = 1` and normalises `brand_name`
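
A minimal sketch of the tagging step follows. The brand list is illustrative only (the spec does not enumerate the known supermarket brands), and substring matching is an assumption about what "matches" means here; `tag_station` is a hypothetical name.

```python
from typing import Optional, Tuple

# Illustrative list only; the real set of brands is not given in the spec.
SUPERMARKET_BRANDS = ("Tesco", "Asda", "Sainsbury's", "Morrisons")


def tag_station(trading_name: str) -> Tuple[bool, Optional[str]]:
    """Return (is_supermarket, normalised brand_name) for a trading name."""
    haystack = trading_name.casefold()
    for brand in SUPERMARKET_BRANDS:
        # Assumption: a case-insensitive substring match is sufficient.
        if brand.casefold() in haystack:
            return True, brand
    return False, None
```

For example, `tag_station("TESCO EXTRA LEEDS")` yields `(True, "Tesco")`. Plain substring matching can produce false positives on unrelated names that happen to contain a brand, so a word-boundary match may be preferable in practice.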

**Scheduled archive command**
- Runs monthly
- Copies rows from `station_prices` where `price_effective_at < NOW() - 12 months` into `station_prices_archive`
- Drops the corresponding old partition from `station_prices`, which removes the copied rows from the hot table
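
The copy-then-drop sequence can be sketched as below. One detail the bullets leave open is how a "12 months ago" cutoff interacts with monthly partitions: this sketch assumes the cutoff is rounded down to the first day of a month, so that only whole partitions are archived and dropped. The partition name used in the usage note (`p202503`) is hypothetical, as is the helper naming.

```python
from datetime import date
from typing import List


def archive_cutoff(today: date, keep_months: int = 12) -> date:
    """First day of the month that lies `keep_months` before today's month."""
    months = today.year * 12 + (today.month - 1) - keep_months
    return date(months // 12, months % 12 + 1, 1)


def archive_statements(cutoff: date, partitions_to_drop: List[str]) -> List[str]:
    """Build the SQL for one archive run: copy first, then drop partitions."""
    copy = (
        "INSERT INTO station_prices_archive "
        "SELECT * FROM station_prices "
        f"WHERE price_effective_at < '{cutoff.isoformat()}'"
    )
    drops = [
        f"ALTER TABLE station_prices DROP PARTITION {name}"
        for name in partitions_to_drop
    ]
    # Order matters: dropping a partition deletes its rows, so the copy
    # must have succeeded before any DROP PARTITION is issued.
    return [copy] + drops
```

A run on 2026-04-03 would use `archive_cutoff(date(2026, 4, 3))`, giving a cutoff of 2025-04-01, and then pass whichever partitions lie entirely before that date, for instance `["p202503"]`. Rounding down means the hot table briefly holds between 12 and 13 months of data, which still satisfies the "last 12 months" guarantee.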

---

## Open questions / adjustable later

- Exact partition pre-creation strategy (how many months ahead to create partitions)
- Whether `station_prices_archive` needs its own partitioning if it grows very large
- Additional fuel types if the API introduces new ones (extend ENUM in migration)
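
To make the first open question concrete, here is one possible shape for generating partition definitions a configurable number of months ahead. It follows the `YEAR * 100 + MONTH` partitioning expression from the `station_prices` schema; the `pYYYYMM` naming scheme and the `months_ahead` parameter are assumptions, not decisions made in this spec.

```python
from datetime import date
from typing import List, Tuple


def month_key(year: int, month: int) -> int:
    """Mirror the partitioning expression YEAR(x) * 100 + MONTH(x)."""
    return year * 100 + month


def next_month(year: int, month: int) -> Tuple[int, int]:
    return (year + 1, 1) if month == 12 else (year, month + 1)


def partition_definitions(start: date, months_ahead: int) -> List[str]:
    """Partition clauses for the start month plus `months_ahead` more months."""
    definitions = []
    year, month = start.year, start.month
    for _ in range(months_ahead + 1):
        upper_year, upper_month = next_month(year, month)
        # Each partition holds keys strictly below the next month's key.
        definitions.append(
            f"PARTITION p{month_key(year, month)} "
            f"VALUES LESS THAN ({month_key(upper_year, upper_month)})"
        )
        year, month = upper_year, upper_month
    return definitions
```

The year rollover is the only subtle case: December's upper bound is the following January's key (for example 202701 for December 2026), not 202613.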