Includes verified API authentication flow, correct base URL, all DB table schemas for stations, current prices, history, and archive. Fuel types corrected to match live API (B7_STANDARD, B7_PREMIUM). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Fuel API Ingestion & Historic Storage Design

**Date:** 2026-04-03
**Scope:** UK Fuel Finder API integration and the database schema for station metadata and historic price storage.

---

## Context

The app polls the UK gov.uk Fuel Finder API to collect petrol station prices across the UK (~14,500 stations). The scoring engine uses these prices to produce fill-up recommendations for users. Historic data is retained indefinitely: a hot table covers the last year for scoring queries, and an archive table holds everything older for graphs and comparisons.

---

## API

**Base URL:** `https://www.fuel-finder.service.gov.uk/api/v1`

### Authentication

OAuth 2.0 via JSON POST (not form-encoded).

- **Get token:** `POST /oauth/generate_access_token` with body `{"client_id": "...", "client_secret": "..."}`
- **Refresh token:** `POST /oauth/regenerate_access_token` with the same payload
- Response includes `access_token` (Bearer), `expires_in: 3600`, `refresh_token`
- Cache the token at key `fuel_finder_access_token` with TTL = `expires_in - 60` (3540s)
- On cache miss: fetch a new token, store it, return it

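The caching rule above can be sketched as follows. This is a minimal illustration, not the app's implementation: `TtlCache` is a hypothetical in-memory stand-in for whatever cache the app uses, and `request_token` is a hypothetical callable that performs the `POST /oauth/generate_access_token` request and returns the decoded JSON body.

```python
import time
from typing import Callable, Optional

CACHE_KEY = "fuel_finder_access_token"
SAFETY_MARGIN_SECONDS = 60


class TtlCache:
    """Tiny in-memory stand-in for the app's real cache (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def put(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def get_access_token(cache: TtlCache, request_token: Callable[[], dict]) -> str:
    """Return the cached token, or fetch and cache a new one on a miss.

    request_token is a hypothetical callable wrapping
    POST /oauth/generate_access_token; it returns the decoded JSON body.
    """
    token = cache.get(CACHE_KEY)
    if token is not None:
        return token
    response = request_token()
    ttl = response["expires_in"] - SAFETY_MARGIN_SECONDS  # 3600 - 60 = 3540
    cache.put(CACHE_KEY, response["access_token"], ttl)
    return response["access_token"]
```

Expiring the cached entry 60 seconds early means a request never goes out with a token that is about to lapse mid-flight.
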
### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/pfs?batch-number={n}` | Station metadata, 500 per batch |
| GET | `/pfs/fuel-prices?batch-number={n}` | All station prices, 500 per batch |
| GET | `/pfs/fuel-prices` | Incremental prices (recently changed only) |

- `node_id` is the station identifier, consistent across both endpoints (verified against live API)
- Both endpoints return a flat JSON array (no pagination wrapper)
- Total stations: ~14,500 across ~30 batches
- Fuel types in production: `E10`, `E5`, `B7_STANDARD`, `B7_PREMIUM`, `HVO`, `B10`

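Walking the batched endpoints might look like the sketch below. `fetch_batch` is a hypothetical callable standing in for the authenticated GET request. Two things here are assumptions, because this document does not state them: that batch numbers start at 1, and that an empty array marks the end of the data.

```python
from typing import Callable, Iterator

Batch = list[dict]


def iter_batches(fetch_batch: Callable[[int], Batch], max_batches: int = 100) -> Iterator[dict]:
    """Yield every record from a batched endpoint, one batch at a time.

    fetch_batch(n) is a hypothetical callable wrapping
    GET /pfs?batch-number={n} (or /pfs/fuel-prices?batch-number={n}).
    ASSUMPTIONS: batches are numbered from 1 and an empty array marks the end.
    """
    for batch_number in range(1, max_batches + 1):
        batch = fetch_batch(batch_number)
        if not batch:  # assumed end-of-data signal
            return
        yield from batch


# Stand-in for the live API: 1,200 fake stations served 500 per batch.
_FAKE = [{"node_id": f"station-{i}"} for i in range(1200)]


def fake_fetch(batch_number: int) -> Batch:
    start = (batch_number - 1) * 500
    return _FAKE[start:start + 500]


stations = list(iter_batches(fake_fetch))
```

The `max_batches` guard is there so that a misbehaving endpoint cannot keep the loop running forever.
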
### Polling strategy

- **Every 15 minutes:** call `/pfs/fuel-prices` (no `batch-number`), which returns only recently changed prices
- **Once daily (3am):** full refresh. Iterate all batches of both `/pfs` and `/pfs/fuel-prices` to catch any drift

---

## Database Schema

### `stations`

One row per petrol filling station. Upserted on the full daily refresh and whenever an incremental poll encounters a new `node_id`.

```
node_id                         VARCHAR(64)   PRIMARY KEY
trading_name                    VARCHAR(128)
brand_name                      VARCHAR(64)   NULLABLE
is_same_trading_and_brand       TINYINT(1)
is_supermarket                  TINYINT(1)    DEFAULT 0  -- set by StationTaggingService
is_motorway_service_station     TINYINT(1)    DEFAULT 0
is_supermarket_service_station  TINYINT(1)    DEFAULT 0
temporary_closure               TINYINT(1)    DEFAULT 0
permanent_closure               TINYINT(1)    DEFAULT 0
permanent_closure_date          DATE          NULLABLE
public_phone_number             VARCHAR(20)   NULLABLE
address_line_1                  VARCHAR(255)  NULLABLE
address_line_2                  VARCHAR(255)  NULLABLE
city                            VARCHAR(100)  NULLABLE
county                          VARCHAR(100)  NULLABLE
country                         VARCHAR(64)   NULLABLE
postcode                        VARCHAR(10)
lat                             DECIMAL(10,7)
lng                             DECIMAL(10,7)
amenities                       JSON          NULLABLE
opening_times                   JSON          NULLABLE
fuel_types                      JSON          NULLABLE   -- array of supported fuel type strings
last_seen_at                    DATETIME
```

### `station_prices_current`

One row per `(station_id, fuel_type)`. Upserted on every price change. The scoring engine uses this table for current-price lookups, so it never needs to touch the history table.

```
station_id          VARCHAR(64)        FK → stations.node_id
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED  -- price in pence × 100 (e.g. 15990 = 159.90p, never float)
price_effective_at  DATETIME           -- price_change_effective_timestamp from API
price_reported_at   DATETIME           -- price_last_updated from API
recorded_at         DATETIME           -- when this row was last upserted

PRIMARY KEY (station_id, fuel_type)
```

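A sketch of how API values could be mapped onto these columns follows. It assumes the API reports prices in pence as a decimal value such as `159.9`, and that fuel types arrive in upper case as listed under Endpoints. The function names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Lower-case values of the fuel_type ENUM in the schema above.
FUEL_TYPES = {"e10", "e5", "b7_standard", "b7_premium", "b10", "hvo"}
SMALLINT_UNSIGNED_MAX = 65535


def normalise_fuel_type(api_value: str) -> str:
    """Map an API fuel type such as 'B7_STANDARD' to its ENUM value."""
    value = api_value.strip().lower()
    if value not in FUEL_TYPES:
        raise ValueError(f"unknown fuel type: {api_value!r}")
    return value


def to_price_pence(price: str) -> int:
    """Convert a price in pence (e.g. '159.9') to the stored integer (15990).

    ASSUMPTION: the API price is parsed from its textual form; going through
    Decimal avoids binary floating-point error.
    """
    units = (Decimal(price) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    if not 0 <= units <= SMALLINT_UNSIGNED_MAX:
        raise ValueError(f"price out of range for SMALLINT UNSIGNED: {price}")
    return int(units)
```

The range check matters because `SMALLINT UNSIGNED` tops out at 65535, which corresponds to 655.35p under this encoding.
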
### `station_prices`

Append-only price history. One row per price change per station and fuel type. Partitioned monthly on `price_effective_at`. Covers the last 12 months (hot table).

```
id                  BIGINT UNSIGNED    AUTO_INCREMENT
station_id          VARCHAR(64)        FK → stations.node_id
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED
price_effective_at  DATETIME
price_reported_at   DATETIME
recorded_at         DATETIME

PRIMARY KEY (id, price_effective_at)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
PARTITION BY RANGE (YEAR(price_effective_at) * 100 + MONTH(price_effective_at))
```

**Note:** if the database is MySQL or MariaDB (as the column types suggest), the `station_id` foreign key cannot be declared on this table, because those engines do not support foreign keys on partitioned InnoDB tables. The reference to `stations.node_id` would then have to be enforced in application code.

**Deduplication:** insert a new row only if no row exists yet for that `(station_id, fuel_type)`, or if `price_pence` differs from the most recent stored value. This prevents duplicates on full refreshes when prices haven't changed.

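The deduplication rule can be illustrated with a small runnable sketch. SQLite is used here purely as an in-memory stand-in for the production database, with a cut-down table. Treating "most recent" as the row with the latest `price_effective_at` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the production database
conn.execute(
    """
    CREATE TABLE station_prices (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        station_id TEXT NOT NULL,
        fuel_type TEXT NOT NULL,
        price_pence INTEGER NOT NULL,
        price_effective_at TEXT NOT NULL
    )
    """
)


def record_price(station_id: str, fuel_type: str, price_pence: int, effective_at: str) -> bool:
    """Append a history row only if the price differs from the latest stored one.

    Returns True when a row was inserted.
    """
    latest = conn.execute(
        """
        SELECT price_pence FROM station_prices
        WHERE station_id = ? AND fuel_type = ?
        ORDER BY price_effective_at DESC, id DESC
        LIMIT 1
        """,
        (station_id, fuel_type),
    ).fetchone()
    if latest is not None and latest[0] == price_pence:
        return False  # unchanged price: skip the duplicate
    conn.execute(
        "INSERT INTO station_prices (station_id, fuel_type, price_pence, price_effective_at)"
        " VALUES (?, ?, ?, ?)",
        (station_id, fuel_type, price_pence, effective_at),
    )
    return True
```

The lookup filters and orders on `(station_id, fuel_type, price_effective_at)`, which is the composite index declared in the schema.
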
### `station_prices_archive`

Identical schema to `station_prices`, but without partitioning. A monthly scheduled command moves rows older than 12 months here. Used only for trend graphs and historical comparisons; the scoring engine never queries it.

```
(same columns as station_prices, no partition)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
```

---

## Relationships

```
stations.node_id ←── station_prices_current.station_id
                 ←── station_prices.station_id
                 ←── station_prices_archive.station_id
```

---

## Service responsibilities

**`FuelPriceService`**
1. Fetch/cache OAuth token
2. Incremental poll every 15 min: GET `/pfs/fuel-prices`, upsert `station_prices_current`, insert into `station_prices` where price changed
3. Full refresh daily: iterate all batches of `/pfs` (upsert `stations`) and `/pfs/fuel-prices` (same price logic)
4. Call `StationTaggingService` to set `is_supermarket` and normalise `brand_name`
5. Dispatch `PricesUpdatedEvent` after each poll

**`StationTaggingService`**
- Matches `trading_name` against known supermarket brands (case-insensitive)
- Sets `is_supermarket = 1` and normalises `brand_name`

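One possible shape for the tagging step is sketched below. The brand list is illustrative only, since this document does not enumerate the known brands. Substring matching is one reading of "matches `trading_name` against known supermarket brands", not a confirmed rule.

```python
from typing import Optional

# Illustrative list only; the real set of known brands is not given in this document.
SUPERMARKET_BRANDS = ["Tesco", "Asda", "Sainsbury's", "Morrisons"]


def tag_station(trading_name: str) -> tuple[bool, Optional[str]]:
    """Return (is_supermarket, normalised brand_name) for a trading name.

    Matching is a case-insensitive substring test (an assumption).
    """
    lowered = trading_name.casefold()
    for brand in SUPERMARKET_BRANDS:
        if brand.casefold() in lowered:
            return True, brand
    return False, None
```
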
**Scheduled archive command**
- Runs monthly
- Moves rows from `station_prices` where `price_effective_at < NOW() - INTERVAL 12 MONTH` into `station_prices_archive`
- Drops the corresponding old partition from `station_prices`

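The archive command's date arithmetic could be sketched as below. It assumes the cutoff is rounded down to a month boundary so that only whole partitions are moved and dropped; the command's exact behaviour is not specified here. Each partition is identified by the month it contains, using the same `YEAR * 100 + MONTH` value as the partition expression.

```python
from datetime import datetime


def partition_key(moment: datetime) -> int:
    """Mirror the partition expression YEAR(x) * 100 + MONTH(x)."""
    return moment.year * 100 + moment.month


def archive_cutoff(now: datetime) -> datetime:
    """Start of the month twelve months before `now`.

    ASSUMPTION: the cutoff is aligned to a month boundary so that whole
    partitions can be dropped after their rows are copied.
    """
    return datetime(now.year - 1, now.month, 1)


def partitions_to_archive(existing_keys: list[int], now: datetime) -> list[int]:
    """Partition month keys that lie entirely before the cutoff."""
    cutoff_key = partition_key(archive_cutoff(now))
    return sorted(key for key in existing_keys if key < cutoff_key)


# Example run on the design date: only months before April 2025 qualify.
keys = [202502, 202503, 202504, 202505, 202604]
to_archive = partitions_to_archive(keys, datetime(2026, 4, 3))
```
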

---

## Open questions / adjustable later

- Exact partition pre-creation strategy (how many months ahead to create partitions)
- Whether `station_prices_archive` needs its own partitioning if it grows very large
- Additional fuel types if the API introduces new ones (extend ENUM in migration)