Fuel API Ingestion & Historic Storage Design
Date: 2026-04-03
Scope: UK Fuel Finder API integration, database schema for station metadata and historic price storage.
Context
The app polls the UK gov.uk Fuel Finder API to collect petrol station prices across the UK (~14,500 stations). Prices are used by the scoring engine to produce fill-up recommendations for users. Historic data is retained indefinitely — a hot table covers the last year for scoring queries, an archive table holds everything older for graphs and comparisons.
API
Base URL: https://www.fuel-finder.service.gov.uk/api/v1
Authentication
OAuth 2.0 via JSON POST (not form-encoded).
- Get token: POST /oauth/generate_access_token with body {"client_id": "...", "client_secret": "..."}
- Refresh token: POST /oauth/regenerate_access_token (same payload)
- Response includes access_token (Bearer), expires_in: 3600, refresh_token
- Cache token at key fuel_finder_access_token with TTL = expires_in - 60 (3540s)
- On cache miss: fetch new token, store, return
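The caching rule above can be sketched as follows. This is a minimal in-memory illustration, not the app's implementation: TokenCache, fetch_token and the injected clock are hypothetical names, and a real deployment would use the app's own cache store and perform the JSON POST inside fetch_token.

```python
import time
from typing import Callable

CACHE_KEY = "fuel_finder_access_token"  # cache key named in this document
SAFETY_MARGIN_S = 60                    # expire the cached token 60s early


class TokenCache:
    """In-memory stand-in for the app's cache store (hypothetical helper)."""

    def __init__(self, fetch_token: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch_token is assumed to POST the JSON credentials and return the
        # decoded response, e.g. {"access_token": "...", "expires_in": 3600}.
        self._fetch_token = fetch_token
        self._clock = clock
        self._store = {}  # key -> (token, expiry time on the injected clock)

    def get(self) -> str:
        now = self._clock()
        entry = self._store.get(CACHE_KEY)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: token is still inside its TTL
        # Cache miss: fetch a new token, store it, return it.
        body = self._fetch_token()
        ttl = body["expires_in"] - SAFETY_MARGIN_S  # 3600 - 60 = 3540s
        self._store[CACHE_KEY] = (body["access_token"], now + ttl)
        return body["access_token"]
```

Injecting the fetch function and the clock keeps the TTL logic testable without calling the live API.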
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /pfs?batch-number={n} | Station metadata, 500 per batch |
| GET | /pfs/fuel-prices?batch-number={n} | All station prices, 500 per batch |
| GET | /pfs/fuel-prices | Incremental prices (recently changed only) |
- node_id is the station identifier, consistent across both endpoints (verified against live API)
- Both endpoints return a flat JSON array (no pagination wrapper)
- Total stations: ~14,500 across ~30 batches
- Fuel types in production: E10, E5, B7_STANDARD, B7_PREMIUM, HVO, B10
Polling strategy
- Every 15 minutes: call /pfs/fuel-prices (no batch-number), which returns only recently changed prices
- Once daily (3am): full refresh, iterating all batches of both /pfs and /pfs/fuel-prices to catch any drift
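The daily full refresh could iterate batches roughly as below. This is a sketch under stated assumptions: this document does not say how the API signals the last batch or what the first batch number is, so the code assumes numbering starts at 1 and that an empty or short batch marks the end. fetch_batch and max_batches are hypothetical names.

```python
from typing import Callable, Iterator

BATCH_SIZE = 500  # per-batch size stated in this document


def iter_all_batches(fetch_batch: Callable[[int], list],
                     first_batch: int = 1,
                     max_batches: int = 100) -> Iterator[dict]:
    """Yield every record from a batched endpoint such as /pfs?batch-number={n}.

    Assumptions (not confirmed by this document): batch numbers start at 1,
    and an empty or short batch means there is nothing further to fetch.
    max_batches is a safety cap so a misbehaving API cannot loop forever.
    """
    for n in range(first_batch, first_batch + max_batches):
        batch = fetch_batch(n)
        yield from batch
        if len(batch) < BATCH_SIZE:
            return  # short or empty batch: end of data
```

Because both endpoints return a flat JSON array, the same helper would serve /pfs and /pfs/fuel-prices with a different fetch_batch.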
Database Schema
stations
One row per petrol filling station. Upserted on full daily refresh and when an incremental poll encounters a new node_id.
node_id VARCHAR(64) PRIMARY KEY
trading_name VARCHAR(128)
brand_name VARCHAR(64) NULLABLE
is_same_trading_and_brand TINYINT(1)
is_supermarket TINYINT(1) DEFAULT 0 — set by StationTaggingService
is_motorway_service_station TINYINT(1) DEFAULT 0
is_supermarket_service_station TINYINT(1) DEFAULT 0
temporary_closure TINYINT(1) DEFAULT 0
permanent_closure TINYINT(1) DEFAULT 0
permanent_closure_date DATE NULLABLE
public_phone_number VARCHAR(20) NULLABLE
address_line_1 VARCHAR(255) NULLABLE
address_line_2 VARCHAR(255) NULLABLE
city VARCHAR(100) NULLABLE
county VARCHAR(100) NULLABLE
country VARCHAR(64) NULLABLE
postcode VARCHAR(10)
lat DECIMAL(10,7)
lng DECIMAL(10,7)
amenities JSON NULLABLE
opening_times JSON NULLABLE
fuel_types JSON NULLABLE — array of supported fuel type strings
last_seen_at DATETIME
station_prices_current
One row per (station_id, fuel_type). Upserted on every price change. Used by scoring engine for current-price lookups — never needs to touch the history table.
station_id VARCHAR(64) FK → stations.node_id
fuel_type ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence SMALLINT UNSIGNED — price × 100 (e.g. 15990 = 159.90p, never float)
price_effective_at DATETIME — price_change_effective_timestamp from API
price_reported_at DATETIME — price_last_updated from API
recorded_at DATETIME — when this row was last upserted
PRIMARY KEY (station_id, fuel_type)
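A minimal sketch of the "price × 100, never float" rule for price_pence. It assumes the API price arrives as a decimal number of pence and is passed in as a string so it never goes through a binary float; to_price_units is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

SMALLINT_UNSIGNED_MAX = 65535  # upper bound of the price_pence column type


def to_price_units(price: str) -> int:
    """Convert a pence price such as "159.9" to the stored integer 15990."""
    units = (Decimal(price) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    value = int(units)
    if not 0 <= value <= SMALLINT_UNSIGNED_MAX:
        # SMALLINT UNSIGNED tops out at 65535, i.e. 655.35p
        raise ValueError(f"price {price!r} does not fit SMALLINT UNSIGNED")
    return value
```

The range check makes the column's ceiling explicit: a price above 655.35p would not fit the column type.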
station_prices
Append-only price history. One row per price change per station+fuel. Partitioned monthly on price_effective_at. Covers the last 12 months (hot table).
id BIGINT UNSIGNED AUTO_INCREMENT
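station_id VARCHAR(64) → stations.node_id (logical reference only if this runs on MySQL/InnoDB, which does not support foreign keys on partitioned tables)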
fuel_type ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence SMALLINT UNSIGNED
price_effective_at DATETIME
price_reported_at DATETIME
recorded_at DATETIME
PRIMARY KEY (id, price_effective_at)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
PARTITION BY RANGE (YEAR(price_effective_at) * 100 + MONTH(price_effective_at))
Deduplication: only insert a new row if price_pence differs from the most recent stored value for that (station_id, fuel_type). This prevents duplicates on full refreshes when prices haven't changed.
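The deduplication rule can be sketched as a pure function. Here the "most recent stored value" is held in a plain dict keyed by (station_id, fuel_type); in the app it would presumably be read from the database, which is an assumption. record_if_changed is a hypothetical name.

```python
def record_if_changed(latest: dict, station_id: str, fuel_type: str,
                      price_pence: int) -> bool:
    """Return True if a new history row should be inserted.

    `latest` maps (station_id, fuel_type) to the most recent stored
    price_pence and is updated in place when the price has changed.
    """
    key = (station_id, fuel_type)
    if latest.get(key) == price_pence:
        return False  # unchanged price: skip, e.g. during a full refresh
    latest[key] = price_pence
    return True
```

A first sighting of a (station_id, fuel_type) pair counts as a change, so new stations get their initial history row.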
station_prices_archive
Identical schema to station_prices but no partitioning. Rows older than 12 months are moved here by a monthly scheduled command. Used only for trend graphs and historical comparisons — never queried by the scoring engine.
(same columns as station_prices — no partition)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
Relationships
stations.node_id ←── station_prices_current.station_id
                 ←── station_prices.station_id
                 ←── station_prices_archive.station_id
Service responsibilities
FuelPriceService
- Fetch/cache OAuth token
- Incremental poll every 15 min: GET /pfs/fuel-prices, upsert station_prices_current, insert into station_prices where price changed
- Full refresh daily: iterate all batches of /pfs (upsert stations) and /pfs/fuel-prices (same price logic)
- Call StationTaggingService to set is_supermarket and normalise brand_name
- Dispatch PricesUpdatedEvent after each poll
StationTaggingService
- Matches trading_name against known supermarket brands (case-insensitive)
- Sets is_supermarket = 1 and normalises brand_name
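A rough sketch of the tagging step. The brand list is illustrative only (this document does not give the real one), substring matching is an assumed matching rule, and tag_station is a hypothetical name.

```python
from typing import Optional

# Illustrative brand list only; the real list is not given in this document.
SUPERMARKET_BRANDS = ("Tesco", "Sainsbury's", "Asda", "Morrisons")


def tag_station(trading_name: str) -> tuple[int, Optional[str]]:
    """Return (is_supermarket, normalised brand_name or None if no match)."""
    name = trading_name.casefold()  # case-insensitive comparison
    for brand in SUPERMARKET_BRANDS:
        if brand.casefold() in name:
            return 1, brand  # canonical spelling becomes brand_name
    return 0, None
```

For example, tag_station("TESCO EXTRA Leeds") yields (1, "Tesco"), while an unmatched name yields (0, None) and leaves the decision about the existing brand_name to the caller.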
Scheduled archive command
- Runs monthly
- Moves rows from station_prices where price_effective_at < NOW() - 12 months into station_prices_archive
- Drops the corresponding old partition from station_prices
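The cutoff and partition arithmetic for the archive command might look like this. One assumption is made loudly: the cutoff is aligned to the first day of a month so that only whole partitions are moved and dropped, which this document does not specify. Both function names are hypothetical.

```python
from datetime import datetime


def partition_key(ts: datetime) -> int:
    """Value of YEAR(ts) * 100 + MONTH(ts), the partitioning expression."""
    return ts.year * 100 + ts.month


def archive_cutoff(now: datetime, months: int = 12) -> datetime:
    """First day of the month that lies `months` months before `now`.

    Assumption: rows with price_effective_at before this instant are
    archived, so the cutoff always falls on a partition boundary.
    """
    index = now.year * 12 + (now.month - 1) - months  # months since year 0
    return datetime(index // 12, index % 12 + 1, 1)
```

With the document date as input, archive_cutoff(datetime(2026, 4, 3)) gives 2025-04-01, so partitions with a key below 202504 would be candidates for archiving.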
Open questions / adjustable later
- Exact partition pre-creation strategy (how many months ahead to create partitions)
- Whether station_prices_archive needs its own partitioning if it grows very large
- Additional fuel types if the API introduces new ones (extend ENUM in migration)