# Fuel API Ingestion & Historic Storage Design

**Date:** 2026-04-03

**Scope:** UK Fuel Finder API integration, database schema for station metadata and historic price storage.

---

## Context

The app polls the UK gov.uk Fuel Finder API to collect petrol station prices across the UK (~14,500 stations). The scoring engine uses these prices to produce fill-up recommendations for users.

Historic data is retained indefinitely. A hot table covers the last 12 months for scoring queries, and an archive table holds everything older for graphs and comparisons.

---

## API

**Base URL:** `https://www.fuel-finder.service.gov.uk/api/v1`

### Authentication

OAuth 2.0 via JSON POST (not form-encoded).

- **Get token:** `POST /oauth/generate_access_token` with body `{"client_id": "...", "client_secret": "..."}`
- **Refresh token:** `POST /oauth/regenerate_access_token` with the same payload
- The response includes `access_token` (Bearer), `expires_in: 3600` and `refresh_token`
- Cache the token under key `fuel_finder_access_token` with TTL = `expires_in - 60` (3540s)
- On cache miss: fetch a new token, store it, return it

### Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/pfs?batch-number={n}` | Station metadata, 500 per batch |
| GET | `/pfs/fuel-prices?batch-number={n}` | All station prices, 500 per batch |
| GET | `/pfs/fuel-prices` | Incremental prices (recently changed only) |

- `node_id` is the station identifier and is consistent across both endpoints (verified against the live API)
- Both endpoints return a flat JSON array (no pagination wrapper)
- Total stations: ~14,500 across ~30 batches
- Fuel types in production: `E10`, `E5`, `B7_STANDARD`, `B7_PREMIUM`, `HVO`, `B10`

### Polling strategy

- **Every 15 minutes:** call `/pfs/fuel-prices` (no `batch-number`), which returns only recently changed prices
- **Once daily (3am):** full refresh, iterating all batches of both `/pfs` and `/pfs/fuel-prices` to catch any drift

---

## Database Schema

### `stations`

One row per petrol filling station.
Upserted on the daily full refresh and whenever an incremental poll encounters a new `node_id`.

```
node_id                         VARCHAR(64) PRIMARY KEY
trading_name                    VARCHAR(128)
brand_name                      VARCHAR(64) NULLABLE
is_same_trading_and_brand       TINYINT(1)
is_supermarket                  TINYINT(1) DEFAULT 0  -- set by StationTaggingService
is_motorway_service_station     TINYINT(1) DEFAULT 0
is_supermarket_service_station  TINYINT(1) DEFAULT 0
temporary_closure               TINYINT(1) DEFAULT 0
permanent_closure               TINYINT(1) DEFAULT 0
permanent_closure_date          DATE NULLABLE
public_phone_number             VARCHAR(20) NULLABLE
address_line_1                  VARCHAR(255) NULLABLE
address_line_2                  VARCHAR(255) NULLABLE
city                            VARCHAR(100) NULLABLE
county                          VARCHAR(100) NULLABLE
country                         VARCHAR(64) NULLABLE
postcode                        VARCHAR(10)
lat                             DECIMAL(10,7)
lng                             DECIMAL(10,7)
amenities                       JSON NULLABLE
opening_times                   JSON NULLABLE
fuel_types                      JSON NULLABLE  -- array of supported fuel type strings
last_seen_at                    DATETIME
```

### `station_prices_current`

One row per `(station_id, fuel_type)`, upserted on every price change. The scoring engine uses this table for current-price lookups, so it never needs to touch the history table.

```
station_id          VARCHAR(64) FK → stations.node_id
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED  -- hundredths of a penny: price in pence × 100 (15990 = 159.90p, max 65535 = 655.35p); never a float
price_effective_at  DATETIME           -- price_change_effective_timestamp from API
price_reported_at   DATETIME           -- price_last_updated from API
recorded_at         DATETIME           -- when this row was last upserted

PRIMARY KEY (station_id, fuel_type)
```

### `station_prices`

Append-only price history: one row per price change per station and fuel type. Partitioned monthly on `price_effective_at` and covers the last 12 months (hot table).
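Both price tables store `price_pence` as an integer count of hundredths of a penny, the encoding defined for `station_prices_current`. Here is a minimal Python sketch of that conversion. It assumes, without confirmation from the API documentation, that the API reports each price as a decimal number of pence such as `159.9`. `to_price_pence` is a hypothetical helper name. Routing the value through `Decimal` keeps it exact, and the range check mirrors the `SMALLINT UNSIGNED` column limit.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Union

# Upper bound of a MySQL SMALLINT UNSIGNED column (65535 = 655.35p).
SMALLINT_UNSIGNED_MAX = 65535


def to_price_pence(api_price: Union[str, int, float]) -> int:
    """Convert a price in pence (e.g. "159.90") to hundredths of a penny (15990).

    Hypothetical helper: assumes the API value is a decimal number of pence.
    """
    # str() first so a float such as 159.9 becomes Decimal("159.9")
    # instead of carrying its inexact binary value into the arithmetic.
    pence = Decimal(str(api_price))
    # Scale to hundredths of a penny and round to a whole number.
    encoded = int((pence * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    if not 0 <= encoded <= SMALLINT_UNSIGNED_MAX:
        raise ValueError(f"price {api_price!r} does not fit SMALLINT UNSIGNED")
    return encoded


print(to_price_pence("159.90"))  # 15990
print(to_price_pence(142.7))     # 14270
```

Half-up rounding is an assumption. If the API never sends more than two decimal places of pence, the rounding step never changes a value.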
```
id                  BIGINT UNSIGNED AUTO_INCREMENT
station_id          VARCHAR(64)  -- references stations.node_id (not declared as an FK, see note below)
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED
price_effective_at  DATETIME
price_reported_at   DATETIME
recorded_at         DATETIME

PRIMARY KEY (id, price_effective_at)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)

PARTITION BY RANGE (YEAR(price_effective_at) * 100 + MONTH(price_effective_at))
```

**Partitioning constraints:** the primary key includes `price_effective_at` because MySQL requires every unique key on a partitioned table to include all columns used in the partitioning expression. MySQL also does not support foreign keys on partitioned InnoDB tables. As a result, the reference from `station_id` to `stations.node_id` cannot be declared here and has to be maintained by the ingestion code.

**Deduplication:** only insert a new row if `price_pence` differs from the most recent stored value for that `(station_id, fuel_type)`. This prevents duplicate rows on full refreshes when prices haven't changed.

### `station_prices_archive`

Identical schema to `station_prices`, but not partitioned. A monthly scheduled command moves rows older than 12 months here. The table is used only for trend graphs and historical comparisons and is never queried by the scoring engine.

```
(same columns as station_prices, no partitioning)

INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
```

---

## Relationships

```
stations.node_id
    ←── station_prices_current.station_id
    ←── station_prices.station_id
    ←── station_prices_archive.station_id
```

---

## Service responsibilities

**`FuelPriceService`**

1. Fetch and cache the OAuth token
2. Incremental poll every 15 min: GET `/pfs/fuel-prices`, upsert `station_prices_current`, and insert into `station_prices` where the price changed
3. Daily full refresh: iterate all batches of `/pfs` (upsert `stations`) and `/pfs/fuel-prices` (same price logic as step 2)
4. Call `StationTaggingService` to set `is_supermarket` and normalise `brand_name`
5. Dispatch `PricesUpdatedEvent` after each poll

**`StationTaggingService`**

- Matches `trading_name` against known supermarket brands (case-insensitive)
- Sets `is_supermarket = 1` and normalises `brand_name`

**Scheduled archive command**

- Runs monthly
- Moves rows from `station_prices` where `price_effective_at < NOW() - 12 months` into `station_prices_archive`
- Drops each `station_prices` partition whose whole month is now older than the cutoff, since all of its rows have already been moved

---

## Open questions / adjustable later

- Exact partition pre-creation strategy (how many months ahead to create partitions)
- Whether `station_prices_archive` needs its own partitioning if it grows very large
- Additional fuel types if the API introduces new ones (extend the ENUM in a migration)
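A small generator for the monthly partition definitions shows what pre-creation involves without settling how far ahead to go. This Python sketch rests on several assumptions: MySQL `VALUES LESS THAN` range syntax, a hypothetical `p{YYYYMM}` partition naming scheme, and a hypothetical `months_ahead` parameter standing in for the undecided horizon. It computes the same `YEAR(price_effective_at) * 100 + MONTH(price_effective_at)` key as the `station_prices` schema. MySQL rejects an insert whose key falls outside every partition, so each partition must exist before its month begins.

```python
from datetime import date


def month_key(year: int, month: int) -> int:
    """Partition key matching YEAR(price_effective_at) * 100 + MONTH(price_effective_at)."""
    return year * 100 + month


def add_months(year: int, month: int, n: int) -> tuple[int, int]:
    """Return (year, month) shifted by n months, rolling over year boundaries."""
    index = year * 12 + (month - 1) + n
    return index // 12, index % 12 + 1


def partition_clauses(start: date, months_ahead: int) -> list[str]:
    """One partition per month, from start's month to months_ahead months later.

    Each bound is the *next* month's key, so December 2026 (202612)
    is bounded by 202701 rather than 202613.
    """
    clauses = []
    for offset in range(months_ahead + 1):
        y, m = add_months(start.year, start.month, offset)
        ny, nm = add_months(start.year, start.month, offset + 1)
        clauses.append(
            f"PARTITION p{month_key(y, m)} VALUES LESS THAN ({month_key(ny, nm)})"
        )
    return clauses


for clause in partition_clauses(date(2026, 11, 1), months_ahead=2):
    print(clause)
# PARTITION p202611 VALUES LESS THAN (202612)
# PARTITION p202612 VALUES LESS THAN (202701)
# PARTITION p202701 VALUES LESS THAN (202702)
```

The clauses could go into the initial `CREATE TABLE` or into `ALTER TABLE station_prices ADD PARTITION (...)`. For range-partitioned tables, MySQL only allows the latter at the high end of the existing ranges. The horizon passed as `months_ahead` remains the open question.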