fuel-price/docs/superpowers/specs/2026-04-03-fuel-api-ingestion-design.md
Ovidiu U 5ad89e977d Add fuel API ingestion and historic storage design spec
Includes verified API authentication flow, correct base URL, all DB
table schemas for stations, current prices, history, and archive.
Fuel types corrected to match live API (B7_STANDARD, B7_PREMIUM).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 18:09:50 +01:00


Fuel API Ingestion & Historic Storage Design

Date: 2026-04-03
Scope: UK Fuel Finder API integration, database schema for station metadata and historic price storage.


Context

The app polls the UK government's (gov.uk) Fuel Finder API to collect petrol station prices across the UK (~14,500 stations). The scoring engine uses these prices to produce fill-up recommendations for users. Historic data is retained indefinitely: a hot table covers the last year for scoring queries, and an archive table holds everything older for graphs and comparisons.


API

Base URL: https://www.fuel-finder.service.gov.uk/api/v1

Authentication

OAuth 2.0 via JSON POST (not form-encoded).

  • Get token: POST /oauth/generate_access_token {"client_id": "...", "client_secret": "..."}
  • Refresh token: POST /oauth/regenerate_access_token with the same JSON payload
  • Response includes access_token (Bearer), expires_in: 3600, refresh_token
  • Cache token at key fuel_finder_access_token with TTL = expires_in - 60 (3540s)
  • On cache miss: fetch new token, store, return
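
The caching rule above can be sketched as follows. This is a minimal illustration, not the app's implementation: `TtlCache`, `get_access_token` and the injected `fetch_token` callable are hypothetical names, and the in-memory cache stands in for whatever cache store the app really uses. Only the cache key, the 60-second margin and the response fields come from this spec.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_KEY = "fuel_finder_access_token"
SAFETY_MARGIN_S = 60  # expire the cached copy one minute before the API does


class TtlCache:
    """Tiny in-memory stand-in for the app's real cache store (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None or self._clock() >= item[1]:
            return None  # missing or expired
        return item[0]

    def put(self, key: str, value: str, ttl_s: float) -> None:
        self._items[key] = (value, self._clock() + ttl_s)


def get_access_token(cache: TtlCache, fetch_token: Callable[[], dict]) -> str:
    """Return the cached token; on a miss, fetch a new one, store it, return it."""
    token = cache.get(CACHE_KEY)
    if token is not None:
        return token
    body = fetch_token()  # e.g. the JSON body of POST /oauth/generate_access_token
    ttl = max(int(body["expires_in"]) - SAFETY_MARGIN_S, 0)  # 3600 -> 3540
    cache.put(CACHE_KEY, body["access_token"], ttl)
    return body["access_token"]
```

Passing the fetcher in as a callable keeps the HTTP client out of the sketch and makes the expiry behaviour easy to test with a fake clock.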

Endpoints

Method  Path                                Description
GET     /pfs?batch-number={n}               Station metadata, 500 per batch
GET     /pfs/fuel-prices?batch-number={n}   All station prices, 500 per batch
GET     /pfs/fuel-prices                    Incremental prices (recently changed only)

  • node_id is the station identifier, consistent across both endpoints (verified against live API)
  • Both endpoints return a flat JSON array (no pagination wrapper)
  • Total stations: ~14,500 across ~30 batches
  • Fuel types in production: E10, E5, B7_STANDARD, B7_PREMIUM, HVO, B10
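
A full pass over a batched endpoint might look like the sketch below. Two things here are assumptions rather than facts from this spec: that batch numbering starts at 1, and that the last batch can be recognised because it holds fewer than 500 records (or is an empty array). `iter_all_batches` and `fetch_batch` are hypothetical names; `fetch_batch(n)` is expected to return the flat JSON array for batch `n`.

```python
from typing import Callable, Iterator, List

BATCH_SIZE = 500  # records per batch, per the endpoint table


def iter_all_batches(
    fetch_batch: Callable[[int], List[dict]],
    first_batch: int = 1,    # assumption: numbering starts at 1
    max_batches: int = 100,  # safety cap, well above the ~30 batches expected
) -> Iterator[dict]:
    """Yield every record from a batched endpoint, fetching one batch at a time."""
    for n in range(first_batch, first_batch + max_batches):
        batch = fetch_batch(n)
        yield from batch
        # Assumption: a short (or empty) batch marks the end of the data.
        if len(batch) < BATCH_SIZE:
            return
```

The cap guards against looping forever if the API ever repeats full batches; the real stop condition should be confirmed against the live API.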

Polling strategy

  • Every 15 minutes: call /pfs/fuel-prices (no batch-number) — returns only recently changed prices
  • Once daily (3am): full refresh — iterate all batches of both /pfs and /pfs/fuel-prices to catch any drift

Database Schema

stations

One row per petrol filling station. Upserted on full daily refresh and when an incremental poll encounters a new node_id.

node_id                        VARCHAR(64)   PRIMARY KEY
trading_name                   VARCHAR(128)
brand_name                     VARCHAR(64)   NULLABLE
is_same_trading_and_brand      TINYINT(1)
is_supermarket                 TINYINT(1)    DEFAULT 0     — set by StationTaggingService
is_motorway_service_station    TINYINT(1)    DEFAULT 0
is_supermarket_service_station TINYINT(1)    DEFAULT 0
temporary_closure              TINYINT(1)    DEFAULT 0
permanent_closure              TINYINT(1)    DEFAULT 0
permanent_closure_date         DATE          NULLABLE
public_phone_number            VARCHAR(20)   NULLABLE
address_line_1                 VARCHAR(255)  NULLABLE
address_line_2                 VARCHAR(255)  NULLABLE
city                           VARCHAR(100)  NULLABLE
county                         VARCHAR(100)  NULLABLE
country                        VARCHAR(64)   NULLABLE
postcode                       VARCHAR(10)
lat                            DECIMAL(10,7)
lng                            DECIMAL(10,7)
amenities                      JSON          NULLABLE
opening_times                  JSON          NULLABLE
fuel_types                     JSON          NULLABLE      — array of supported fuel type strings
last_seen_at                   DATETIME

station_prices_current

One row per (station_id, fuel_type). Upserted on every price change. The scoring engine reads current prices from this table, so it never needs to touch the history table.

station_id          VARCHAR(64)      FK → stations.node_id
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED   — price in pence × 100, stored as an integer (e.g. 15990 = 159.90p); never a float. Max 65535 = 655.35p
price_effective_at  DATETIME            — price_change_effective_timestamp from API
price_reported_at   DATETIME            — price_last_updated from API
recorded_at         DATETIME            — when this row was last upserted

PRIMARY KEY (station_id, fuel_type)
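
As an illustration of the "never float" rule, the conversion into the stored integer can be done with decimal arithmetic. The sketch assumes the API reports the price in pence as a decimal number or string (for example 159.9), which this spec does not state; `to_price_units` and `normalise_fuel_type` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_HALF_UP

SMALLINT_UNSIGNED_MAX = 65535
FUEL_TYPES = {"e10", "e5", "b7_standard", "b7_premium", "b10", "hvo"}


def to_price_units(price_pence) -> int:
    """Convert a price in pence (e.g. "159.9") to the stored integer (15990)."""
    # str() first, so a float such as 159.9 is parsed from its printed form
    # and no binary floating-point arithmetic is performed.
    units = (Decimal(str(price_pence)) * 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    value = int(units)
    if not 0 <= value <= SMALLINT_UNSIGNED_MAX:
        raise ValueError(f"price does not fit SMALLINT UNSIGNED: {price_pence!r}")
    return value


def normalise_fuel_type(api_value: str) -> str:
    """Map an API fuel type such as 'B7_STANDARD' to the lowercase ENUM value."""
    key = api_value.strip().lower()
    if key not in FUEL_TYPES:
        raise ValueError(f"unknown fuel type: {api_value!r}")
    return key
```

Rejecting unknown fuel types explicitly surfaces a new API value as an error instead of a silent ENUM truncation.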

station_prices

Append-only price history. One row per price change per station+fuel. Partitioned monthly on price_effective_at. Covers the last 12 months (hot table).

id                  BIGINT UNSIGNED  AUTO_INCREMENT
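station_id          VARCHAR(64)      FK → stations.node_id   — logical reference only if this runs on MySQL/InnoDB, where partitioned tables cannot carry FK constraints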
fuel_type           ENUM('e10','e5','b7_standard','b7_premium','b10','hvo')
price_pence         SMALLINT UNSIGNED
price_effective_at  DATETIME
price_reported_at   DATETIME
recorded_at         DATETIME

PRIMARY KEY (id, price_effective_at)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)
PARTITION BY RANGE (YEAR(price_effective_at) * 100 + MONTH(price_effective_at))

Deduplication: only insert a new row if price_pence differs from the most recent stored value for that (station_id, fuel_type). This prevents duplicates on full refreshes when prices haven't changed.
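
The deduplication rule can be sketched in a few lines. Here a dict stands in for the "most recent stored value" lookup and a list stands in for the station_prices table; both, along with the name `apply_price`, are hypothetical. The sketch also assumes rows arrive in chronological order, which this spec does not guarantee.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (station_id, fuel_type)


def apply_price(latest: Dict[Key, int], history: List[dict], row: dict) -> bool:
    """Append `row` to history only if its price differs from the latest stored one.

    Returns True when a history row was written.
    """
    key = (row["station_id"], row["fuel_type"])
    if latest.get(key) == row["price_pence"]:
        return False  # unchanged price, e.g. re-seen during a full refresh
    latest[key] = row["price_pence"]
    history.append(row)
    return True


latest: Dict[Key, int] = {}
history: List[dict] = []
apply_price(latest, history, {"station_id": "a1", "fuel_type": "e10", "price_pence": 15990})
apply_price(latest, history, {"station_id": "a1", "fuel_type": "e10", "price_pence": 15990})
apply_price(latest, history, {"station_id": "a1", "fuel_type": "e10", "price_pence": 16090})
# history now holds two rows: the first sighting and the change to 16090.
```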

station_prices_archive

Identical schema to station_prices but no partitioning. Rows older than 12 months are moved here by a monthly scheduled command. Used only for trend graphs and historical comparisons — never queried by the scoring engine.

(same columns as station_prices — no partition)
INDEX (station_id, fuel_type, price_effective_at)
INDEX (price_effective_at)

Relationships

stations.node_id  ←──  station_prices_current.station_id
                  ←──  station_prices.station_id
                  ←──  station_prices_archive.station_id

Service responsibilities

FuelPriceService

  1. Fetch/cache OAuth token
  2. Incremental poll every 15 min: GET /pfs/fuel-prices, upsert station_prices_current, insert into station_prices where price changed
  3. Full refresh daily: iterate all batches of /pfs (upsert stations) and /pfs/fuel-prices (same price logic)
  4. Call StationTaggingService to set is_supermarket and normalise brand_name
  5. Dispatch PricesUpdatedEvent after each poll

StationTaggingService

  • Matches trading_name against known supermarket brands (case-insensitive)
  • Sets is_supermarket = 1 and normalises brand_name
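
A minimal sketch of the matching step, under two assumptions: the brand list below is purely illustrative (this spec does not enumerate the brands), and "matches" is read as a whole-word, case-insensitive search within trading_name. `tag_station` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Illustrative list only; the real set of brands is not defined in this spec.
SUPERMARKET_BRANDS = ("Tesco", "Asda", "Sainsbury's", "Morrisons")

_PATTERNS = [
    (brand, re.compile(r"\b" + re.escape(brand) + r"\b", re.IGNORECASE))
    for brand in SUPERMARKET_BRANDS
]


def tag_station(trading_name: str) -> Tuple[int, Optional[str]]:
    """Return (is_supermarket, normalised brand_name).

    The brand is None when no supermarket matches, meaning the caller
    should leave the station's existing brand_name untouched.
    """
    for brand, pattern in _PATTERNS:
        if pattern.search(trading_name):
            return 1, brand
    return 0, None
```

For example, `tag_station("TESCO EXTRA BRISTOL")` yields `(1, "Tesco")`, while a name such as "Shell Wandsworth" yields `(0, None)`. The word boundaries keep a name like "Tescoville" from matching.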

Scheduled archive command

  • Runs monthly
  • Moves rows from station_prices where price_effective_at < NOW() - 12 months into station_prices_archive
  • Drops the corresponding old partition from station_prices
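
Because partitions are whole months while the cutoff is "12 months before now", a partition can only be dropped once every row in it is past the cutoff. The sketch below picks the droppable partitions at month granularity; it mirrors the YEAR × 100 + MONTH partition expression, and the function names are hypothetical.

```python
from datetime import date
from typing import List


def partition_key(d: date) -> int:
    """Mirror the partition expression YEAR(x) * 100 + MONTH(x)."""
    return d.year * 100 + d.month


def months_back(d: date, months: int) -> date:
    """First day of the month that lies `months` months before d's month."""
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def archivable_keys(
    existing_keys: List[int], today: date, keep_months: int = 12
) -> List[int]:
    """Partition keys whose entire month falls before the retention cutoff month."""
    cutoff = partition_key(months_back(today, keep_months))
    return sorted(k for k in existing_keys if k < cutoff)
```

Run on 2026-04-03 with a 12-month window, the cutoff month is 202504, so partitions up to 202503 qualify. The partition for April 2025 stays in the hot table until the following month, even though its first two days are already older than 12 months.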

Open questions / adjustable later

  • Exact partition pre-creation strategy (how many months ahead to create partitions)
  • Whether station_prices_archive needs its own partitioning if it grows very large
  • Additional fuel types if the API introduces new ones (extend ENUM in migration)