Created migration for storing UK weekly pump prices from BEIS publications.
Table uses Monday date as primary key and stores petrol/diesel pump prices,
duty, and VAT rates as integer pence or percentage values.
- Made `/api/auth/me` public and return explicit allowlist (name, email,
two_factor_confirmed_at, tier, subscription fields) instead of spreading
`$user->toArray()` which leaked is_admin, stripe_id, pm_type, pm_last_four,
postcode. Returns `null` when unauthenticated rather than 401.
- Moved `/auth/logout` to remain behind auth:sanctum gate.
- Added 3×200ms retry with exponential backoff to EiaBrentPriceSource and
FredBrentPriceSource on ConnectionException or 5xx responses. Timeout
raised from 10s to 30s.
- Both sources now throw typed BrentPriceFetchException on exhausted retries
instead of silently returning null + logging. Updated tests to assert
exception message includes HTTP status or "connection failed".
Documented explicit prohibition of `migrate:fresh`, `migrate:reset`,
`db:wipe`, and raw DROP/TRUNCATE operations in CLAUDE.md. Prose rule
clarifies that user phrases like "trust me" or "do the refactor" are
not authorisation for schema rebuilds — architectural decision is
separate from operational step.
Added matching deny patterns to `.claude/settings.json` to block
direct inv
2026-04-30 09:01:20 +01:00
10 changed files with 816 additions and 76 deletions
UK fuel price intelligence app. Subscribers receive fill-up timing recommendations
based on local price trends. Built solo by a PHP/Laravel developer.
## Destructive DB operations — HARD STOP
**Never run** the following commands. If one of them is the right step, stop, tell the user the exact command, and ask them to run it themselves:
- `php artisan migrate:fresh` (with any flags, including `--seed`)
- `php artisan migrate:reset`
- `php artisan db:wipe`
- Raw `DROP TABLE`, `DROP DATABASE`, or `TRUNCATE` via tinker, `database-query`, or any MCP tool
- Any sequence that effectively rebuilds the schema or drops tables
These are also blocked at the harness level via `.claude/settings.json` deny rules, but the prose rule applies everywhere the block doesn't reach (compound shell commands, MCP tools, etc.).
A user saying "trust me", "do the refactor", "clean up the mess", or "I want it in db" is **not** authorisation for these — the architectural decision is separate from the operational step. If a migration is awkward to apply in-place, propose the in-place version (read JSON → populate new columns → drop the old column) instead of suggesting a rebuild. Asking once at the start of a task does not authorise repeat wipes later in the session.
## Project overview
- **Product**: "Fill up now or wait?" — local fuel price trend scoring for UK drivers
| `station_prices` | 75 days of changes since 2026-01-16, sample mix uneven per day | Not modelled in v1, but **used by the volatility regime detector** as a churn indicator (% stations changing price / day vs 30-day baseline). |
| `brent_prices` | 30 days only | **Backfilled in Phase 7** (8 years from FRED, single API call). Used as a Brent-move volatility trigger and as fuel for the daily LLM overlay. |
The Fuel Finder API has been confirmed empirically to have **no historical
archive** — `effective-start-timestamp` is a station-level filter on current
prices, not a time-window query. Per-station deep history can only accrue
forward from the date polling started.
---
## Architecture — five thin layers
### Layer 1 — National weekly forecaster (predictive, calibrated)
Trained once weekly on `weekly_pump_prices`. Output:
- `direction ∈ {rising, falling, flat}`
- `magnitude_pence` — predicted Δ price next week
- `ridge_confidence` (0–100) — calibrated from backtested residuals, not
from the model's raw output
This is the **quantitative baseline**. It updates only when the BEIS Monday
publication arrives (so the *forecast itself* changes weekly), but its
*displayed confidence* (Layer 3) is adjusted in real time by Layers 4 and 5.
| `Δulsd_lag_0` | Diesel cross-signal as a *change* |
| `ulsp[t] − ma8[t]` | **Mean-reversion term** — gap between current price and 8-week MA. Single most useful feature for 1-week-ahead UK pump forecast. |
- LLM disagreement only suppresses the verdict when `ridge_confidence < 75`.
Above 75 the model's call is strong enough to stand even with a news-scan
disagreement (the LLM is hard-capped at 75 confidence anyway, so it
can't out-confidence the ridge model — only flag a tension).
Local position from Layer 2 modifies urgency wording only:
- If user's local average is materially above national (>2p), and Layer 1
says "rising", urgency increased ("fill up now, *especially* in your area").
- Layer 2 never flips Layer 1's direction.
---
## Methodology — Layer 4 (LLM news overlay)
Single scheduled call daily at 07:00 UK. Additional event-driven calls
are queued by Layer 5 when the volatility flag flips ON, with a 4-hour
cooldown enforced in code (skip the queue if the most recent
`llm_overlays.ran_at` is within 4 hours).
**Brent input** (`brent_recent_14_days`) is optional — passed as `null`
until Phase 7 backfills `brent_prices`. Phase 8 cannot ship before
Phase 7 — explicit dependency.
### Request shape (JSON)
```json
{
"input": {
"ulsp_recent_8_weeks": [...],
"brent_recent_14_days": [...],
"current_week_of_year": 18,
"days_to_next_bank_holiday": 5,
"duty_pence": 52.95,
"ridge_model_says": {
"direction": "down",
"confidence": 68,
"magnitude_pence": -0.4
}
},
"ask": "Search recent news for oil-supply, OPEC, refinery, shipping, sanctions, geopolitical events affecting UK retail fuel prices over the next 1-2 weeks. Reply ONLY in the schema below."
Hourly cron. **Sole owner** of `volatility_regimes.active`. Reads four
signals, OR-combined:
1. **Brent move** — close-to-close daily Brent move > 3% on FRED
`DCOILBRENTEU`. FRED publishes with a one-day lag (today's value is
yesterday's settle), so the trigger reflects the most recent settled
day. Sufficient for v1 — we don't have a real-time Brent feed.
2. **LLM major-impact flag** — most recent `llm_overlays` row has
`major_impact_event = true` AND at least one verified URL.
3. **Station churn** — *gated until ≥180 days of stable polling.* The
trigger fires when the last-24h % of stations updating price exceeds
1.5× the 30-day rolling baseline. With only 75 days of uneven polling
(Jan 16 → May 1) the baseline is meaningless — sample-mix variance
would dominate any real shock signal. The trigger is implemented but
disabled in code via a feature flag; flip it on once `station_prices`
has 180+ continuous days.
4. **Manual `watched_events`** — a row covering today. Lets you flag
known geopolitical periods manually (e.g. "Iran tensions Apr–May 2026").
When the flag flips on:
- An event-driven Layer 4 LLM refresh is queued (skipped if the most
recent `llm_overlays.ran_at` is within 4 hours — cooldown).
- **Layer 3's gate 3 fires**: verdict forced to `no_signal` with the
`volatile` badge for as long as the flag stays on.
- Reasoning text appended: *"Volatility detected ({trigger label}) — this
forecast may be stale within days."*
When it flips off:
- Verdict returns to whatever the gates produce on the unchanged
`ridge_confidence` (no multiplier reset needed — there are no multipliers).
- Badge cleared.
- The next morning's 07:00 LLM call still runs (always does); no extra
refreshes are queued by Layer 5.
---
## Schema deltas
### Add
```
weekly_forecasts
id BIGINT PK
forecast_for DATE — Monday the forecast covers
model_version VARCHAR(32) — links back to backtests row
direction ENUM('rising','falling','flat')
magnitude_pence SMALLINT — predicted Δ × 100, signed
ridge_confidence TINYINT UNSIGNED — 0..100, calibrated from backtested residuals. Displayed verbatim. Layer 3 gates may suppress the verdict but never modify this number.
flagged_duty_change BOOLEAN — true if forecast is within ±4 weeks of a duty change (avoids collision with Layer 5's volatility_regimes)
reasoning TEXT — generated from features actually used
| **1. Backtest harness** | `BacktestRunner` service + `backtests` table. Takes a model class, train/eval split, returns directional accuracy + MAE + calibration curve. **Structural leak detection** built in (per-feature source-timestamp check vs target Monday); accuracy>75% smell test as secondary. | A way to *prove* any future model works before shipping it. |
| **2. Naive baseline** | "Predict next week = this week" implemented as a model class. Run through harness. | A floor: any future model must beat this. |
| **3. v1 ridge model** | Features above (incl. mean-reversion term), trained once, persisted with `model_version`. `WeeklyForecastService` runs it. Backtest must clear the acceptance gate. | First real forecast. Backtested numbers visible. |
| **4. Live wiring** | Replace `NationalFuelPredictionService` internals with a thin adapter delegating to `WeeklyForecastService`. Same API shape, new engine. | Frontend keeps working, predictions now from the new model. |
| **5. Local snapshot** | `LocalSnapshotService` — pure aggregates. Wire into `/api/stations` payload alongside the headline forecast. | "Your area" descriptive cards. |
| **6. Honesty layer** | Reasoning generator describes *what the model used*: lag values, season, holiday flag. Shows backtest accuracy badge. Returns explicit "not enough data" when confidence <40.Surfacestheduty-change-adjacentflagwhenset.|The"noBS"framing.|
| **7. Brent backfill + daily refresh** | One FRED call (2018→today, ~2,150 daily rows). Daily refresh cron at **06:30 UK** (must complete before Phase 8's 07:00 LLM call — sequenced so the LLM has fresh Brent context). Used by Phase 9's volatility detector and as a feature option for future model iterations (only added to the ridge model if backtested lift is ≥3 percentage points directional accuracy). | Daily Brent in DB. Foundation for volatility + LLM context. |
| **8. LLM news overlay** | `LlmOverlayService` — single scheduled call at **07:00 UK** (after Brent refresh). Plus event-driven calls when Layer 5 flips the volatility flag on, with 4h cooldown. JSON in / JSON out, web search enabled, results stored in `llm_overlays`. Feeds Layer 3's gate 4 (suppress when LLM disagrees AND ridge_confidence <75)andthe`agrees`/`conflicting`badges.URL-verification+empty-citationrejectionenforcedincode.**Depends on Phase 7.**|News-awareverdictsuppressionandbadgeontopofthecalibratedridgebaseline.|
| **9. Volatility regime detector** | `VolatilityRegimeService` — hourly cron, sole owner of `volatility_regimes.active`. OR-combines four triggers: Brent move > 3%, LLM `major_impact_event`, station churn > 1.5× baseline (**gated until ≥180 days of stable polling**), `watched_events` row covering today. Fires Layer 3's gate 3 (verdict → `no_signal` with `volatile` badge) and the event-driven Layer 4 refresh. | The intra-week safety net for oil shocks. |
| 60–62% | Marginal. One feature iteration, then re-evaluate. |
| **62–68%** | **Ship.** Realistic target for UK weekly pump direction without Brent. |
| 68–75% | Excellent. Ship and watch closely. |
| > 75% | **Stop.** Run the structural leak detector. Almost certainly time leakage (e.g. using `t+1` info accidentally in `t` features). The accuracy threshold is a secondary smell test, not the primary detector. |
| MAE > 1.0p / litre | Features are noisy. Refit before shipping. |
| Target MAE | 0.4–0.7p / litre. |
### Structural leak detection (primary)
Built into the backtest harness. For every (training_week, feature_value)
pair, the harness verifies the data source's effective timestamp is
**strictly before** the target Monday. Any feature whose source timestamp
is on or after the target week is treated as leakage and the backtest
fails fast. This is independent of accuracy — it catches leakage even
when it doesn't translate into suspiciously high accuracy.
The `> 75% accuracy` row is a secondary smell test for leakage modes the
structural check missed (e.g. label leakage via a downstream computed
column). Primary defence is the timestamp check. These numbers are
encoded in the harness as assertions, not aspirations.
---
## Honesty rules — non-negotiables
1. Backtest accuracy is **published in the UI**. The model wears its track
record on its sleeve.
2. Below 40 confidence, the recommendation is `no_signal` and the reasoning
says "we don't have enough signal to call it" — explicitly. No filler.
3. When duty-change-adjacent weeks affect the forecast, surface the flag
("forecast may be skewed by recent duty change").
4. Reasoning text only references features the model actually used — no
narrative invention. If the mean-reversion term drove the call, say so
("Pump prices are 3.1p above their 8-week average, and prices typically
pull back from that level"). If the seasonality term drove it, say so.
5. `forecast_outcomes` is populated automatically when the next BEIS week
lands. Hit rate over the trailing 13 weeks is shown next to the headline.
6. When the **volatility regime flag** is on, the UI shows the `volatile`
badge and the trigger (e.g. "Brent up 4.2% yesterday — forecast may be
stale within days"). Verdict is suppressed visibly via gate 3, never
silently.
7. The LLM overlay is **shown separately** from the ridge model, never
for the volatility detector, regardless of whether it's used in the
ridge model.
- **LLM** — Anthropic Claude Haiku with web search. Single scheduled call
at 07:00 UK (after the 06:30 Brent refresh). Plus event-driven refreshes
when Layer 5 flips the volatility flag on, with a 4h cooldown. No fixed
afternoon cron — by 13:00 UK, morning users have already made their
fill-up decisions, so the value is too low to justify the extra noise.
Hard confidence cap 75. Empty-citation rejection.
---
## Changelog (substantive design decisions)
| When | Change | Why |
|---|---|---|
| 2026-05-01 v1 | Initial spec — three layers, six-signal aggregator removed, ridge model on BEIS weekly data | Replace incoherent `NationalFuelPredictionService` |
| 2026-05-01 v2 | Added Layer 4 (LLM news overlay) and Layer 5 (volatility regime detector). Pump prices can move daily during oil shocks; static weekly forecast must be backed by intra-week safety nets. | Iran/Hormuz-style shocks make a Monday-only confidence number stale by Wednesday |
| 2026-05-01 v3 | **Verdict via rule gates, not multipliers.**`ridge_confidence` displayed verbatim. LLM and volatility presented as badges. `weeks_since_duty_change` removed from features (kept as calibration override only — n=1 can't fit a coefficient). Backtest gate floor lowered 65 → 62 (realistic without Brent). Structural leak detection (per-feature timestamp check) made primary; accuracy>75% demoted to secondary smell test. `weekly_forecasts` PK changed to `(forecast_for, model_version)` to preserve audit on retrain. `forecast_outcomes` made three-class. Layer 5 station-churn trigger gated until ≥180 days of stable polling. | Multipliers obscure calibration. Gates compose cleanly and stay auditable. |
---
## References
- Alquist, Kilian, Vigfusson (2013) — *Forecasting the Price of Oil* —
the academic basis for "no-change baseline beats most structural models
at <6mhorizons"(whichiswhyPhase2mattersasahardfloor).
- BEIS *Weekly road fuel prices* CSV — the 435-week training set.
- `.claude/rules/scoring.md`, `.claude/rules/prediction.md` — the two
it('filters out FRED missing value markers',function():void{
Http::fake([
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.