Compare commits

4 Commits

Author SHA1 Message Date
Ovidiu U
25cf022964 feat: add prediction rebuild design spec — Layer 1 ridge model, LLM news overlay, volatility regime detector
Some checks failed
linter / quality (push) Has been cancelled
tests / ci (8.3) (push) Has been cancelled
tests / ci (8.4) (push) Has been cancelled
tests / ci (8.5) (push) Has been cancelled
Documents complete replacement of six-signal aggregator with calibrated
ridge forecaster trained on 435 weeks of BEIS pump prices. Five-layer
architecture: weekly baseline (Layer 1), local snapshot (Layer 2),
rule-gated verdict merger (Layer 3), daily LLM news
2026-05-01 13:23:10 +01:00
Ovidiu U
e821a934a5 feat: add weekly_pump_prices migration for BEIS fuel price data
Some checks failed
linter / quality (push) Has been cancelled
tests / ci (8.3) (push) Has been cancelled
tests / ci (8.4) (push) Has been cancelled
tests / ci (8.5) (push) Has been cancelled
Created migration for storing UK weekly pump prices from BEIS publications.
Table uses Monday date as primary key and stores petrol/diesel pump prices,
duty, and VAT rates as integer pence or percentage values.
2026-05-01 13:22:50 +01:00
Ovidiu U
73de53994f fix: prevent sensitive field leaks in /me, add retry logic to Brent price sources
Some checks failed
linter / quality (push) Has been cancelled
tests / ci (8.3) (push) Has been cancelled
tests / ci (8.4) (push) Has been cancelled
tests / ci (8.5) (push) Has been cancelled
- Made `/api/auth/me` public and return explicit allowlist (name, email,
  two_factor_confirmed_at, tier, subscription fields) instead of spreading
  `$user->toArray()` which leaked is_admin, stripe_id, pm_type, pm_last_four,
  postcode. Returns `null` when unauthenticated rather than 401.
- Moved `/auth/logout` to remain behind auth:sanctum gate.
- Added retry to EiaBrentPriceSource and FredBrentPriceSource: up to 3
  attempts with a fixed 200ms delay (`retry(3, 200, ...)`), retrying only on
  ConnectionException or 5xx responses. Timeout raised from 10s to 30s.
- Both sources now throw typed BrentPriceFetchException on exhausted retries
  instead of silently returning null + logging. Updated tests to assert
  exception message includes HTTP status or "connection failed".
2026-05-01 13:22:36 +01:00
Ovidiu U
df70e514e9 refactor: add hard-stop documentation and deny-list for destructive DB commands
Some checks failed
linter / quality (push) Has been cancelled
tests / ci (8.3) (push) Has been cancelled
tests / ci (8.4) (push) Has been cancelled
tests / ci (8.5) (push) Has been cancelled
Documented explicit prohibition of `migrate:fresh`, `migrate:reset`,
`db:wipe`, and raw DROP/TRUNCATE operations in CLAUDE.md. Prose rule
clarifies that user phrases like "trust me" or "do the refactor" are
not authorisation for schema rebuilds — architectural decision is
separate from operational step.

Added matching deny patterns to `.claude/settings.json` to block
direct inv
2026-04-30 09:01:20 +01:00
10 changed files with 816 additions and 76 deletions

@@ -18,7 +18,13 @@
       "Bash(rg * .env)",
       "Bash(rg * ./.env)",
       "Bash(awk * .env)",
-      "Bash(awk * ./.env)"
+      "Bash(awk * ./.env)",
+      "Bash(php artisan migrate:fresh)",
+      "Bash(php artisan migrate:fresh *)",
+      "Bash(php artisan migrate:reset)",
+      "Bash(php artisan migrate:reset *)",
+      "Bash(php artisan db:wipe)",
+      "Bash(php artisan db:wipe *)"
     ]
   }
 }

@@ -3,6 +3,20 @@
 UK fuel price intelligence app. Subscribers receive fill-up timing recommendations
 based on local price trends. Built solo by a PHP/Laravel developer.
+## Destructive DB operations — HARD STOP
+**Never run** the following commands. If one of them is the right step, stop, tell the user the exact command, and ask them to run it themselves:
+- `php artisan migrate:fresh` (with any flags, including `--seed`)
+- `php artisan migrate:reset`
+- `php artisan db:wipe`
+- Raw `DROP TABLE`, `DROP DATABASE`, or `TRUNCATE` via tinker, `database-query`, or any MCP tool
+- Any sequence that effectively rebuilds the schema or drops tables
+These are also blocked at the harness level via `.claude/settings.json` deny rules, but the prose rule applies everywhere the block doesn't reach (compound shell commands, MCP tools, etc.).
+A user saying "trust me", "do the refactor", "clean up the mess", or "I want it in db" is **not** authorisation for these — the architectural decision is separate from the operational step. If a migration is awkward to apply in-place, propose the in-place version (read JSON → populate new columns → drop the old column) instead of suggesting a rebuild. Asking once at the start of a task does not authorise repeat wipes later in the session.
 ## Project overview
 - **Product**: "Fill up now or wait?" — local fuel price trend scoring for UK drivers

@@ -64,19 +64,24 @@ class AuthController extends Controller
     public function me(Request $request): JsonResponse
     {
         $user = $request->user();
+        if ($user === null) {
+            return new JsonResponse('null', json: true);
+        }
         $subscription = $user->subscription();
         $expiresAt = $subscription?->ends_at ?? $subscription?->current_period_end;
-        return response()->json(array_merge(
-            $user->toArray(),
-            [
-                'tier' => PlanFeatures::for($user)->tier(),
-                'subscription_cancelled' => $subscription?->canceled() ?? false,
-                'subscription_cadence' => Plan::resolveCadenceForUser($user),
-                'subscribed_at' => $subscription?->created_at?->toIso8601String(),
-                'subscription_expires_at' => $expiresAt?->toIso8601String(),
-            ],
-        ));
+        return response()->json([
+            'name' => $user->name,
+            'email' => $user->email,
+            'two_factor_confirmed_at' => $user->two_factor_confirmed_at?->toIso8601String(),
+            'tier' => PlanFeatures::for($user)->tier(),
+            'subscription_cancelled' => $subscription?->canceled() ?? false,
+            'subscription_cadence' => Plan::resolveCadenceForUser($user),
+            'subscribed_at' => $subscription?->created_at?->toIso8601String(),
+            'subscription_expires_at' => $expiresAt?->toIso8601String(),
+        ]);
     }
 }

@@ -3,8 +3,9 @@
 namespace App\Services\BrentPriceSources;
 use App\Services\ApiLogger;
+use Illuminate\Http\Client\ConnectionException;
+use Illuminate\Http\Client\RequestException;
 use Illuminate\Support\Facades\Http;
-use Illuminate\Support\Facades\Log;
 use Throwable;
 final class EiaBrentPriceSource
@@ -14,12 +15,16 @@ final class EiaBrentPriceSource
     public function __construct(private readonly ApiLogger $apiLogger) {}
     /**
-     * @return array{date: string, price_usd: float}[]|null
+     * @return array{date: string, price_usd: float}[]|null null only when the response carried no usable rows
+     *
+     * @throws BrentPriceFetchException on network failure or non-2xx response after retries
      */
     public function fetch(): ?array
     {
         try {
-            $response = $this->apiLogger->send('eia', 'GET', self::URL, fn () => Http::timeout(10)
+            $response = $this->apiLogger->send('eia', 'GET', self::URL, fn () => Http::timeout(30)
+                ->retry(3, 200, fn (Throwable $e) => $this->shouldRetry($e))
+                ->throw()
                 ->get(self::URL, [
                     'api_key' => config('services.eia.api_key'),
                     'frequency' => 'daily',
@@ -29,32 +34,26 @@ final class EiaBrentPriceSource
                     'sort[0][direction]' => 'desc',
                     'length' => 30,
                 ]));
-            if (! $response->successful()) {
-                Log::error('EiaBrentPriceSource: request failed', ['status' => $response->status()]);
-                return null;
-            }
-            $rows = collect($response->json('response.data') ?? [])
-                ->filter(fn (array $row) => ($row['value'] ?? '.') !== '.')
-                ->map(fn (array $row) => [
-                    'date' => $row['period'],
-                    'price_usd' => (float) $row['value'],
-                ])
-                ->all();
-            if ($rows === []) {
-                Log::warning('EiaBrentPriceSource: no valid observations returned');
-                return null;
-            }
-            return $rows;
-        } catch (Throwable $e) {
-            Log::error('EiaBrentPriceSource: fetch failed', ['error' => $e->getMessage()]);
-            return null;
+        } catch (ConnectionException $e) {
+            throw new BrentPriceFetchException("EIA connection failed: {$e->getMessage()}", previous: $e);
+        } catch (RequestException $e) {
+            throw new BrentPriceFetchException("EIA returned HTTP {$e->response->status()}", previous: $e);
         }
+        $rows = collect($response->json('response.data') ?? [])
+            ->filter(fn (array $row) => ($row['value'] ?? '.') !== '.')
+            ->map(fn (array $row) => [
+                'date' => $row['period'],
+                'price_usd' => (float) $row['value'],
+            ])
+            ->all();
+        return $rows === [] ? null : $rows;
+    }
+    private function shouldRetry(Throwable $e): bool
+    {
+        return $e instanceof ConnectionException
+            || ($e instanceof RequestException && $e->response->serverError());
     }
 }

@@ -3,8 +3,9 @@
 namespace App\Services\BrentPriceSources;
 use App\Services\ApiLogger;
+use Illuminate\Http\Client\ConnectionException;
+use Illuminate\Http\Client\RequestException;
 use Illuminate\Support\Facades\Http;
-use Illuminate\Support\Facades\Log;
 use Throwable;
 final class FredBrentPriceSource
@@ -14,12 +15,16 @@ final class FredBrentPriceSource
     public function __construct(private readonly ApiLogger $apiLogger) {}
     /**
-     * @return array{date: string, price_usd: float}[]|null
+     * @return array{date: string, price_usd: float}[]|null null only when the response carried no usable rows
+     *
+     * @throws BrentPriceFetchException on network failure or non-2xx response after retries
      */
     public function fetch(): ?array
     {
         try {
-            $response = $this->apiLogger->send('fred', 'GET', self::URL, fn () => Http::timeout(10)
+            $response = $this->apiLogger->send('fred', 'GET', self::URL, fn () => Http::timeout(30)
+                ->retry(3, 200, fn (Throwable $e) => $this->shouldRetry($e))
+                ->throw()
                 ->get(self::URL, [
                     'series_id' => 'DCOILBRENTEU',
                     'api_key' => config('services.fred.api_key'),
@@ -27,32 +32,26 @@ final class FredBrentPriceSource
                     'limit' => 30,
                     'file_type' => 'json',
                 ]));
-            if (! $response->successful()) {
-                Log::error('FredBrentPriceSource: request failed', ['status' => $response->status()]);
-                return null;
-            }
-            $rows = collect($response->json('observations') ?? [])
-                ->filter(fn (array $obs) => $obs['value'] !== '.')
-                ->map(fn (array $obs) => [
-                    'date' => $obs['date'],
-                    'price_usd' => (float) $obs['value'],
-                ])
-                ->all();
-            if ($rows === []) {
-                Log::warning('FredBrentPriceSource: no valid observations returned');
-                return null;
-            }
-            return $rows;
-        } catch (Throwable $e) {
-            Log::error('FredBrentPriceSource: fetch failed', ['error' => $e->getMessage()]);
-            return null;
+        } catch (ConnectionException $e) {
+            throw new BrentPriceFetchException("FRED connection failed: {$e->getMessage()}", previous: $e);
+        } catch (RequestException $e) {
+            throw new BrentPriceFetchException("FRED returned HTTP {$e->response->status()}", previous: $e);
         }
+        $rows = collect($response->json('observations') ?? [])
+            ->filter(fn (array $obs) => $obs['value'] !== '.')
+            ->map(fn (array $obs) => [
+                'date' => $obs['date'],
+                'price_usd' => (float) $obs['value'],
+            ])
+            ->all();
+        return $rows === [] ? null : $rows;
+    }
+    private function shouldRetry(Throwable $e): bool
+    {
+        return $e instanceof ConnectionException
+            || ($e instanceof RequestException && $e->response->serverError());
     }
 }

@@ -0,0 +1,32 @@
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    /**
     * Run the migrations.
     */
    public function up(): void
    {
        Schema::create('weekly_pump_prices', function (Blueprint $table) {
            $table->date('date')->primary()->comment('Week starting (Monday) per BEIS publication');
            $table->unsignedSmallInteger('ulsp_pence')->comment('Petrol pump price × 100');
            $table->unsignedSmallInteger('ulsd_pence')->comment('Diesel pump price × 100');
            $table->unsignedSmallInteger('ulsp_duty_pence')->comment('Petrol duty × 100');
            $table->unsignedSmallInteger('ulsd_duty_pence')->comment('Diesel duty × 100');
            $table->unsignedTinyInteger('ulsp_vat_pct')->comment('VAT %');
            $table->unsignedTinyInteger('ulsd_vat_pct')->comment('VAT %');
        });
    }

    /**
     * Reverse the migrations.
     */
    public function down(): void
    {
        Schema::dropIfExists('weekly_pump_prices');
    }
};

@@ -0,0 +1,618 @@
# Prediction Rebuild — Design Spec
## Context
The current prediction service (`NationalFuelPredictionService` + six signal
classes) produces output the user has repeatedly described as "doesn't make
sense": headlines that contradict their own reasoning text, weights that
nobody can defend a number on, and confidence values that aren't grounded in
any track record. Two earlier docs (`.claude/rules/scoring.md`, `.claude/rules/prediction.md`)
disagree on the weights of the same signals, which is itself evidence that
the design has drifted.
This spec replaces the entire prediction stack from scratch around the
historical data we actually have, with a model whose confidence values are
calibrated against its own backtested track record.
Goals:
- A "fill up now or wait?" call honest about uncertainty.
- Confidence values calibrated against backtested residuals — "70%" actually
means "in 7 of every 10 cases like this, the model called direction right".
- Simple enough to debug a year from now.
- Remove the six-signal aggregator entirely.
- Recognise that pump prices, while *measured* weekly by BEIS, can *move* daily
during oil shocks (Iran, OPEC surprise cuts, Hormuz disruption). The static
weekly forecast must be backed by a daily news/event overlay so we can flag
staleness in real time rather than pretend a Monday number is still valid on
Thursday after a 6% Brent move.
---
## Inputs (audited 2026-05-01)
| Source | Status | Use in v1 |
|---|---|---|
| `weekly_pump_prices` | 435 weeks, all Mondays, 0 outliers, 1 duty change (Mar 2022, 57.95p → 52.95p), VAT stable at 20% | **Foundation** — train Layer 1 |
| `station_prices_current` | ~7,550 stations × e10, ~7,620 × b7_standard | **Layer 2** — descriptive snapshot |
| `stations` | 7,747 stations, 1,989 supermarkets, lat/lng | Layer 2 |
| `station_prices` | 75 days of changes since 2026-01-16, sample mix uneven per day | Not modelled in v1, but **used by the volatility regime detector** as a churn indicator (% stations changing price / day vs 30-day baseline). |
| `brent_prices` | 30 days only | **Backfilled in Phase 7** (8 years from FRED, single API call). Used as a Brent-move volatility trigger and as fuel for the daily LLM overlay. |
The Fuel Finder API has been confirmed empirically to have **no historical
archive** — `effective-start-timestamp` is a station-level filter on current
prices, not a time-window query. Per-station deep history can only accrue
forward from the date polling started.
---
## Architecture — five thin layers
### Layer 1 — National weekly forecaster (predictive, calibrated)
Trained once weekly on `weekly_pump_prices`. Output:
- `direction ∈ {rising, falling, flat}`
- `magnitude_pence` — predicted Δ price next week
- `ridge_confidence` (0–100) — calibrated from backtested residuals, not
from the model's raw output
This is the **quantitative baseline**. It updates only when the BEIS Monday
publication arrives (so the *forecast itself* changes weekly), but its
*displayed confidence* (Layer 3) is adjusted in real time by Layers 4 and 5.
`direction = flat` whenever `|magnitude_pence| < FLAT_THRESHOLD`. Phase 3
picks `FLAT_THRESHOLD` from the backtest residual distribution; the
starting value is **0.2p / litre**.
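The flat rule above can be sketched in a few lines of plain PHP. This is an illustrative sketch, not the shipped code: the function name `classifyDirection` and the constant are hypothetical, and the 0.2p default is only the starting value that Phase 3 is expected to revisit.

```php
<?php
declare(strict_types=1);

// Starting value from the spec; Phase 3 may replace it after the backtest.
const FLAT_THRESHOLD = 0.2; // pence per litre

/**
 * Map a predicted weekly change (pence per litre) to a direction label.
 * Hypothetical helper, named here for illustration only.
 */
function classifyDirection(float $magnitudePence, float $flatThreshold = FLAT_THRESHOLD): string
{
    // Anything strictly inside the threshold band counts as flat.
    if (abs($magnitudePence) < $flatThreshold) {
        return 'flat';
    }

    return $magnitudePence > 0 ? 'rising' : 'falling';
}
```

Keeping the threshold as a parameter makes it easy to re-run the backtest with candidate values before fixing one.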
### Layer 2 — Local snapshot (descriptive, NOT predictive)
Pure SQL aggregates against `station_prices_current` + Haversine on
`stations.lat/lng`. No ML, no history, no surprises:
- `local_avg_50km(fuel_type, lat, lng)`
- `national_avg(fuel_type)`
- `cheapest_within(km, fuel_type, lat, lng)`
- `supermarket_avg_local`, `major_avg_local`, gap
Layer 2 never speaks about the future. It describes the present.
### Layer 3 — Verdict merger (rule-based gates, no multipliers)
Single user-facing verdict ∈ {`fill_now`, `wait`, `no_signal`}. The
displayed confidence number is `ridge_confidence` itself, **untouched**.
LLM agreement and volatility status are shown as separate **badges**, not
blended into the number. Honesty over smoothing.
Gates evaluated in order, first match wins:
```
1. direction == 'flat'                              → no_signal
2. ridge_confidence < 40                            → no_signal
3. volatility_regime active                         → no_signal (badge: volatile)
4. LLM disagrees AND ridge_confidence < 75          → no_signal (badge: conflicting)
5. rising AND ridge_confidence >= 70                → fill_now
6. falling AND ridge_confidence >= 70               → wait
7. otherwise (40 <= conf < 70, no veto from 3 or 4) → dashboard-only
```
Why gates, not multipliers:
- A multiplied confidence number is a black-box blend that the user can't
audit. A 70% that used to be 90% before today's volatility hit looks
identical to a 70% that's been calibrated all along.
- Gates compose cleanly. Each rule has one job and is independently
testable.
- The verdict is a discrete choice anyway (notify / don't / silent). Smoothing
confidence under the hood doesn't help that decision — it only obscures it.
Layer 2 affects **urgency wording only** ("fill up now, *especially* in
your area at 2p above national"). It never changes the verdict. Neither
does Layer 4 or Layer 5 — they can suppress the verdict (via gates 4 and 3
respectively) but never flip the direction.
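The ordered gates could be expressed as a single pure function, which keeps each rule independently testable. The sketch below is one possible reading of the gate table: `decideVerdict`, the `dashboard_only` identifier, and the choice to show the `conflicting` badge when a disagreement does not suppress are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate the ordered gates; the first matching gate wins.
 * ridge_confidence is only read here, never modified.
 *
 * @param ?bool $llmAgrees null when there is no overlay or the LLM is neutral/flat
 * @return array{verdict: string, badge: ?string}
 */
function decideVerdict(string $direction, int $ridgeConfidence, bool $volatilityActive, ?bool $llmAgrees): array
{
    if ($direction === 'flat') {
        return ['verdict' => 'no_signal', 'badge' => null];          // gate 1
    }
    if ($ridgeConfidence < 40) {
        return ['verdict' => 'no_signal', 'badge' => null];          // gate 2
    }
    if ($volatilityActive) {
        return ['verdict' => 'no_signal', 'badge' => 'volatile'];    // gate 3
    }
    if ($llmAgrees === false && $ridgeConfidence < 75) {
        return ['verdict' => 'no_signal', 'badge' => 'conflicting']; // gate 4
    }

    // Past the vetoes: badges are informational only.
    $badge = match ($llmAgrees) {
        true => 'agrees',
        false => 'conflicting',
        null => null,
    };

    if ($ridgeConfidence >= 70) {
        // Gates 5 and 6: direction is 'rising' or 'falling' at this point.
        return ['verdict' => $direction === 'rising' ? 'fill_now' : 'wait', 'badge' => $badge];
    }

    return ['verdict' => 'dashboard_only', 'badge' => $badge];       // gate 7
}
```

Because the function takes plain values and returns a plain array, each gate can be covered by a one-line unit test without touching the database.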
### Layer 4 — Daily LLM news overlay (qualitative, news-aware)
**Single scheduled call at 07:00 UK.** Plus an event-driven refresh when
Layer 5's volatility flag flips ON (with a 4-hour cooldown so the same
event doesn't trigger repeatedly).
JSON in, JSON out. Calls Claude Haiku with web search enabled, asks for
direction + confidence + cited events with URLs. Stored in a new
`llm_overlays` table.
Layer 4 is **read-only with respect to the volatility flag**. It writes
its result row; only Layer 5 mutates `volatility_regimes.active`.
LLM confidence is hard-capped at 75 in code (web-searched LLMs are
systematically overconfident). Calls without `events_cited` are rejected.
### Layer 5 — Volatility regime detector (intra-week safety net)
Hourly cron. **Sole owner** of the `volatility_regimes.active` flag.
Reads four signals, OR-combined:
1. Daily Brent move > 3% close-to-close (FRED `DCOILBRENTEU`, Phase 7).
2. Most recent `llm_overlays.major_impact_event = true` AND at least one
verified URL.
3. `station_prices` daily churn rate > 1.5× its 30-day baseline.
4. A `watched_events` row covering today (manually flagged geopolitical
periods).
When the flag flips on:
- An event-driven LLM refresh is queued (Layer 4) if last run was > 4h ago.
- **Layer 3's gate 3 fires**: verdict forced to `no_signal` with the
`volatile` badge.
- The reasoning text gains an appended line: *"Volatility detected ({trigger}) — this
forecast may be stale within days."*
When it flips off:
- Verdict returns to whatever the gates produce on the unchanged
`ridge_confidence` (no multiplier to reset — there are none).
- Badge cleared.
- Next morning's 07:00 LLM call still runs (it always runs); no extra
refreshes are queued.
Layer 5 never changes Layer 1's *direction*. It only suppresses the
verdict via gate 3.
---
## Methodology — Layer 1
### Target
```
ΔULSP[t+1] = ULSP[t+1] - ULSP[t]
```
We model the **change**, not the level. UK pump prices are non-stationary,
so regressing on levels gives spurious R² and useless coefficients.
Differencing makes the series stationary.
### Features (all stationary)
| Feature | Notes |
|---|---|
| `Δulsp_lag_0`, `Δulsp_lag_1`, `Δulsp_lag_3` | 1w / 2w / 4w momentum |
| `Δulsd_lag_0` | Diesel cross-signal as a *change* |
| `ulsp[t] - ma8[t]` | **Mean-reversion term** — gap between current price and 8-week MA. Single most useful feature for 1-week-ahead UK pump forecast. |
| `week_of_year_sin`, `week_of_year_cos` | Cyclic seasonality encoding |
| `is_pre_bank_holiday` | Boolean, within 7 days of UK bank holiday |
The level only enters as the deviation from MA-8 (itself stationary).
That's the only way levels are allowed in.
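As a rough illustration of how the feature row might be assembled, the sketch below builds the numeric features for one week from two price series. Several details are assumptions rather than part of the spec: `lag_k` is read as the one-week change ending k weeks ago, the 8-week moving average includes the current week, seasonality uses a 52-week period, and `is_pre_bank_holiday` is left out because it needs a holiday calendar. The function and key names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build the stationary features for week index $t (0-based, oldest first).
 * Requires $t >= 7 so the 8-week moving average exists.
 *
 * @param float[] $ulsp weekly petrol prices in pence
 * @param float[] $ulsd weekly diesel prices in pence
 * @return array<string, float>
 */
function buildFeatures(array $ulsp, array $ulsd, int $t, int $weekOfYear): array
{
    if ($t < 7 || $t >= count($ulsp) || $t >= count($ulsd)) {
        throw new InvalidArgumentException('Need at least 8 weeks of history up to week t.');
    }

    // One-week change ending at week $i: price[$i] - price[$i - 1].
    $delta = fn (array $series, int $i): float => $series[$i] - $series[$i - 1];

    // Mean of the 8 most recent weeks, including week $t.
    $ma8 = array_sum(array_slice($ulsp, $t - 7, 8)) / 8;

    // Week-of-year mapped onto a circle so week 52 sits next to week 1.
    $angle = 2 * M_PI * $weekOfYear / 52;

    return [
        'd_ulsp_lag_0' => $delta($ulsp, $t),
        'd_ulsp_lag_1' => $delta($ulsp, $t - 1),
        'd_ulsp_lag_3' => $delta($ulsp, $t - 3),
        'd_ulsd_lag_0' => $delta($ulsd, $t),
        'ulsp_minus_ma8' => $ulsp[$t] - $ma8,
        'week_of_year_sin' => sin($angle),
        'week_of_year_cos' => cos($angle),
    ];
}
```

Every value in the returned row is either a difference or a bounded cyclic term, so no raw price level reaches the model.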
**Duty change is NOT a feature.** With one event in 435 weeks, n=1 cannot
fit a meaningful coefficient. Instead, duty-change-adjacent weeks (±4
weeks of a known change) are handled in the **calibration override**
(see below) — confidence is halved and the regime flag is surfaced in
the reasoning text. A regime can be flagged. A coefficient cannot be
trained from one observation.
### Model
Ridge regression. Boring on purpose:
- 435 weekly observations is too few to beat a well-specified linear model
out-of-sample with gradient boosting or LSTM — those would just fit noise.
- Interpretable coefficients are essential for the honesty layer
(the reasoning text describes what the model used).
Upgrade to a non-linear model **only** if Phase 3 backtest demonstrates the
linear model is missing real structure.
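For orientation, ridge regression has a closed-form solution: solve (XᵀX + λI)w = Xᵀy. The sketch below does this in plain PHP with Gaussian elimination. It is an illustration under stated assumptions, not the project's training code: features and target are assumed to be centred already (so no intercept is fitted), `fitRidge` and `$lambda` are hypothetical names, and a real implementation might well delegate to a numerical library.

```php
<?php
declare(strict_types=1);

/**
 * Fit ridge regression by solving (X^T X + lambda I) w = X^T y.
 *
 * @param list<list<float>> $X one row of features per observation
 * @param list<float> $y targets, same length as $X
 * @return list<float> one coefficient per feature
 */
function fitRidge(array $X, array $y, float $lambda): array
{
    $n = count($X);
    $p = count($X[0]);

    // Build the augmented system [X^T X + lambda I | X^T y].
    $A = [];
    for ($i = 0; $i < $p; $i++) {
        for ($j = 0; $j < $p; $j++) {
            $sum = 0.0;
            for ($k = 0; $k < $n; $k++) {
                $sum += $X[$k][$i] * $X[$k][$j];
            }
            $A[$i][$j] = $sum + ($i === $j ? $lambda : 0.0);
        }
        $rhs = 0.0;
        for ($k = 0; $k < $n; $k++) {
            $rhs += $X[$k][$i] * $y[$k];
        }
        $A[$i][$p] = $rhs;
    }

    // Forward elimination with partial pivoting.
    for ($col = 0; $col < $p; $col++) {
        $pivot = $col;
        for ($r = $col + 1; $r < $p; $r++) {
            if (abs($A[$r][$col]) > abs($A[$pivot][$col])) {
                $pivot = $r;
            }
        }
        [$A[$col], $A[$pivot]] = [$A[$pivot], $A[$col]];
        if (abs($A[$col][$col]) < 1e-12) {
            throw new RuntimeException('Singular system; try a larger lambda.');
        }
        for ($r = $col + 1; $r < $p; $r++) {
            $factor = $A[$r][$col] / $A[$col][$col];
            for ($c = $col; $c <= $p; $c++) {
                $A[$r][$c] -= $factor * $A[$col][$c];
            }
        }
    }

    // Back substitution.
    $w = array_fill(0, $p, 0.0);
    for ($i = $p - 1; $i >= 0; $i--) {
        $sum = $A[$i][$p];
        for ($j = $i + 1; $j < $p; $j++) {
            $sum -= $A[$i][$j] * $w[$j];
        }
        $w[$i] = $sum / $A[$i][$i];
    }

    return $w;
}
```

With λ = 0 this reduces to ordinary least squares; raising λ shrinks the coefficients toward zero, which is the behaviour that keeps a small weekly dataset from fitting noise.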
### Training and evaluation split
- Train on weeks 1–305 (~70%).
- Evaluate on weeks 306–435 (~30%) with rolling-origin cross-validation
(single-split would overfit hyperparameters to one window).
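One way to picture rolling-origin evaluation is as a list of expanding-window folds: each fold trains on everything up to a cut-off week and evaluates on the block that follows. The sketch below generates such folds; the function name and the 13-week evaluation block in the usage note are assumptions, since the spec does not fix a horizon.

```php
<?php
declare(strict_types=1);

/**
 * Generate expanding-window (rolling-origin) folds over 1-based week numbers.
 * Each fold trains on weeks 1..train_end and evaluates on the next $horizon weeks.
 *
 * @return list<array{train_end: int, eval_start: int, eval_end: int}>
 */
function rollingOriginFolds(int $totalWeeks, int $initialTrainEnd, int $horizon): array
{
    if ($horizon < 1 || $initialTrainEnd < 1 || $initialTrainEnd >= $totalWeeks) {
        throw new InvalidArgumentException('Invalid split parameters.');
    }

    $folds = [];
    for ($trainEnd = $initialTrainEnd; $trainEnd < $totalWeeks; $trainEnd += $horizon) {
        $folds[] = [
            'train_end' => $trainEnd,
            'eval_start' => $trainEnd + 1,
            // The last fold is clipped so it never runs past the data.
            'eval_end' => min($trainEnd + $horizon, $totalWeeks),
        ];
    }

    return $folds;
}
```

Calling `rollingOriginFolds(435, 305, 13)` would yield ten folds, the first evaluating weeks 306 to 318 and the last ending at week 435, so hyperparameters are never tuned against a single window.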
### Confidence calibration
Two-stage calibration:
1. **Magnitude binning** — bin predictions by predicted `|magnitude|` and
record actual hit rate per bin. The published `confidence_score` reads
from this lookup, not from the model's raw output.
2. **Regime flag** — flag any forecast week within ±4 weeks of a known
duty change. With only one duty change in 435 weeks, statistical
stratification at n=1 is impossible. Instead:
- For flagged weeks, halve the calibrated confidence manually.
- Surface the flag in the reasoning text: *"Recent duty change —
forecast accuracy is reduced for the next several weeks."*
This is the only place v1 accepts a hand-tuned guard, and it's there
because the data can't tell us better.
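The two calibration stages might look roughly like this in code: build a hit-rate table per `|magnitude|` bin from backtest rows, then read confidence from that table and halve it for duty-change-adjacent weeks. All names, the bin edges, and the choice to return `null` for a bin with no backtest evidence are assumptions of this sketch.

```php
<?php
declare(strict_types=1);

/** Index of the bin that contains $absMagnitude; the last bin is open-ended. */
function binIndex(float $absMagnitude, array $binEdges): int
{
    foreach ($binEdges as $i => $edge) {
        if ($absMagnitude < $edge) {
            return $i;
        }
    }

    return count($binEdges);
}

/**
 * Stage 1: empirical directional hit rate (0..100) per |magnitude| bin.
 *
 * @param list<array{magnitude: float, correct: bool}> $backtest
 * @param float[] $binEdges ascending upper edges, e.g. [0.5, 1.0, 2.0]
 * @return list<?float> null marks a bin with no backtest rows
 */
function buildCalibrationTable(array $backtest, array $binEdges): array
{
    $hits = array_fill(0, count($binEdges) + 1, 0);
    $totals = $hits;
    foreach ($backtest as $row) {
        $bin = binIndex(abs($row['magnitude']), $binEdges);
        $totals[$bin]++;
        $hits[$bin] += $row['correct'] ? 1 : 0;
    }

    return array_map(
        fn (int $h, int $n): ?float => $n === 0 ? null : 100.0 * $h / $n,
        $hits,
        $totals
    );
}

/** Stage 2: look up the calibrated value and halve it near a duty change. */
function calibratedConfidence(float $magnitude, array $table, array $binEdges, bool $nearDutyChange): ?int
{
    $rate = $table[binIndex(abs($magnitude), $binEdges)];
    if ($rate === null) {
        return null; // no track record for predictions of this size
    }

    return (int) round($nearDutyChange ? $rate / 2 : $rate);
}
```

The table is the same structure the `backtests.calibration_table` column is described as holding, so the lookup can run at forecast time without re-running the backtest.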
---
## Methodology — Layer 2
Pure aggregates. No model.
```sql
-- Local 50km average
SELECT AVG(price_pence) FROM station_prices_current
JOIN stations ON station_prices_current.station_id = stations.node_id
WHERE fuel_type = ? AND <Haversine within 50km of (lat, lng)>;
-- National average
SELECT AVG(price_pence) FROM station_prices_current WHERE fuel_type = ?;
-- Cheapest within 25km
SELECT stations.*, station_prices_current.price_pence
FROM station_prices_current
JOIN stations ON station_prices_current.station_id = stations.node_id
WHERE fuel_type = ? AND <Haversine within 25km>
ORDER BY price_pence ASC LIMIT 5;
-- Supermarket vs major split, locally
SELECT stations.is_supermarket, AVG(price_pence)
FROM station_prices_current
JOIN stations ON station_prices_current.station_id = stations.node_id
WHERE fuel_type = ? AND <Haversine within 25km>
GROUP BY stations.is_supermarket;
```
Output is descriptive: "Your area is X p above national average right
now", "Cheapest near you: {station} at {price}", "Supermarkets near you:
{avg} vs majors: {avg}". **Never** predictive language.
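The `<Haversine within …>` placeholder in the queries above stands for a great-circle distance filter. A minimal PHP sketch of that distance and of a local average computed in application code follows; the function names and the array shape (`lat`, `lng`, `price_pence`) are illustrative assumptions, and in practice the filter could equally live in SQL.

```php
<?php
declare(strict_types=1);

/** Great-circle distance in kilometres between two lat/lng points (Haversine formula). */
function haversineKm(float $lat1, float $lng1, float $lat2, float $lng2): float
{
    $earthRadiusKm = 6371.0; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) ** 2;

    // min() guards against rounding pushing sqrt($a) slightly above 1.
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

/**
 * Average price of the stations within $radiusKm of the given point.
 *
 * @param list<array{lat: float, lng: float, price_pence: float}> $stations
 * @return ?float null when no station falls inside the radius
 */
function localAverage(array $stations, float $lat, float $lng, float $radiusKm): ?float
{
    $prices = [];
    foreach ($stations as $s) {
        if (haversineKm($lat, $lng, $s['lat'], $s['lng']) <= $radiusKm) {
            $prices[] = $s['price_pence'];
        }
    }

    return $prices === [] ? null : array_sum($prices) / count($prices);
}
```

Returning `null` for an empty radius keeps the output descriptive: the caller can say "no stations nearby" rather than report a misleading zero.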
---
## Methodology — Layer 3
Full gate ordering is in the Architecture section (Layer 3). Summary:
- Verdict via ordered rule gates, **not** multipliers.
- `ridge_confidence` is displayed verbatim — never multiplied.
- Volatility flag and LLM disagreement act as **suppressors with badges**
(`volatile`, `conflicting`) but never flip direction.
- `direction == 'flat'` always produces `no_signal`.
- LLM disagreement only suppresses the verdict when `ridge_confidence < 75`.
Above 75 the model's call is strong enough to stand even with a news-scan
disagreement (the LLM is hard-capped at 75 confidence anyway, so it
can't out-confidence the ridge model — only flag a tension).
Local position from Layer 2 modifies urgency wording only:
- If user's local average is materially above national (>2p), and Layer 1
says "rising", urgency increased ("fill up now, *especially* in your area").
- Layer 2 never flips Layer 1's direction.
---
## Methodology — Layer 4 (LLM news overlay)
Single scheduled call daily at 07:00 UK. Additional event-driven calls
are queued by Layer 5 when the volatility flag flips ON, with a 4-hour
cooldown enforced in code (skip the queue if the most recent
`llm_overlays.ran_at` is within 4 hours).
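The cooldown check itself is small enough to sketch directly. The function name is hypothetical, and treating a run exactly 4 hours old as still "within" the cooldown is an interpretation of the wording above.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether an event-driven LLM refresh may be queued.
 *
 * @param ?DateTimeImmutable $lastRanAt most recent llm_overlays.ran_at, or null if none
 */
function shouldQueueLlmRefresh(?DateTimeImmutable $lastRanAt, DateTimeImmutable $now, int $cooldownHours = 4): bool
{
    if ($lastRanAt === null) {
        return true; // nothing has run yet, so there is nothing to cool down from
    }

    $elapsedSeconds = $now->getTimestamp() - $lastRanAt->getTimestamp();

    // Strictly more than the cooldown must have passed.
    return $elapsedSeconds > $cooldownHours * 3600;
}
```

Comparing Unix timestamps sidesteps timezone and daylight-saving ambiguity around the 07:00 UK scheduled run.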
**Brent input** (`brent_recent_14_days`) is optional — passed as `null`
until Phase 7 backfills `brent_prices`. Phase 8 cannot ship before
Phase 7 — explicit dependency.
### Request shape (JSON)
```json
{
  "input": {
    "ulsp_recent_8_weeks": [...],
    "brent_recent_14_days": [...],
    "current_week_of_year": 18,
    "days_to_next_bank_holiday": 5,
    "duty_pence": 52.95,
    "ridge_model_says": {
      "direction": "falling",
      "confidence": 68,
      "magnitude_pence": -0.4
    }
  },
  "ask": "Search recent news for oil-supply, OPEC, refinery, shipping, sanctions, geopolitical events affecting UK retail fuel prices over the next 1-2 weeks. Reply ONLY in the schema below."
}
```
### Response shape (JSON, enforced)
```json
{
  "direction": "rising | falling | flat",
  "confidence": 0,
  "reasoning_short": "1-2 sentences",
  "events_cited": [
    {"headline": "...", "source": "...", "url": "...", "impact": "rising|falling|neutral"}
  ],
  "agrees_with_ridge": true,
  "major_impact_event": false
}
```
### Code-level guards (not in the prompt)
1. **Cap `confidence` at 75.** Web-searched LLMs are systematically overconfident.
2. **Reject the response if `events_cited` is empty.** Forces the LLM to
ground its call in something checkable, not vibes.
3. **Verify each `url` in `events_cited` is reachable** before storing.
Catches hallucinated citations. Failed URLs blank the citation but
don't reject the call (newer URLs sometimes 404 briefly).
4. **Layer 4 does NOT mutate `volatility_regimes.active`.** It writes its
row to `llm_overlays` (with `major_impact_event` + verified URLs) and
that's it. Layer 5's hourly cron picks up the new row and decides
whether to flip the flag.
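Guards 1 to 3 could be applied in one pass over the decoded response, as sketched below. The names are hypothetical, "blank the citation" is read here as nulling the `url` field while keeping the rest of the entry, and the reachability check is injected as a callable so the sketch makes no network calls of its own.

```php
<?php
declare(strict_types=1);

const LLM_CONFIDENCE_CAP = 75;

/**
 * Apply the code-level guards to a decoded LLM response.
 *
 * @param array $response decoded JSON in the response shape above
 * @param callable(string): bool $isReachable hypothetical URL checker supplied by the caller
 * @return ?array sanitised response, or null when the call is rejected
 */
function applyOverlayGuards(array $response, callable $isReachable): ?array
{
    $events = $response['events_cited'] ?? [];
    if (!is_array($events) || $events === []) {
        return null; // guard 2: no citations, no overlay
    }

    $verified = 0;
    foreach ($events as &$event) {
        $url = $event['url'] ?? null;
        if (is_string($url) && $url !== '' && $isReachable($url)) {
            $verified++;
        } else {
            $event['url'] = null; // guard 3: blank the unverifiable URL, keep the call
        }
    }
    unset($event); // break the reference left by foreach

    // Guard 1: clamp confidence into 0..75.
    $confidence = (int) ($response['confidence'] ?? 0);
    $response['confidence'] = min(max($confidence, 0), LLM_CONFIDENCE_CAP);
    $response['events_cited'] = $events;
    $response['verified_url_count'] = $verified;

    return $response;
}
```

The extra `verified_url_count` field is an assumption added so that Layer 5 can test "at least one verified URL" without re-checking the links.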
### How Layer 3 uses it
- LLM agrees → no gating effect; `agrees` badge shown next to the verdict
("News scan agrees, citing {event}").
- LLM disagrees AND `ridge_confidence < 75` → **gate 4 fires**: verdict
forced to `no_signal` with the `conflicting` badge.
- LLM disagrees AND `ridge_confidence >= 75` → no suppression; the
disagreement is shown as a badge but the model's strong call stands.
- LLM neutral / flat → no gating effect.
- Direction is never flipped by the LLM.
---
## Methodology — Layer 5 (volatility regime detector)
Hourly cron. **Sole owner** of `volatility_regimes.active`. Reads four
signals, OR-combined:
1. **Brent move** — close-to-close daily Brent move > 3% on FRED
`DCOILBRENTEU`. FRED publishes with a one-day lag (today's value is
yesterday's settle), so the trigger reflects the most recent settled
day. Sufficient for v1 — we don't have a real-time Brent feed.
2. **LLM major-impact flag** — most recent `llm_overlays` row has
`major_impact_event = true` AND at least one verified URL.
3. **Station churn**: *gated until ≥180 days of stable polling.* The
trigger fires when the last-24h % of stations updating price exceeds
1.5× the 30-day rolling baseline. With only 75 days of uneven polling
(Jan 16 → May 1) the baseline is meaningless — sample-mix variance
would dominate any real shock signal. The trigger is implemented but
disabled in code via a feature flag; flip it on once `station_prices`
has 180+ continuous days.
4. **Manual `watched_events`** — a row covering today. Lets you flag
known geopolitical periods manually (e.g. "Iran tensions Apr–May 2026").
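The OR-combination of the four signals can be sketched as a function that returns the list of triggers that fired, with the flag on whenever that list is non-empty. Parameter names are hypothetical, the trigger labels reuse the `volatility_regimes.trigger` enum values, and treating the Brent threshold as an absolute move (up or down) is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate the four volatility signals.
 *
 * @return list<string> labels of the triggers that fired; the flag is on when non-empty
 */
function activeTriggers(
    float $brentPrevClose,
    float $brentLastClose,
    bool $llmMajorImpact,
    int $llmVerifiedUrls,
    float $churnLast24hPct,
    float $churnBaseline30dPct,
    bool $churnTriggerEnabled,   // feature flag: off until 180+ days of stable polling
    bool $watchedEventCoversToday
): array {
    $triggers = [];

    // 1. Brent close-to-close move above 3%, in either direction.
    if ($brentPrevClose > 0.0) {
        $movePct = abs($brentLastClose - $brentPrevClose) / $brentPrevClose * 100.0;
        if ($movePct > 3.0) {
            $triggers[] = 'brent_move';
        }
    }

    // 2. Latest LLM overlay flagged a major event and cited at least one verified URL.
    if ($llmMajorImpact && $llmVerifiedUrls >= 1) {
        $triggers[] = 'llm_event';
    }

    // 3. Station churn above 1.5x its 30-day baseline, only when the feature flag is on.
    if ($churnTriggerEnabled && $churnLast24hPct > 1.5 * $churnBaseline30dPct) {
        $triggers[] = 'station_churn';
    }

    // 4. A manually flagged watched_events row covers today.
    if ($watchedEventCoversToday) {
        $triggers[] = 'manual';
    }

    return $triggers;
}
```

Returning labels rather than a bare boolean gives the reasoning text its `{trigger label}` and leaves a record of why the flag flipped.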
When the flag flips on:
- An event-driven Layer 4 LLM refresh is queued (skipped if the most
recent `llm_overlays.ran_at` is within 4 hours — cooldown).
- **Layer 3's gate 3 fires**: verdict forced to `no_signal` with the
`volatile` badge for as long as the flag stays on.
- Reasoning text appended: *"Volatility detected ({trigger label}) — this
forecast may be stale within days."*
When it flips off:
- Verdict returns to whatever the gates produce on the unchanged
`ridge_confidence` (no multiplier reset needed — there are no multipliers).
- Badge cleared.
- The next morning's 07:00 LLM call still runs (always does); no extra
refreshes are queued by Layer 5.
---
## Schema deltas
### Add
```
weekly_forecasts
  id BIGINT PK
  forecast_for DATE — Monday the forecast covers
  model_version VARCHAR(32) — links back to backtests row
  direction ENUM('rising','falling','flat')
  magnitude_pence SMALLINT — predicted Δ × 100, signed
  ridge_confidence TINYINT UNSIGNED — 0..100, calibrated from backtested residuals. Displayed verbatim. Layer 3 gates may suppress the verdict but never modify this number.
  flagged_duty_change BOOLEAN — true if forecast is within ±4 weeks of a duty change (avoids collision with Layer 5's volatility_regimes)
  reasoning TEXT — generated from features actually used
  generated_at DATETIME
  UNIQUE (forecast_for, model_version)
  INDEX (forecast_for, generated_at DESC)
forecast_outcomes
  forecast_for DATE
  model_version VARCHAR(32)
  predicted_class ENUM('rising','falling','flat')
  actual_class ENUM('rising','falling','flat')
  correct BOOLEAN
  abs_error_pence SMALLINT UNSIGNED
  resolved_at DATETIME
  PRIMARY KEY (forecast_for, model_version)
backtests
  id BIGINT PK
  model_version VARCHAR(32) UNIQUE
  features_json JSON — feature spec
  train_start DATE
  train_end DATE
  eval_start DATE
  eval_end DATE
  directional_accuracy DECIMAL(5,2)
  mae_pence DECIMAL(5,2)
  calibration_table JSON — {bin_low..bin_high → empirical_hit_rate}
  leak_suspected BOOLEAN — secondary smell test: true if directional_accuracy > 75. Primary leak detection is structural (see Backtest section).
  ran_at DATETIME
llm_overlays
  id BIGINT PK
  ran_at DATETIME
  forecast_for_week DATE — which weekly forecast it overlays
  direction ENUM('rising','falling','flat')
  confidence TINYINT UNSIGNED — capped 75 in code
  reasoning TEXT
  events_json JSON — cited events with verified URLs
  agrees_with_ridge BOOLEAN
  major_impact_event BOOLEAN
  volatility_flag_on BOOLEAN — was the regime flag on at run time
  search_used BOOLEAN
  INDEX (forecast_for_week, ran_at)
volatility_regimes
  id BIGINT PK
  flipped_on_at DATETIME
  flipped_off_at DATETIME NULL
  trigger ENUM('brent_move','llm_event','station_churn','manual')
  trigger_detail TEXT — e.g. "Brent +4.2% close-to-close"
  active BOOLEAN
watched_events
  id BIGINT PK
  label VARCHAR(128)
  starts_at DATETIME
  ends_at DATETIME
  notes TEXT
```
### Keep
- `weekly_pump_prices` — already loaded, source of truth
- `stations`, `station_prices_current` — for Layer 2
- `station_prices` — keep collecting forward, not modelled in v1
### Deprecate (delete after Layer 1 ships)
- `price_predictions` — old LLM/EWMA store, replaced by `weekly_forecasts`
The current six-signal aggregator (`NationalFuelPredictionService` and
`app/Services/Prediction/Signals/*`) is **fully replaced**, not extended.
Same JSON output keys (`predicted_direction`, `confidence_score`,
`action`, `reasoning`) so the Vue frontend doesn't break — engine swapped,
contract preserved.
---
## Implementation phases (each ships something working)
| Phase | Scope | Ships |
|---|---|---|
| **1. Backtest harness** | `BacktestRunner` service + `backtests` table. Takes a model class, train/eval split, returns directional accuracy + MAE + calibration curve. **Structural leak detection** built in (per-feature source-timestamp check vs target Monday); accuracy>75% smell test as secondary. | A way to *prove* any future model works before shipping it. |
| **2. Naive baseline** | "Predict next week = this week" implemented as a model class. Run through harness. | A floor: any future model must beat this. |
| **3. v1 ridge model** | Features above (incl. mean-reversion term), trained once, persisted with `model_version`. `WeeklyForecastService` runs it. Backtest must clear the acceptance gate. | First real forecast. Backtested numbers visible. |
| **4. Live wiring** | Replace `NationalFuelPredictionService` internals with a thin adapter delegating to `WeeklyForecastService`. Same API shape, new engine. | Frontend keeps working, predictions now from the new model. |
| **5. Local snapshot** | `LocalSnapshotService` — pure aggregates. Wire into `/api/stations` payload alongside the headline forecast. | "Your area" descriptive cards. |
| **6. Honesty layer** | Reasoning generator describes *what the model used*: lag values, season, holiday flag. Shows backtest accuracy badge. Returns explicit "not enough data" when confidence < 40. Surfaces the duty-change-adjacent flag when set. | The "no BS" framing. |
| **7. Brent backfill + daily refresh** | One FRED call (2018→today, ~2,150 daily rows). Daily refresh cron at **06:30 UK** (must complete before Phase 8's 07:00 LLM call — sequenced so the LLM has fresh Brent context). Used by Phase 9's volatility detector and as a feature option for future model iterations (only added to the ridge model if backtested lift is ≥3 percentage points directional accuracy). | Daily Brent in DB. Foundation for volatility + LLM context. |
| **8. LLM news overlay** | `LlmOverlayService` — single scheduled call at **07:00 UK** (after Brent refresh). Plus event-driven calls when Layer 5 flips the volatility flag on, with 4h cooldown. JSON in / JSON out, web search enabled, results stored in `llm_overlays`. Feeds Layer 3's gate 4 (suppress when LLM disagrees AND ridge_confidence < 75) and the `agrees`/`conflicting` badges. URL-verification + empty-citation rejection enforced in code. **Depends on Phase 7.** | News-aware verdict suppression and badge on top of the calibrated ridge baseline. |
| **9. Volatility regime detector** | `VolatilityRegimeService` — hourly cron, sole owner of `volatility_regimes.active`. OR-combines four triggers: Brent move > 3%, LLM `major_impact_event`, station churn > 1.5× baseline (**gated until ≥180 days of stable polling**), `watched_events` row covering today. Fires Layer 3's gate 3 (verdict → `no_signal` with `volatile` badge) and the event-driven Layer 4 refresh. | The intra-week safety net for oil shocks. |
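
The OR-combination in Phase 9 could look roughly like the sketch below. This is an illustration, not the shipped `VolatilityRegimeService`: the function name and parameters are hypothetical, the Brent threshold is applied to the absolute move (an assumption, since the spec only says "> 3%"), and mapping a `watched_events` hit to the `manual` enum value is also an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: returns every trigger that fires for the volatility
 * regime. An empty list means the flag stays off.
 */
function volatilityTriggers(
    float $brentMovePct,        // signed close-to-close Brent move, in percent
    bool $llmMajorImpactEvent,  // latest llm_overlays.major_impact_event
    float $stationChurnRatio,   // current churn divided by baseline churn
    int $stablePollingDays,     // days of stable station polling so far
    bool $watchedEventToday     // a watched_events row covers today
): array {
    $triggers = [];

    // Assumption: a 3% move in either direction counts.
    if (abs($brentMovePct) > 3.0) {
        $triggers[] = 'brent_move';
    }
    if ($llmMajorImpactEvent) {
        $triggers[] = 'llm_event';
    }
    // Station churn is ignored until there are at least 180 days of stable polling.
    if ($stablePollingDays >= 180 && $stationChurnRatio > 1.5) {
        $triggers[] = 'station_churn';
    }
    if ($watchedEventToday) {
        // Assumption: watched events are recorded under the 'manual' trigger.
        $triggers[] = 'manual';
    }

    return $triggers;
}

$fired = volatilityTriggers(4.2, false, 2.0, 90, false);
echo $fired === [] ? "flag off\n" : 'flag on: ' . implode(', ', $fired) . "\n";
```

Returning the list of triggers rather than a bare boolean keeps enough detail to fill `trigger` and `trigger_detail` when a regime row is opened.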
---
## Backtest acceptance gates (Phase 3 → Phase 4)
| Backtest result | Action |
|---|---|
| < 60% directional accuracy | Features are wrong. Stay in Phase 3, don't ship. |
| 60–62% | Marginal. One feature iteration, then re-evaluate. |
| **62–68%** | **Ship.** Realistic target for UK weekly pump direction without Brent. |
| 68–75% | Excellent. Ship and watch closely. |
| > 75% | **Stop.** Run the structural leak detector. Almost certainly time leakage (e.g. using `t+1` info accidentally in `t` features). The accuracy threshold is a secondary smell test, not the primary detector. |
| MAE > 1.0p / litre | Features are noisy. Refit before shipping. |
| Target MAE | 0.4–0.7p / litre. |
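
As a rough illustration of how the gate table might be encoded, the sketch below maps a backtest result to an action. The function name and return labels are hypothetical. It assumes each accuracy band includes its lower bound, and that the leak and MAE checks are evaluated before the accuracy bands; the table does not state either.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical gate check mirroring the acceptance table.
 * Assumptions: lower bounds are inclusive; leak and MAE checks run first.
 */
function backtestGate(float $directionalAccuracyPct, float $maePence): string
{
    if ($directionalAccuracyPct > 75.0) {
        return 'stop_run_leak_detector';
    }
    if ($maePence > 1.0) {
        return 'refit_noisy_features';
    }
    if ($directionalAccuracyPct < 60.0) {
        return 'stay_in_phase_3';
    }
    if ($directionalAccuracyPct < 62.0) {
        return 'iterate_once';
    }
    if ($directionalAccuracyPct < 68.0) {
        return 'ship';
    }

    return 'ship_and_watch';
}

echo backtestGate(64.5, 0.6), "\n";
```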
### Structural leak detection (primary)
Built into the backtest harness. For every (training_week, feature_value)
pair, the harness verifies the data source's effective timestamp is
**strictly before** the target Monday. Any feature whose source timestamp
is on or after that Monday is treated as leakage and the backtest
fails fast. This is independent of accuracy — it catches leakage even
when it doesn't translate into suspiciously high accuracy.
The `> 75% accuracy` row is a secondary smell test for leakage modes the
structural check missed (e.g. label leakage via a downstream computed
column). Primary defence is the timestamp check. The gate thresholds in
the table above are encoded in the harness as assertions, not aspirations.
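
A minimal sketch of the per-feature timestamp check follows, assuming each training row can report an effective source timestamp per feature. The function name and the feature names are hypothetical, and the real harness may structure this differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical structural leak check for one training row.
 * $featureTimestamps maps feature name => effective source timestamp.
 * Throws on the first feature dated on or after the target Monday.
 */
function assertNoLeak(DateTimeImmutable $targetMonday, array $featureTimestamps): void
{
    foreach ($featureTimestamps as $feature => $sourceTimestamp) {
        if ($sourceTimestamp >= $targetMonday) {
            throw new RuntimeException(sprintf(
                'Leak: feature "%s" sourced at %s, target Monday is %s',
                $feature,
                $sourceTimestamp->format('Y-m-d'),
                $targetMonday->format('Y-m-d')
            ));
        }
    }
}

$target = new DateTimeImmutable('2026-04-27'); // a Monday
assertNoLeak($target, [
    'pump_price_lag_1' => new DateTimeImmutable('2026-04-20'), // hypothetical feature
    'pump_price_lag_2' => new DateTimeImmutable('2026-04-13'), // hypothetical feature
]);
echo "no leak detected\n";
```

Throwing on the first offending feature matches the "fails fast" behaviour described above and names the feature, so the leak can be traced without inspecting accuracy numbers.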
---
## Honesty rules — non-negotiables
1. Backtest accuracy is **published in the UI**. The model wears its track
record on its sleeve.
2. Below 40 confidence, the recommendation is `no_signal` and the reasoning
says "we don't have enough signal to call it" — explicitly. No filler.
3. When duty-change-adjacent weeks affect the forecast, surface the flag
("forecast may be skewed by recent duty change").
4. Reasoning text only references features the model actually used — no
narrative invention. If the mean-reversion term drove the call, say so
("Pump prices are 3.1p above their 8-week average, and prices typically
pull back from that level"). If the seasonality term drove it, say so.
5. `forecast_outcomes` is populated automatically when the next BEIS week
lands. Hit rate over the trailing 13 weeks is shown next to the headline.
6. When the **volatility regime flag** is on, the UI shows the `volatile`
badge and the trigger (e.g. "Brent up 4.2% yesterday — forecast may be
stale within days"). Verdict is suppressed visibly via gate 3, never
silently.
7. The LLM overlay is **shown separately** from the ridge model, never
blended. "Model says down (68%); news scan agrees, citing {event}" —
the `ridge_confidence` number stays calibrated and untouched, while
LLM and volatility status are presented as their own badges.
8. LLM citations with unreachable URLs are **dropped from the displayed
reasoning** but kept in `llm_overlays.events_json` for audit. We never
show a citation we haven't verified.
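
For rule 5, the trailing hit rate could be computed along these lines. This is a sketch under assumptions: the row shape (`predicted`, `actual`) is hypothetical, and the three class labels are borrowed from the `direction` enum rather than confirmed for `forecast_outcomes`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical trailing hit rate over forecast_outcomes rows.
 * Each row is ['predicted' => string, 'actual' => string], oldest first,
 * with classes assumed to be 'rising', 'falling', 'flat'.
 * Returns a percentage, or null when no weeks have been scored yet.
 */
function trailingHitRate(array $outcomes, int $weeks = 13): ?float
{
    // A negative offset keeps the most recent $weeks rows (or all, if fewer).
    $window = array_slice($outcomes, -$weeks);
    if ($window === []) {
        return null;
    }

    $hits = 0;
    foreach ($window as $row) {
        if ($row['predicted'] === $row['actual']) {
            $hits++;
        }
    }

    return round(100 * $hits / count($window), 1);
}

$rate = trailingHitRate([
    ['predicted' => 'rising', 'actual' => 'rising'],
    ['predicted' => 'falling', 'actual' => 'flat'],
    ['predicted' => 'flat', 'actual' => 'flat'],
    ['predicted' => 'falling', 'actual' => 'falling'],
]);
var_dump($rate);
```

Returning `null` for an empty window lets the UI show "no track record yet" instead of a misleading 0%.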
---
## What gets deleted at the end of Phase 4
- `app/Services/Prediction/Signals/*` (whole directory)
- `NationalFuelPredictionService` internals (kept as a thin wrapper, then
renamed when the frontend migration completes)
- `price_predictions` table — replaced by `weekly_forecasts` (ridge) +
`llm_overlays` (news layer)
- `OilPriceService::generatePrediction()`, EWMA/LLM helpers — replaced by
`LlmOverlayService` (Phase 8) which has a different contract
- `OilPriceService::fetchBrentPrices()` — kept and **expanded** in Phase 7
(backfill mode + daily refresh), not deleted
- `.claude/rules/scoring.md` retired in favour of a fresh
`.claude/rules/forecasting.md`
- `.claude/rules/prediction.md` rewritten to match the new architecture
---
## Open decisions (to confirm before Phase 1)
- **Forecast cadence** — the *forecast itself* is weekly (matches BEIS
publication). The *confidence and presentation* update daily via Layer 4
(LLM) and Layer 5 (volatility regime). This split is deliberate — we
refuse to fabricate intra-week movement, but we don't pretend a static
Monday number is reliable on Thursday after a 6% Brent move.
- **Scope** — drop the six-signal aggregator entirely, confirmed.
- **API shape** — keep existing JSON output keys so Vue keeps working,
with the engine swapped under the hood. The original `confidence_score`
field maps to `ridge_confidence` (calibrated, untouched). Add new
fields: `volatility` (`{active, trigger}`), `news_overlay`
(`{direction, agreement, events}`), and `verdict_reason` (which gate
fired, if any). The verdict itself goes in the existing `action` field.
- **Brent** — promoted to Phase 7 (was "optional, conditional"). Needed
for the volatility detector, regardless of whether it's used in the
ridge model.
- **LLM** — Anthropic Claude Haiku with web search. Single scheduled call
at 07:00 UK (after the 06:30 Brent refresh). Plus event-driven refreshes
when Layer 5 flips the volatility flag on, with a 4h cooldown. No fixed
afternoon cron — by 13:00 UK, morning users have already made their
fill-up decisions, so the value is too low to justify the extra noise.
Hard confidence cap 75. Empty-citation rejection.
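
To make the API-shape decision concrete, here is a hedged sketch of how the thin adapter might assemble its response. The function name, the input array shapes, and the `gate_3_volatility` label are all hypothetical; only the output key names come from the list above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical adapter output for the forecast endpoint.
 * The first four keys are the existing contract; the rest are the new
 * fields. Input array shapes are assumptions made for illustration.
 */
function buildForecastPayload(
    array $ridge,
    array $volatility,
    array $overlay,
    ?string $verdictReason
): array {
    return [
        // Existing keys, unchanged so the Vue frontend keeps working.
        'predicted_direction' => $ridge['direction'],
        'confidence_score' => $ridge['confidence'], // ridge_confidence, untouched
        'action' => $ridge['verdict'],
        'reasoning' => $ridge['reasoning'],
        // New keys.
        'volatility' => [
            'active' => $volatility['active'],
            'trigger' => $volatility['trigger'],
        ],
        'news_overlay' => [
            'direction' => $overlay['direction'],
            'agreement' => $overlay['agreement'],
            'events' => $overlay['events'],
        ],
        'verdict_reason' => $verdictReason, // which gate fired, or null
    ];
}

$payload = buildForecastPayload(
    ['direction' => 'falling', 'confidence' => 68, 'verdict' => 'no_signal', 'reasoning' => 'placeholder reasoning text'],
    ['active' => true, 'trigger' => 'Brent +4.2% close-to-close'],
    ['direction' => 'falling', 'agreement' => true, 'events' => []],
    'gate_3_volatility' // hypothetical gate label
);

echo json_encode($payload, JSON_PRETTY_PRINT), "\n";
```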
---
## Changelog (substantive design decisions)
| When | Change | Why |
|---|---|---|
| 2026-05-01 v1 | Initial spec — three layers, six-signal aggregator removed, ridge model on BEIS weekly data | Replace incoherent `NationalFuelPredictionService` |
| 2026-05-01 v2 | Added Layer 4 (LLM news overlay) and Layer 5 (volatility regime detector). Pump prices can move daily during oil shocks; static weekly forecast must be backed by intra-week safety nets. | Iran/Hormuz-style shocks make a Monday-only confidence number stale by Wednesday |
| 2026-05-01 v3 | **Verdict via rule gates, not multipliers.** `ridge_confidence` displayed verbatim. LLM and volatility presented as badges. `weeks_since_duty_change` removed from features (kept as calibration override only — n=1 can't fit a coefficient). Backtest gate floor lowered 65 → 62 (realistic without Brent). Structural leak detection (per-feature timestamp check) made primary; accuracy>75% demoted to secondary smell test. `weekly_forecasts` PK changed to `(forecast_for, model_version)` to preserve audit on retrain. `forecast_outcomes` made three-class. Layer 5 station-churn trigger gated until ≥180 days of stable polling. | Multipliers obscure calibration. Gates compose cleanly and stay auditable. |
---
## References
- Alquist, Kilian, Vigfusson (2013), *Forecasting the Price of Oil*:
the academic basis for "no-change baseline beats most structural models
at <6m horizons" (which is why Phase 2 matters as a hard floor).
- BEIS *Weekly road fuel prices* CSV — the 435-week training set.
- `.claude/rules/scoring.md`, `.claude/rules/prediction.md` — the two
inconsistent rule files this spec replaces.


@@ -12,6 +12,7 @@ use Illuminate\Support\Facades\Route;
 // Public endpoints (no API key required)
 Route::post('/auth/register', [AuthController::class, 'register']);
 Route::post('/auth/login', [AuthController::class, 'login']);
+Route::get('/auth/me', [AuthController::class, 'me']);
 Route::get('/fuel-types', function () {
     return Cache::remember('api:fuel-types', now()->addDay(), fn () => collect(FuelType::cases())
@@ -29,7 +30,6 @@ Route::middleware(['throttle:60,1', VerifyApiKey::class])->group(function (): vo
     // Sanctum-authenticated endpoints
     Route::middleware('auth:sanctum')->group(function (): void {
-        Route::get('/auth/me', [AuthController::class, 'me']);
         Route::post('/auth/logout', [AuthController::class, 'logout']);
         // User dashboard endpoints


@@ -69,6 +69,41 @@ it('returns the authenticated user on /me', function () {
         ->assertJsonPath('email', $user->email);
 });
+it('does not leak sensitive or internal user fields on /me', function () {
+    $user = User::factory()->create([
+        'is_admin' => true,
+        'stripe_id' => 'cus_secret',
+        'pm_type' => 'visa',
+        'pm_last_four' => '4242',
+        'postcode' => 'SW1A 1AA',
+    ]);
+    $user->subscriptions()->create([
+        'type' => 'default',
+        'stripe_id' => 'sub_secret',
+        'stripe_status' => 'active',
+        'stripe_price' => 'price_plus_monthly',
+        'quantity' => 1,
+    ]);
+    $response = $this->actingAs($user, 'sanctum')
+        ->getJson('/api/auth/me')
+        ->assertOk();
+    $payload = $response->json();
+    expect(array_keys($payload))->toEqualCanonicalizing([
+        'name',
+        'email',
+        'two_factor_confirmed_at',
+        'tier',
+        'subscription_cancelled',
+        'subscription_cadence',
+        'subscribed_at',
+        'subscription_expires_at',
+    ]);
+});
 it('reports subscription_cancelled=false for a user with no subscription', function () {
     $user = User::factory()->create();
@@ -215,6 +250,12 @@ it('logs out and revokes the token', function () {
     expect($user->tokens()->count())->toBe(0);
 });
-it('returns 401 on protected routes without a token', function () {
-    $this->getJson('/api/auth/me')->assertUnauthorized();
+it('returns null on /me when unauthenticated', function () {
+    $response = $this->getJson('/api/auth/me')->assertOk();
+    expect($response->getContent())->toBe('null');
+});
+it('returns 401 on protected routes without a token', function () {
+    $this->postJson('/api/auth/logout')->assertUnauthorized();
 });


@@ -38,11 +38,24 @@ it('fetches and stores brent prices from EIA', function (): void {
         ->and(BrentPrice::find('2026-04-02')->price_usd)->toBe('73.80');
 });
-it('throws when EIA returns a 500', function (): void {
+it('throws with HTTP status when EIA returns a 500', function (): void {
     Http::fake(['*eia.gov/*' => Http::response([], 500)]);
+    expect(fn () => $this->fetcher->fetchFromEia())
+        ->toThrow(BrentPriceFetchException::class, 'EIA returned HTTP 500');
+});
+it('retries EIA on transient 500 and succeeds', function (): void {
+    Http::fake([
+        '*eia.gov/*' => Http::sequence()
+            ->push([], 500)
+            ->push(['response' => ['data' => [['period' => '2026-04-01', 'value' => '75.10']]]]),
+    ]);
     $this->fetcher->fetchFromEia();
-})->throws(BrentPriceFetchException::class);
+    expect(BrentPrice::count())->toBe(1);
+});
 it('throws when EIA returns empty data', function (): void {
     Http::fake(['*eia.gov/*' => Http::response(['response' => ['data' => []]])]);
@@ -84,11 +97,24 @@ it('fetches and stores brent prices from FRED', function (): void {
     expect(BrentPrice::count())->toBe(2);
 });
-it('throws when FRED fails', function (): void {
+it('throws with HTTP status when FRED returns a 500', function (): void {
     Http::fake(['*/fred/series/observations*' => Http::response([], 500)]);
+    expect(fn () => $this->fetcher->fetchFromFred())
+        ->toThrow(BrentPriceFetchException::class, 'FRED returned HTTP 500');
+});
+it('retries FRED on transient 500 and succeeds', function (): void {
+    Http::fake([
+        '*/fred/series/observations*' => Http::sequence()
+            ->push([], 500)
+            ->push(['observations' => [['date' => '2026-04-01', 'value' => '75.10']]]),
+    ]);
     $this->fetcher->fetchFromFred();
-})->throws(BrentPriceFetchException::class);
+    expect(BrentPrice::count())->toBe(1);
+});
 it('filters out FRED missing value markers', function (): void {
     Http::fake([