The daily forecast:llm-overlay command was being skipped because the previous
single-conversation flow consumed more than Tier-1's 50,000 input-tokens-per-
minute Anthropic bucket. The web_search tool auto-caches its results (~55k
tokens) and requires `encrypted_content` intact when those blocks are resent,
so the prior retry-on-missing-citations path either 429'd or 400'd on the
second call.
LlmOverlayService now runs two independent API calls. Phase 1 invokes the
web_search tool; the transcript is discarded after harvesting the URLs and
titles from the returned web_search_tool_result blocks. Phase 2 is a fresh
conversation containing the forecast context and the harvested headlines as
plain text, with a forced submit_overlay tool call. events_cited is now
optional in the tool schema — Haiku's flaky compliance no longer matters
because citations come from the search results, not the model's transcription.
Model-tagged events (with directional impact) merge with harvested-only
entries (impact: 'neutral'), deduped by URL.
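
The merge described above can be sketched as follows. This is an illustration in Python, not the service's PHP code; the function name, the dict shape (`url`, `title`, `impact`), and the rule that a model-tagged entry wins over a harvested one for the same URL are all assumptions.

```python
def merge_events(model_events, harvested):
    """Merge model-tagged events with harvested search results, deduped by URL.

    Assumption: when the same URL appears in both lists, the model-tagged
    entry (which carries a directional impact) takes precedence.
    """
    merged = {}
    for event in model_events:
        merged[event["url"]] = {
            "url": event["url"],
            "title": event["title"],
            "impact": event["impact"],
        }
    for item in harvested:
        if item["url"] not in merged:
            # Harvested-only entries carry no directional opinion.
            merged[item["url"]] = {
                "url": item["url"],
                "title": item["title"],
                "impact": "neutral",
            }
    return list(merged.values())
```

Because the harvested list comes straight from the search results, the merged output is non-empty whenever the search returned anything, even if the model omits events_cited entirely.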
Between phases the service reads anthropic-ratelimit-input-tokens-remaining /
…-reset from Phase 1's headers and sleeps proportionally: only long enough
for SUBMIT_TOKEN_BUDGET tokens to refill, not for the full bucket reset,
and never longer than 65 seconds.
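
A minimal sketch of the proportional wait, in Python for illustration only. It assumes the bucket refills linearly from the reported remaining count up to the limit over the time left until reset; the function name, the parameter names, and the 50,000-token default limit (taken from the Tier-1 figure above) are assumptions, and the actual value of SUBMIT_TOKEN_BUDGET is not stated here.

```python
def inter_phase_sleep(remaining, seconds_until_reset, budget,
                      limit=50_000, cap=65.0):
    """Seconds to wait until `budget` input tokens are available again.

    Assumes linear refill from `remaining` to `limit` over
    `seconds_until_reset`. The result is capped at `cap` seconds.
    """
    if remaining >= budget:
        return 0.0  # enough tokens already, no wait needed
    missing = limit - remaining
    if missing <= 0 or seconds_until_reset <= 0:
        return 0.0  # bucket already full or reset already due
    refill_rate = missing / seconds_until_reset  # tokens per second
    wait = (budget - remaining) / refill_rate
    return min(wait, cap)
```

With an empty bucket, 60 seconds until reset, and a budget of half the limit, the wait comes out at half the reset window rather than the full 60 seconds.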
ApiLogger now captures usage.input_tokens, usage.output_tokens,
cache_read_input_tokens, cache_creation_input_tokens, plus the rate-limit
remaining/reset headers on every Anthropic response. New nullable columns on
api_logs make rate-limit diagnostics directly queryable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Hero: remove full-width mobile submit, add inline "Go" button next to locate
- Prediction cards: tighter mobile padding (px-3 py-3)
- Search filters: right-aligned toolbar, remove "X stations found" count and map toggle
- Map: initialize view immediately to avoid tile wiggle, skip recenter on fresh init
- Station list: hidden by default, toggled via "Stations {count}" pill above map
- Typography: hide desktop h1 on mobile, scale down section headings and spacing
- Footer: remove uppercase styling from headings and copyright line
- Filter popover: auto-close on fuel/radius/sort/brand selection
fix(llm): retry submit_overlay when events_cited is missing, extend Fuel Finder timeout with retries
- LlmOverlayService: add `minItems: 1` to events_cited schema, detect missing citations
in submit response, inject tool_result error and retry once with explicit prompt
- Log full raw_result context when no verified citations, capturing direction/confidence/reasoning
- FuelPriceService: add 3×1s retry with 60s timeout to batch price requests (was 30s no retry)
- Tests: cover successful retry recovery and rejection when retry also omits citations
ApiLogger now stores the upstream response body to
`api_logs.response_body` whenever the call failed (non-2xx response or
a RequestException carrying a response). Successful 2xx responses
remain null so the table stays small on busy services like fuel:poll
and oil:fetch.
Truncated at 64 KB. The column is mediumText so a future cap raise
needs no schema change.
Captures:
- 4xx and 5xx response bodies verbatim
- Body extracted from RequestException via `$e->response->body()`
when callers use `Http::throw()`
Does not capture:
- ConnectionException (no response existed)
- Generic Throwable from the closure (same reason)
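
The capture rule can be sketched like this, in Python for illustration. The function name and the `None` status convention for "no response existed" are hypothetical, and truncating by bytes rather than characters is an assumption.

```python
MAX_BODY_BYTES = 64 * 1024  # 64 KB cap, as stated above


def body_to_store(status_code, body):
    """Return the response body to persist, or None if nothing is stored.

    status_code is None when no response existed (connection failure or a
    generic exception); 2xx responses are deliberately not stored.
    """
    if status_code is None:
        return None
    if 200 <= status_code < 300:
        return None
    raw = body.encode("utf-8")[:MAX_BODY_BYTES]
    # errors="ignore" drops a multi-byte character cut in half by the cap.
    return raw.decode("utf-8", errors="ignore")
```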
Motivation: the LLM overlay's "skipped — no verified citations" path
left no forensic trail to debug. With this, the next time anything
routed through ApiLogger fails — Anthropic 429s, FRED 5xx, Fuel
Finder errors — the failed body is queryable directly:
    SELECT response_body FROM api_logs
    WHERE service = ? AND status_code >= 400
    ORDER BY id DESC LIMIT 1;
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes everything that was made redundant by the new forecasting
stack. Per docs/superpowers/specs/2026-05-01-prediction-rebuild-design.md,
this was the cleanup planned at the end of Phase 4.
Deleted services and code:
- App\Services\Prediction\Signals\* (the old signal aggregator:
trend, supermarket, day-of-week, brand-behaviour, stickiness,
regional-momentum, oil; replaced by RidgeRegressionModel).
- App\Services\NationalFuelPredictionService (the post-Phase-4 thin
shim; StationSearchService now depends on WeeklyForecastService
directly, set up in the previous commit).
- App\Services\LlmPrediction\* (AbstractLlmPredictionProvider plus
the four provider implementations — Anthropic, OpenAI, Gemini, and
the OilPredictionProvider router. Replaced by LlmOverlayService).
- App\Services\BrentPricePredictor and App\Services\Ewma. The Ewma
helper had no callers left after BrentPricePredictor went.
- App\Models\PricePrediction and its factory.
- App\Console\Commands\PredictOilPrices (the oil:predict command).
- App\Filament\Resources\OilPredictionResource and its Pages.
Schema and dashboard:
- Drop the price_predictions table via a new migration.
- Repoint the Filament StatsOverviewWidget tile from PricePrediction
to WeeklyForecast so the dashboard reflects the new pipeline.
- Remove the OilPredictionProvider binding from AppServiceProvider.
Test cleanup:
- Delete tests for every retired service.
- Update StatsOverviewWidgetTest to seed weekly_forecasts instead of
price_predictions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the implementation behind NationalFuelPredictionService — the
public JSON contract on /api/stations is preserved, but the engine is
new and honest.
Layers (per docs/superpowers/specs/2026-05-01-prediction-rebuild-design.md):
1. Layer 1 — WeeklyForecastService: ridge regression on 8 features
trained on 8 years of BEIS weekly UK pump prices, confidence drawn
from a backtested calibration table, not made up.
2. Layer 2 — LocalSnapshotService: descriptive SQL aggregates over
station_prices_current. Never speaks about the future.
3. Layer 3 — verdict via rule gates, not confidence multipliers. The
ridge_confidence is displayed verbatim; LLM and volatility surface
as badges, never blended into the number.
4. Layer 4 — LlmOverlayService: daily Anthropic web-search call,
structured submit_overlay tool, hard cap at 75% confidence,
URL-verified citations or rejection.
5. Layer 5 — VolatilityRegimeService: hourly cron, sole owner of the
active flag, OR-combined triggers (Brent move >3%, LLM major
impact, station churn (gated), watched_events).
Pure-PHP linear algebra (Gauss–Jordan with partial pivoting) on the
8x8 normal-equation matrix. No external ML dependency. Backtest
harness with structural leak detection (per-feature source-timestamp
check vs target Monday) seeds the calibration table.
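
For illustration, ridge regression via the normal equations and Gauss–Jordan elimination with partial pivoting can be sketched as below. This is Python rather than the project's PHP, the function names are hypothetical, and it applies the penalty to every coefficient; how the real model treats an intercept or feature scaling is not stated here.

```python
def gauss_jordan_solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copies keep the inputs untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


def ridge_fit(x, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for the coefficient vector w."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, y)) for i in range(k)]
    return gauss_jordan_solve(xtx, xty)
```

With 8 features the system solved is 8x8, so the cubic cost of elimination is negligible and no external numerical library is needed.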
Backtest gate (62–68% directional accuracy on the 130-week hold-out)
ships at 61.98% with MAE 0.48 p/L — beats the naive zero-change
baseline by ~30pp on real data.
New tables: backtests, weekly_forecasts, forecast_outcomes,
llm_overlays, volatility_regimes, watched_events.
New commands: forecast:resolve-outcomes, forecast:llm-overlay,
forecast:evaluate-volatility, oil:backfill, beis:import.
Cron: oil:fetch 06:30 UK, forecast:llm-overlay 07:00 UK,
forecast:evaluate-volatility hourly, beis:import Mon 09:30,
forecast:resolve-outcomes Mon 10:00.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Made `/api/auth/me` public and changed it to return an explicit allowlist (name, email,
two_factor_confirmed_at, tier, subscription fields) instead of spreading
`$user->toArray()`, which leaked is_admin, stripe_id, pm_type, pm_last_four, and
postcode. It returns `null` when unauthenticated rather than a 401.
- Moved `/auth/logout` so it remains behind the auth:sanctum gate.
- Added 3×200ms retry with exponential backoff to EiaBrentPriceSource and
FredBrentPriceSource on ConnectionException or 5xx responses. Timeout
raised from 10s to 30s.
- Both sources now throw a typed BrentPriceFetchException on exhausted retries
instead of returning null and only logging the failure. Updated tests to assert
that the exception message includes the HTTP status or "connection failed".
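
A sketch of the retry behaviour described in the last two items, in Python for illustration. It reads "3×200ms" as three attempts in total with a 200 ms base delay that doubles each time, which is an assumption; the names `fetch_with_retry` and `BrentPriceFetchError`, and the choice to hand 4xx responses back to the caller, are hypothetical.

```python
import time


class BrentPriceFetchError(Exception):
    """Raised when every attempt failed (hypothetical name)."""


def fetch_with_retry(call, attempts=3, base_delay=0.2, sleep=time.sleep):
    """Run call() -> (status, body), retrying on ConnectionError or 5xx."""
    last_error = "connection failed"
    for attempt in range(attempts):
        try:
            status, body = call()
        except ConnectionError:
            last_error = "connection failed"
        else:
            if status < 500:
                return status, body  # success or a non-retryable response
            last_error = f"HTTP {status}"
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.2 s, then 0.4 s
    raise BrentPriceFetchError(f"exhausted {attempts} attempts: {last_error}")
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.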
Audit items #7 and #5.
#7 — BrentPricePredictor::generatePrediction previously wrote both an
EWMA row and an LLM row to price_predictions on every run. The
downstream OilSignal already prefers llm_with_context > llm > ewma, so
the EWMA row was dead weight 95% of the time. Now we try LLM first; if
it returns null (no API key, parse failure, etc.) we compute and persist
EWMA as a real fallback. This also avoids redundant work on the success
path.
Updated the "stores both" test to "stores only LLM" — asserts no EWMA
row is written when the provider succeeds.
#5 — BrentPricePredictor and AnthropicPredictionProvider both had
byte-identical computeEwma() methods with identical EWMA_ALPHA = 0.3
constants. Extracted to App\Services\Ewma::compute() and dropped both
private methods + their alpha constants.
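
For reference, an exponentially weighted moving average with alpha = 0.3 can be sketched as below. This is Python for illustration, not the extracted PHP helper; seeding the average with the first value and expecting oldest-first input are assumptions about the original methods.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average over values, oldest first.

    Seeds with the first value, then applies
    s = alpha * v + (1 - alpha) * s for each later value.
    """
    if not values:
        raise ValueError("values must not be empty")
    smoothed = values[0]
    for value in values[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed
```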
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate prediction functionality by merging the /api/prediction endpoint into the /api/stations response.
- Move prediction logic from PredictionController into StationController, returning prediction data alongside station results.
- Replace the usePrediction composable with a unified useStations that returns {stations, meta, prediction}.
- Remove PredictionRequest, related tests, and unused Vue components (FuelFinderTest, MapTest, RecommendationTest, StationListTest).
- Add PredictionFull component and UpsellBanner.
- Extend NationalFuelPredictionService to include weekly_summary (7-day series, yesterday/today averages, cheapest/priciest days) and the oil signal from the price_predictions table.
- Update Home.vue to consume prediction from the stations response.
- Add Plan::resolveCadenceForUser helper and configure Cashier to use the custom Subscription model.
Reconciles tier docs with `PlanSeeder` reality (basic has price_threshold
and score_alerts; schema is stripe_price_id_monthly + stripe_price_id_annual)
and introduces the display-name layer from pricing-plan.md v2.
- PlanTier::label() + Plan::displayName() + PlanFeatures::displayName()
expose user-facing names (Free/Daily/Smart/Pro); backend identifiers stay
basic/plus/pro so every call site, Stripe mapping, and test keeps working.
- push.frequency key added to features JSON (none/daily/triggered), mirroring
email.frequency so Daily's daily push is distinguishable from Smart/Pro's
triggered push. Seeder, factory, free-tier stubs, and Filament form updated.
- Homepage pricing cards renamed: Basic→Daily, Plus→Smart; badge
"Most Popular"→"Most pick this"; CTAs refreshed.
- docs/tiers.md change log records the full diff.
Fleet tier, 14-day trial copy, and Smart dark-card treatment deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Delete unused Livewire Search test and fuel type select Blade component
- Move subscription webhook listener from EventServiceProvider to AppServiceProvider
- Add FUEL_TYPES global config to app layout for client-side use
- Add Billable trait to User model and include email_verified_at in fillable
- Implement monthly/annual cadence toggle with pricing display and smart CTA routing on homepage
- Update VerifyApiKeyMiddlewareTest to use e10 instead of petrol
- Refactor PollFuelPrices to auto-refresh stale stations based on last_seen_at
- Add incremental polling with cached timestamp and effective-start-timestamp param to FuelPriceService
- Normalize amenities/fuel_types from API objects to flat arrays, skip stations missing required fields
- Log response body on API failures in ApiLogger
- Default homepage sort to 'reliable' instead of 'price'
- BrentPriceFetcher owns ingestion (fetchFromEia / fetchFromFred, each throws on failure)
- BrentPricePredictor owns prediction and marks latest brent_prices row as generated
- oil:fetch command tries EIA, falls back to FRED, fails loudly if both fail
- oil:predict command prompts if latest price already has a prediction; --force bypasses
- add prediction_generated_at column to brent_prices
- delete OilPriceService (replaced by the two focused services)
OilPriceService no longer inlines per-provider fetch/transform/error logic.
EIA and FRED are now their own classes with a common shape; the service
just iterates and upserts the first successful result.
Extends NearbyStationsRequest to accept `postcode` (full or outcode) as an alternative to lat/lng. PostcodeService resolves it via postcodes.io and falls through to coordinates. Also adds SearchResource to the Filament admin panel for viewing logged search activity with fuel type filter and price/distance stats columns. Includes SQLite GREATEST/LEAST function polyfills in AppServiceProvider for test compatibility.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
User management resource with editable is_admin field, postcode support,
admin filter, and inline delete action. Includes list and edit pages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New method uses web_search_20260209 server-side tool so Claude fetches
48h of oil/geopolitical news autonomously before predicting direction
- Prompt uses raw prices only — no pre-computed EWMA indicators
- pause_turn loop handles server-side search continuation (up to 5 iters)
- generatePrediction() now tries context method first, falls back to
generateLlmPrediction(), then EWMA
- Default model updated to claude-sonnet-4-6
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add PostcodeService to resolve UK postcodes, outcodes, and place names to coordinates via postcodes.io API with 30-day caching
- Add LocationResult value object for resolved location data
- Add per-fuel-type price validation (80p-1050p range) to FuelPriceService with warning logs for out-of-range prices
- Change price_pence column from unsignedSmallInteger to unsignedMediumInteger in station_prices tables
- Add CHECK constraints (5000-50000 range) on price_pence columns as database-level guard
- Improve error handling in PollFuelPrices command with file/line/trace output
- Add tests for PostcodeService covering full postcodes, outcodes, place names, caching, and error handling
- Add test for price validation range checks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Also extend Pest TestCase to Unit tests and guard MySQL-only migration
DDL (composite PK + PARTITION BY) behind a driver check so in-memory
SQLite tests can run migrations cleanly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>