fix(forecasting): persist LLM overlay under Tier-1 ITPM via two-call architecture

The daily forecast:llm-overlay command was being skipped because the
previous single-conversation flow consumed more than Tier-1's 50,000
input-tokens-per-minute Anthropic bucket. The web_search tool
auto-caches its results (~55k tokens) and requires `encrypted_content`
intact when those blocks are resent, so the prior
retry-on-missing-citations path either 429'd or 400'd on the second
call.

LlmOverlayService now runs two independent API calls. Phase 1 invokes
the web_search tool and we discard the transcript after harvesting the
URLs + titles from the returned web_search_tool_result blocks. Phase 2
is a fresh conversation containing the forecast context and the
harvested headlines as plain text, with a forced submit_overlay tool
call.

events_cited is now optional in the tool schema: Haiku's flaky
compliance no longer matters because citations come from the search
results, not the model's transcription. Model-tagged events (with
directional impact) merge with harvested-only entries
(impact: 'neutral'), deduped by URL.

Between phases the service reads
anthropic-ratelimit-input-tokens-remaining / …-reset from Phase 1's
headers and sleeps proportionally: only long enough for the
SUBMIT_TOKEN_BUDGET worth of refill, not for the full bucket reset,
capped at 65 seconds.

ApiLogger now captures usage.input_tokens, usage.output_tokens,
cache_read_input_tokens, cache_creation_input_tokens, plus the
rate-limit remaining/reset headers on every Anthropic response. New
nullable columns on api_logs make rate-limit diagnostics directly
queryable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:22:42 +01:00
parent 97e27fc057
commit 07e0789044
6 changed files with 668 additions and 325 deletions
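
The proportional sleep between phases described in the commit message could be computed roughly as follows. This is a minimal sketch, not the code in LlmOverlayService: the function name secondsToWait, the SUBMIT_TOKEN_BUDGET value of 20000, and the assumption that the bucket refills linearly until the reset time are all hypothetical. Only the 50,000-token bucket and the 65-second cap come from the message.

```php
<?php

declare(strict_types=1);

// Hypothetical value: the commit names SUBMIT_TOKEN_BUDGET but does not state it.
const SUBMIT_TOKEN_BUDGET = 20000;
// Tier-1 input-tokens-per-minute bucket, as stated in the commit message.
const BUCKET_LIMIT = 50000;
// Upper bound on the inter-phase sleep, as stated in the commit message.
const MAX_SLEEP_SECONDS = 65;

/**
 * Seconds to wait before Phase 2, given the remaining input tokens and the
 * number of seconds until the bucket is full again (both derived from the
 * Phase 1 rate-limit headers). Assumes linear refill over that window.
 */
function secondsToWait(int $remaining, int $secondsUntilReset): int
{
    $missing = SUBMIT_TOKEN_BUDGET - $remaining;
    if ($missing <= 0 || $secondsUntilReset <= 0) {
        return 0; // enough budget already, or the bucket has already reset
    }

    // Tokens the bucket regains between now and the reset time.
    $toRefill = BUCKET_LIMIT - $remaining;

    // Integer ceiling of ($missing / $toRefill) * $secondsUntilReset.
    $wait = intdiv($missing * $secondsUntilReset + $toRefill - 1, $toRefill);

    return min(MAX_SLEEP_SECONDS, $wait);
}
```

Under these assumptions an empty bucket that resets in 60 seconds needs a 24-second wait for 20000 tokens, rather than the full 60, which is the saving the proportional approach is after.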
--- a/app/Models/ApiLog.php
+++ b/app/Models/ApiLog.php
@@ -7,7 +7,21 @@ use Illuminate\Database\Eloquent\Attributes\Fillable;
 use Illuminate\Database\Eloquent\Factories\HasFactory;
 use Illuminate\Database\Eloquent\Model;

-#[Fillable(['service', 'method', 'url', 'status_code', 'duration_ms', 'error', 'response_body'])]
+#[Fillable([
+    'service',
+    'method',
+    'url',
+    'status_code',
+    'duration_ms',
+    'error',
+    'response_body',
+    'input_tokens',
+    'output_tokens',
+    'cache_read_tokens',
+    'cache_write_tokens',
+    'ratelimit_remaining',
+    'ratelimit_reset_at',
+])]
 class ApiLog extends Model
 {
    /** @use HasFactory<ApiLogFactory> */
@@ -19,6 +33,7 @@ class ApiLog extends Model
    {
        return [
            'created_at' => 'datetime',
+            'ratelimit_reset_at' => 'datetime',
        ];
    }
 }