fix(forecasting): persist LLM overlay under Tier-1 ITPM via two-call architecture

The daily forecast:llm-overlay command was being skipped because the previous
single-conversation flow consumed more than Anthropic's Tier-1 bucket of
50,000 input tokens per minute. The web_search tool auto-caches its results
(~55k tokens) and requires `encrypted_content` to arrive intact whenever those
blocks are resent, so the prior retry-on-missing-citations path either 429'd
or 400'd on the second call.

LlmOverlayService now runs two independent API calls. Phase 1 invokes the
web_search tool; the service harvests the URLs and titles from the returned
web_search_tool_result blocks and then discards the transcript. Phase 2 is a
fresh conversation containing the forecast context and the harvested headlines
as plain text, with a forced submit_overlay tool call. events_cited is now
optional in the tool schema: Haiku's flaky compliance no longer matters
because citations come from the search results, not the model's transcription.
Model-tagged events (with directional impact) merge with harvested-only
entries (impact: 'neutral'), deduped by URL, as sketched below.
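
A minimal sketch of both phases' data handling. The function names and the
exact shape of the web_search_tool_result blocks are illustrative assumptions,
not the real LlmOverlayService internals:

    use Illuminate\Support\Collection;

    // Phase 1: keep only URL + title from each search result and drop the
    // transcript, so no cached web_search blocks (or their encrypted_content)
    // are ever resent.
    function harvestHeadlines(array $searchResponse): Collection
    {
        return collect($searchResponse['content'])
            ->where('type', 'web_search_tool_result')
            ->flatMap(fn (array $block): array => $block['content'] ?? [])
            ->map(fn (array $result): array => [
                'url'   => $result['url'],
                'title' => $result['title'],
            ])
            ->unique('url')
            ->values();
    }

    // Merge model-tagged events (directional impact) with harvested-only
    // entries (defaulting to 'neutral'), deduped by URL. Model entries are
    // concatenated first, so they win the dedupe.
    function mergeEvents(Collection $modelEvents, Collection $headlines): Collection
    {
        return $modelEvents
            ->concat($headlines->map(
                fn (array $h): array => $h + ['impact' => 'neutral'],
            ))
            ->unique('url')
            ->values();
    }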

Between phases the service reads anthropic-ratelimit-input-tokens-remaining
and anthropic-ratelimit-input-tokens-reset from Phase 1's response headers and
sleeps proportionally: only long enough to refill SUBMIT_TOKEN_BUDGET worth of
tokens, not for the full bucket reset, capped at 65 seconds.
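
In code, the wait is roughly the following. This is a sketch: the class name,
SUBMIT_TOKEN_BUDGET's value, and the refill math are assumptions; only the
header names, the Tier-1 bucket size, and the 65-second cap come from this
commit.

    use Carbon\CarbonImmutable;
    use Illuminate\Http\Client\Response;

    final class OverlayRateGate
    {
        private const SUBMIT_TOKEN_BUDGET = 15_000; // assumed value
        private const TIER1_BUCKET        = 50_000;
        private const MAX_SLEEP_SECONDS   = 65;

        public function waitForRefill(Response $phase1): void
        {
            $remaining = (int) $phase1->header('anthropic-ratelimit-input-tokens-remaining');

            if ($remaining >= self::SUBMIT_TOKEN_BUDGET) {
                return; // Phase 2 already fits in the bucket.
            }

            $resetAt = CarbonImmutable::parse(
                $phase1->header('anthropic-ratelimit-input-tokens-reset'),
            );
            $windowSeconds = max(1, CarbonImmutable::now()->diffInSeconds($resetAt, false));

            // Sleep only for the fraction of the reset window that refills
            // the missing tokens, never for the whole window.
            $deficit = self::SUBMIT_TOKEN_BUDGET - $remaining;
            $sleep   = (int) ceil($windowSeconds * $deficit / self::TIER1_BUCKET);

            sleep(min($sleep, self::MAX_SLEEP_SECONDS));
        }
    }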

ApiLogger now captures usage.input_tokens, usage.output_tokens,
cache_read_input_tokens, cache_creation_input_tokens, plus the rate-limit
remaining/reset headers on every Anthropic response. New nullable columns on
api_logs make rate-limit diagnostics directly queryable.
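
The column names below are the ones the new tests exercise; the migration
itself is a sketch of the shape, not necessarily the exact one in this diff:

    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;

    Schema::table('api_logs', function (Blueprint $table): void {
        $table->unsignedBigInteger('input_tokens')->nullable();
        $table->unsignedBigInteger('output_tokens')->nullable();
        $table->unsignedBigInteger('cache_read_tokens')->nullable();
        $table->unsignedBigInteger('cache_write_tokens')->nullable();
        $table->unsignedInteger('ratelimit_remaining')->nullable();
        $table->timestampTz('ratelimit_reset_at')->nullable();
    });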

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ovidiu U
2026-05-14 14:22:42 +01:00
parent 97e27fc057
commit 07e0789044
6 changed files with 668 additions and 325 deletions


@@ -119,3 +119,57 @@ it('captures response_body when an HTTP RequestException is thrown', function ()
    expect(ApiLog::first()->response_body)->toBe('upstream details');
});

it('captures Anthropic usage tokens from a successful response', function (): void {
    Http::fake(['https://api.anthropic.com/v1/messages' => Http::response([
        'content' => [],
        'usage' => [
            'input_tokens' => 1234,
            'output_tokens' => 56,
            'cache_creation_input_tokens' => 8000,
            'cache_read_input_tokens' => 12000,
        ],
    ])]);

    $this->apiLogger->send('anthropic', 'POST', 'https://api.anthropic.com/v1/messages',
        fn () => Http::post('https://api.anthropic.com/v1/messages'));

    $log = ApiLog::first();

    // Anthropic's cache_creation_input_tokens is stored as cache_write_tokens.
    expect($log->input_tokens)->toBe(1234)
        ->and($log->output_tokens)->toBe(56)
        ->and($log->cache_write_tokens)->toBe(8000)
        ->and($log->cache_read_tokens)->toBe(12000);
});

it('captures rate-limit headers from any provider response', function (): void {
    Http::fake(['https://api.anthropic.com/v1/messages' => Http::response(
        ['content' => [], 'usage' => ['input_tokens' => 100, 'output_tokens' => 10]],
        200,
        [
            'anthropic-ratelimit-input-tokens-remaining' => '38000',
            'anthropic-ratelimit-input-tokens-reset' => '2026-05-14T12:41:00Z',
        ],
    )]);

    $this->apiLogger->send('anthropic', 'POST', 'https://api.anthropic.com/v1/messages',
        fn () => Http::post('https://api.anthropic.com/v1/messages'));

    $log = ApiLog::first();

    expect($log->ratelimit_remaining)->toBe(38000)
        ->and($log->ratelimit_reset_at?->toIso8601String())->toBe('2026-05-14T12:41:00+00:00');
});

it('leaves token columns null for services without usage data', function (): void {
    Http::fake(['https://example.com/x' => Http::response(['ok' => true])]);

    $this->apiLogger->send('test_service', 'GET', 'https://example.com/x',
        fn () => Http::get('https://example.com/x'));

    $log = ApiLog::first();

    expect($log->input_tokens)->toBeNull()
        ->and($log->output_tokens)->toBeNull()
        ->and($log->cache_read_tokens)->toBeNull()
        ->and($log->cache_write_tokens)->toBeNull()
        ->and($log->ratelimit_remaining)->toBeNull()
        ->and($log->ratelimit_reset_at)->toBeNull();
});