fix(forecasting): persist LLM overlay under Tier-1 ITPM via two-call architecture
The daily forecast:llm-overlay command was being skipped because the previous single-conversation flow consumed more than Tier-1's 50,000 input-tokens-per-minute Anthropic bucket. The web_search tool auto-caches its results (~55k tokens) and requires `encrypted_content` to arrive intact when those blocks are resent, so the prior retry-on-missing-citations path either 429'd or 400'd on the second call.

LlmOverlayService now runs two independent API calls. Phase 1 invokes the web_search tool; the transcript is discarded once the URLs and titles have been harvested from the returned web_search_tool_result blocks. Phase 2 is a fresh conversation containing the forecast context and the harvested headlines as plain text, with a forced submit_overlay tool call.

events_cited is now optional in the tool schema. Haiku's flaky compliance no longer matters because citations come from the search results, not from the model's transcription. Model-tagged events (with directional impact) merge with harvested-only entries (impact: 'neutral'), deduped by URL; a sketch of the merge follows below.

Between phases the service reads anthropic-ratelimit-input-tokens-remaining and anthropic-ratelimit-input-tokens-reset from Phase 1's headers and sleeps proportionally: only long enough to refill SUBMIT_TOKEN_BUDGET worth of tokens rather than waiting out the full bucket reset, capped at 65 seconds (second sketch below).

ApiLogger now captures usage.input_tokens, usage.output_tokens, cache_read_input_tokens, and cache_creation_input_tokens, plus the rate-limit remaining/reset headers, on every Anthropic response. New nullable columns on api_logs make rate-limit diagnostics directly queryable.
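A minimal sketch of the merge, assuming illustrative array shapes and a hypothetical mergeCitations helper (the shipped LlmOverlayService code may differ):

// Sketch only: model-tagged events keep their directional impact; harvested
// headlines the model never mentioned default to 'neutral'. Entries never
// overwrite earlier ones, so the dedupe-by-URL favors model-tagged events.
function mergeCitations(array $modelEvents, array $harvestedHits): array
{
    $byUrl = [];

    foreach ($modelEvents as $event) {
        // e.g. ['url' => ..., 'title' => ..., 'impact' => 'bullish'|'bearish'|'neutral']
        $byUrl[$event['url']] = $event;
    }

    foreach ($harvestedHits as $hit) {
        // web_search_tool_result blocks yield url + title only
        $byUrl[$hit['url']] ??= [
            'url'    => $hit['url'],
            'title'  => $hit['title'],
            'impact' => 'neutral',
        ];
    }

    return array_values($byUrl);
}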
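The proportional sleep reduces to a few lines. The sketch below assumes a linear bucket refill and the anthropic-ratelimit-input-tokens-limit header (which Anthropic also sends); secondsToWait is a hypothetical helper, not the service's actual method:

use Carbon\CarbonImmutable;

function secondsToWait(array $headers, int $submitTokenBudget): int
{
    $limit     = (int) $headers['anthropic-ratelimit-input-tokens-limit'];     // 50,000 on Tier 1
    $remaining = (int) $headers['anthropic-ratelimit-input-tokens-remaining'];

    if ($remaining >= $submitTokenBudget) {
        return 0;                                                              // enough headroom already
    }

    $resetAt        = CarbonImmutable::parse($headers['anthropic-ratelimit-input-tokens-reset']);
    $secondsToReset = max(1, CarbonImmutable::now()->diffInSeconds($resetAt, false));
    $refillPerSec   = max(1, $limit - $remaining) / $secondsToReset;           // tokens regained per second
    $deficit        = $submitTokenBudget - $remaining;

    return min((int) ceil($deficit / $refillPerSec), 65);                      // 65s hard cap
}

The service would then sleep(secondsToWait($phase1Headers, self::SUBMIT_TOKEN_BUDGET)) before opening the Phase 2 conversation.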
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,46 @@
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    /**
     * Capture token usage and rate-limit headers from token-metering
     * providers (today: Anthropic). These columns let us see the
     * cumulative input-tokens-per-minute trajectory directly in
     * api_logs rather than inferring it from request counts.
     */
    public function up(): void
    {
        Schema::table('api_logs', function (Blueprint $table) {
            $table->unsignedInteger('input_tokens')->nullable()->after('response_body')
                ->comment('Input tokens billed (Anthropic usage.input_tokens). NULL for providers that do not report usage.');
            $table->unsignedInteger('output_tokens')->nullable()->after('input_tokens')
                ->comment('Output tokens billed (Anthropic usage.output_tokens).');
            $table->unsignedInteger('cache_read_tokens')->nullable()->after('output_tokens')
                ->comment('Cache-hit tokens (Anthropic usage.cache_read_input_tokens). Do not count toward ITPM on most models.');
            $table->unsignedInteger('cache_write_tokens')->nullable()->after('cache_read_tokens')
                ->comment('Cache-write tokens (Anthropic usage.cache_creation_input_tokens). Count toward ITPM.');
            $table->unsignedInteger('ratelimit_remaining')->nullable()->after('cache_write_tokens')
                ->comment('Provider-reported input-tokens remaining in the rolling window (anthropic-ratelimit-input-tokens-remaining).');
            $table->dateTime('ratelimit_reset_at')->nullable()->after('ratelimit_remaining')
                ->comment('When the input-tokens bucket will be fully replenished (anthropic-ratelimit-input-tokens-reset, RFC 3339).');
        });
    }

    public function down(): void
    {
        Schema::table('api_logs', function (Blueprint $table) {
            $table->dropColumn([
                'input_tokens',
                'output_tokens',
                'cache_read_tokens',
                'cache_write_tokens',
                'ratelimit_remaining',
                'ratelimit_reset_at',
            ]);
        });
    }
};
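For context, the ApiLogger side that fills these columns amounts to a mapping like the sketch below. The ApiLog model, the PSR-7-style $response, and the surrounding wiring are assumptions; only the column names, usage fields, and header names come from this commit:

use Carbon\CarbonImmutable;

// Sketch only: pull usage counters from the response body and the
// rate-limit headers, leaving NULLs where a provider reports nothing.
$payload = json_decode((string) $response->getBody(), true) ?? [];
$usage   = $payload['usage'] ?? [];
$header  = fn (string $name): ?string => $response->getHeaderLine($name) ?: null;

ApiLog::create([
    // ...existing request/response columns...
    'input_tokens'        => $usage['input_tokens'] ?? null,
    'output_tokens'       => $usage['output_tokens'] ?? null,
    'cache_read_tokens'   => $usage['cache_read_input_tokens'] ?? null,
    'cache_write_tokens'  => $usage['cache_creation_input_tokens'] ?? null,
    'ratelimit_remaining' => $header('anthropic-ratelimit-input-tokens-remaining'),
    'ratelimit_reset_at'  => ($reset = $header('anthropic-ratelimit-input-tokens-reset'))
        ? CarbonImmutable::parse($reset)
        : null,
]);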