What is Costrace?

Costrace is an LLM observability SDK that automatically tracks cost, token usage, and latency for every API call you make to OpenAI, Anthropic, or Google Gemini. No code changes are required: call init() once and use your LLM SDKs as normal.

Key Features

Zero Configuration

One line to initialize. Works by monkey-patching your existing LLM client libraries.

Real-time Cost Tracking

See exactly how much each API call costs, broken down by model and provider.
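The per-call breakdown comes down to simple arithmetic: token counts multiplied by per-token prices for the model used. A minimal sketch of that calculation is below; the pricing table, model names, and function name are illustrative assumptions, not Costrace's actual pricing data or API.

```python
# Illustrative per-call cost computation. Prices here are placeholder
# assumptions, keyed by (provider, model), in USD per 1M tokens.
PRICING_USD_PER_MILLION = {
    ("openai", "gpt-4o-mini"): (0.15, 0.60),        # (input, output)
    ("anthropic", "claude-3-5-haiku"): (0.80, 4.00),
}

def call_cost(provider: str, model: str,
              input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a single API call."""
    price_in, price_out = PRICING_USD_PER_MILLION[(provider, model)]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 1,000 input tokens and 500 output tokens on the toy gpt-4o-mini entry:
cost = call_cost("openai", "gpt-4o-mini", 1_000, 500)
```

With the placeholder prices above, that call works out to $0.00045, which is why per-call costs are typically reported in fractions of a cent.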

Latency Monitoring

Track response times to identify slow endpoints and optimize performance.

Multi-Provider Support

OpenAI, Anthropic, and Gemini all tracked automatically.

How It Works

Costrace works by patching the client libraries for OpenAI, Anthropic, and Gemini. When you make an API call, Costrace:
  1. Records the start time
  2. Lets your call proceed normally
  3. Captures token usage and response time
  4. Calculates the cost based on current pricing
  5. Sends a trace to the Costrace backend (fire-and-forget, non-blocking)
All of this happens transparently! Your code doesn’t change.
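The patching flow above can be sketched in a few lines. This is an illustrative toy, not Costrace's actual internals: `FakeClient` stands in for a real provider SDK, and `patch`/`TRACES` are hypothetical names for the wrapper and trace sink.

```python
import functools
import time

class FakeClient:
    """Stand-in for an LLM SDK client; a real call would hit the provider."""
    def complete(self, prompt):
        return {"text": "ok", "usage": {"input_tokens": 3, "output_tokens": 1}}

TRACES = []  # the real SDK would send these to a backend without blocking

def patch(cls):
    original = cls.complete

    @functools.wraps(original)
    def traced(self, prompt):
        start = time.perf_counter()            # step 1: record start time
        response = original(self, prompt)      # step 2: call proceeds normally
        latency = time.perf_counter() - start  # step 3: capture response time
        TRACES.append({                        # steps 4-5: cost lookup and the
            "usage": response["usage"],        # fire-and-forget backend send
            "latency_s": latency,              # would happen here
        })
        return response

    cls.complete = traced

patch(FakeClient)
resp = FakeClient().complete("Hello")  # caller code is unchanged
```

Because the wrapper preserves the original method's signature and return value, callers never see the instrumentation; that is what makes the approach zero-configuration.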

Supported Providers

  • OpenAI: GPT-5 family (5.2, 5, 5-mini, 5-nano); GPT-4 family (4o, 4o-mini, 4.1, 4-turbo, 4); GPT-3.5-turbo; o3, o4-mini
  • Anthropic: Claude Opus (4-6, 4-1, 4); Claude Sonnet (4-6, 4-5, 4, 3-7, 3.5); Claude Haiku (4-5, 3)
  • Google: Gemini 2.0 (flash, flash-lite); Gemini 1.5 (pro, flash, flash-8b)

Next Steps