A per-lever projection of how much your inference spend can drop. Same math we run for paid audits — the difference is precision, not approach.
~3 minutes · result on the next page · no follow-up if you don't want one
The result appears on the next page; we'll also email a copy you can forward. No drip sequence.
A rough number is fine. The projection scales linearly with spend, so an order-of-magnitude estimate gives an order-of-magnitude answer.
Sum across all providers: Anthropic, OpenAI, Bedrock, etc.
Count of distinct inference vendors you call.
Roughly what fraction of your inference falls into each shape? Whole numbers; should sum to about 100.
Chat, in-product AI features.
Notifications, lower-priority enrichment.
Evals, embeddings, bulk generation.
Rough fractions. You can leave the defaults if unsure — the projection sensitivity will be on the headline number, not the per-lever ordering.
Long system prompts, RAG context, few-shot examples.
Could run on a smaller tier (Sonnet → Haiku, 4o → 4o-mini).
Workloads where Llama / Qwen / DeepSeek would meet your quality bar.
Predictable high-QPS that would benefit from provisioned throughput.
Levers you've already pulled drop out of the projection automatically.
Only economically meaningful above ~$50k/mo on a stable workload. Leave at 0 if not on the table.
Constraints we should know about — compliance, latency SLA, specific provider lock-in, anything you'd want flagged in the result.
Result appears on the next page. We'll email a copy you can forward.