Free Tool

Inference Savings Calculator

A per-lever projection of how much your inference spend can drop. It uses the same math we run for paid audits; the difference is precision, not approach.

~3 minutes · result on the next page · no follow-up if you don't want one

About you

Where should we send a copy?

The result appears on the next page; we'll also email a copy you can forward. No drip sequence.

Step 1

Your inference spend today

A rough number is fine. The projection scales linearly with spend, so an order-of-magnitude estimate gives an order-of-magnitude answer.
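To see why a rough spend figure is enough, here is a minimal Python sketch under an assumed model, not the calculator's actual formula: each lever saves spend × the share of spend it applies to × an assumed reduction rate. Every term is proportional to spend, so a 10× error in spend becomes a 10× error in the projection and nothing else. The lever names and rates are hypothetical.

```python
from math import isclose

# Hypothetical lever model, NOT the calculator's actual math:
# each lever maps to (share of spend it applies to, fractional cut on that share).
Levers = dict[str, tuple[float, float]]


def projected_savings(spend: float, levers: Levers) -> float:
    """Sum of per-lever savings; treats levers as independent and additive."""
    return sum(spend * share * cut for share, cut in levers.values())


# Illustrative inputs only.
example_levers: Levers = {
    "prompt_caching": (0.30, 0.50),
    "smaller_model_tier": (0.20, 0.60),
}

low = projected_savings(10_000, example_levers)
high = projected_savings(100_000, example_levers)

# Linear in spend: 10x the spend gives 10x the projected savings.
assert isclose(high, 10 * low)
print(round(low, 2), round(high, 2))
```

Treating levers as additive overstates combined savings when they overlap (cached tokens on a model you also move to a smaller tier, for instance), but a model that accounts for overlap is still linear in spend.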

Sum across all providers (Anthropic, OpenAI, Bedrock, etc.), in dollars.

Number of distinct inference providers you call.

Step 2

Your workload mix

Roughly what share of your inference falls into each shape? Enter whole-number percentages that sum to about 100.

Chat, in-product AI features.

Notifications, lower-priority enrichment.

Evals, embeddings, bulk generation.
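The mix only needs to be approximately right. One way a calculator could handle a total that is off by a few points is to accept anything within a tolerance and rescale it to exactly 100. The Python sketch below assumes a ±5-point tolerance and uses hypothetical keys for the three workload shapes described above; neither comes from the form itself.

```python
# Hypothetical sketch: the tolerance and the key names are assumptions.
TOLERANCE = 5  # accept totals within 100 +/- 5


def normalize_mix(mix: dict[str, int], tolerance: int = TOLERANCE) -> dict[str, float]:
    """Check that whole-number percentages sum to about 100, then rescale to exactly 100."""
    if any(pct < 0 for pct in mix.values()):
        raise ValueError("percentages must be non-negative")
    total = sum(mix.values())
    if abs(total - 100) > tolerance:
        raise ValueError(f"mix sums to {total}, not roughly 100")
    # Rescale proportionally so the shares sum to exactly 100.
    return {shape: 100 * pct / total for shape, pct in mix.items()}


# Keys are illustrative labels for the three shapes above.
mix = normalize_mix({"interactive": 60, "async": 25, "batch": 18})  # sums to 103
print(mix)
```

Rescaling keeps the relative proportions you entered, which is what matters for a projection that scales with spend.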

Step 3

Where the savings might be

Rough percentages are fine. If you're unsure, leave the defaults: they mainly shift the headline number, not the per-lever ordering.

Long system prompts, RAG context, few-shot examples.

Could run on a smaller tier (Sonnet → Haiku, GPT-4o → GPT-4o-mini).

Workloads where Llama / Qwen / DeepSeek would meet your quality bar.

Predictable, high-QPS traffic that would benefit from provisioned throughput.

Step 4

What's already in place

Levers you've already pulled drop out of the projection automatically.
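Dropping a lever you've already pulled amounts to filtering it out before the per-lever savings are computed. The Python sketch below is hypothetical: it assumes the same kind of per-lever model (spend × share × cut), and the lever names are illustrative labels for the Step 3 categories, not identifiers the calculator exposes.

```python
# Hypothetical sketch: lever names and (share, cut) values are illustrative.
Levers = dict[str, tuple[float, float]]


def remaining_savings(spend: float, levers: Levers, in_place: set[str]) -> dict[str, float]:
    """Per-lever projected savings, excluding levers already in place."""
    return {
        name: spend * share * cut
        for name, (share, cut) in levers.items()
        if name not in in_place
    }


levers: Levers = {
    "prompt_caching": (0.30, 0.50),
    "smaller_model_tier": (0.20, 0.60),
    "open_weights": (0.10, 0.70),
    "provisioned_throughput": (0.25, 0.20),
}

# Already caching prompts, so that lever contributes nothing new.
result = remaining_savings(20_000, levers, in_place={"prompt_caching"})
print(result)
```

Returning a per-lever dict rather than a single total keeps the breakdown visible, which matches a per-lever projection.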

Step 5 · optional

Self-hosting

Only economically meaningful above ~$50k/mo on a stable workload. Leave at 0 if not on the table.

Optional

Anything else?

Constraints we should know about — compliance, latency SLA, specific provider lock-in, anything you'd want flagged in the result.

Result appears on the next page. We'll email a copy you can forward.