AI Cost Optimizer

Rank every cost-reduction opportunity for your LLM workload — Prompt Cache, Batch API, and model switching — by annual savings. See your max saving in seconds.

Inputs

Current model

The model you are using in production today.

Input tokens / request

Average prompt size per API call — system prompt + context + user message. Typical: simple chat 500–2,000 · RAG chatbot 2,000–8,000 · document analysis 10,000–100,000 tokens.

Output tokens / request

Average response length. Short answer: 200–500 · paragraph: 500–1,000 · long-form: 1,000–4,000 tokens.

Monthly requests

Total API calls per month. 1,000 DAU × 5 calls/day × 30 days = 150,000 requests/month.

Current cache hit ratio: 0.00

Fraction of input tokens currently served from Prompt Cache. 0 = not using cache.

Batch API already enabled(Check if you are already using the async Batch API endpoint.)

Max potential savings

$9,595.50/year

Best combo: GPT-5 Nano + cache + batch

Best combo: GPT-5 Nano + cache + batch saves $9,595.50/yr with high implementation effort.

$9,595.50

saved / year (98.4%)

Top opportunity: Best combo: GPT-5 Nano + cache + batchHigh effort

Saves $9,595.50/yr (98.4%) with high engineering effort. See the ranked table below for all options.

$9,595.50

saved / year

Current yearly cost

$9,750.00

Current monthly cost

$812.50

Best optimised cost

$12.88/month

Top action

Best combo: GPT-5 Nano + cache + batch

Comparison

Option	Monthly	Yearly
Best combo: GPT-5 Nano + cache + batchMaximum possible savings: optimal model, 60% cache, batch API enabled.cheapest	$12.88	$154.50
Switch to GPT-5 NanoRun quality eval A/B test vs. current model. Most tasks: 85–95% quality retention.	$25.75	$309.00
Switch to GPT-5 MiniRun quality eval A/B test vs. current model. Most tasks: 85–95% quality retention.	$128.75	$1,545.00
Cache (60%) + Batch APIMaximum savings without changing models.	$321.88	$3,862.50
Switch to Claude Haiku 4.5Run quality eval A/B test vs. current model. Most tasks: 85–95% quality retention.	$365.00	$4,380.00
Enable Batch API (50% discount)Use async batch endpoint. Suitable for offline / non-realtime workloads.	$406.25	$4,875.00
Enable Prompt Cache (60% hit)Fix system prompt at context start and reuse across requests.	$643.75	$7,725.00
Current baselineGPT-5, cache 0%, batch offcurrent	$812.50	$9,750.00

Pricing sources

Last verified 2026-06-30 · openai.com/api/pricing openai.com/api/pricing · platform.claude.com/docs/about-claude/pricing platform.claude.com/docs/about-claude/pricing · ai.google.dev/gemini-api/docs/pricing ai.google.dev/gemini-api/docs/pricing

Continue your analysis

Model migration ROI

Calculate engineering cost vs. yearly savings payback.

Token budget planning

Find how many tokens your monthly budget allows.

Trends & comparison

Trend

Comparison (monthly vs. yearly)

The three LLM cost levers — ranked by effort vs. impact

Prompt Cache is the highest-ROI, lowest-effort change: one line of code to fix your system prompt position, and input costs drop 90% for cached tokens. Batch API requires minimal code changes for async workloads and saves 50% across all models. Model switching has the highest absolute saving potential but requires quality evaluation — budget time for evals before deploying in production.

Why LLM costs compound faster than expected

Unlike fixed infrastructure, LLM costs scale with every user interaction. A product with 1,000 DAU each making 5 API calls per day is 150,000 requests/month. At GPT-5 pricing without optimisation, that can easily reach $15,000–$50,000/month. The same workload with Prompt Cache, Batch API, and a model switch to GPT-5 Mini can cost under $2,000/month.

Frequently asked questions

How do I reduce my OpenAI API costs?▾

Three main levers: (1) Prompt Cache — fix your system prompt at context start, reuse it across requests, and input costs drop 90% on cached tokens. (2) Batch API — for async workloads (reports, data processing), it gives 50% off all models. (3) Model downgrade — GPT-5 Mini is 80% cheaper than GPT-5 with comparable quality for most tasks. This calculator runs all three against your workload and ranks them by annual savings.

How much does OpenAI Prompt Cache actually save?▾

For GPT-5, cached input tokens cost $0.125/MTok instead of $1.25/MTok — a 90% reduction. With a 60% cache hit ratio, a workload costing $2,000/month drops to about $920/month, saving $12,960/year. The saving scales linearly with your input token volume and cache hit ratio.

GPT-5 vs GPT-5 Mini — which is right for my use case?▾

GPT-5 Mini ($0.25/$2.00 per MTok) handles 85–95% of product-level tasks at 80% lower cost. Rule of thumb: use GPT-5 for complex multi-step reasoning, code generation, and analysis where quality is business-critical. Use GPT-5 Mini for customer-facing chat, RAG pipelines, summarisation, and content generation. Always run evals on a representative sample before switching.

Can I combine Prompt Cache and Batch API?▾

Yes, and they stack multiplicatively. Prompt Cache reduces input costs by up to 90%, then Batch API applies a 50% discount on the remaining cost. A workload costing $5,000/month could drop to under $250/month with both enabled — a 95%+ reduction.