AI Cost Optimizer
Rank every cost-reduction opportunity for your LLM workload — Prompt Cache, Batch API, and model switching — by annual savings. See your max saving in seconds.
Inputs
The model you are using in production today.
Average prompt size per API call — system prompt + context + user message. Typical: simple chat 500–2,000 · RAG chatbot 2,000–8,000 · document analysis 10,000–100,000 tokens.
Average response length. Short answer: 200–500 · paragraph: 500–1,000 · long-form: 1,000–4,000 tokens.
Total API calls per month. 1,000 DAU × 5 calls/day × 30 days = 150,000 requests/month.
Fraction of input tokens currently served from Prompt Cache. 0 = not using cache.
Saves $9,595.50/yr (98.4%) with high engineering effort. See the ranked table below for all options.
Comparison
| Option | Monthly | Yearly |
|---|---|---|
| Best combo: GPT-5 Nano + cache + batchMaximum possible savings: optimal model, 60% cache, batch API enabled.cheapest | $12.88 | $154.50 |
| Switch to GPT-5 NanoRun quality eval A/B test vs. current model. Most tasks: 85–95% quality retention. | $25.75 | $309.00 |
| Switch to GPT-5 MiniRun quality eval A/B test vs. current model. Most tasks: 85–95% quality retention. | $128.75 | $1,545.00 |
| Cache (60%) + Batch APIMaximum savings without changing models. | $321.88 | $3,862.50 |
| Switch to Claude Haiku 4.5Run quality eval A/B test vs. current model. Most tasks: 85–95% quality retention. | $365.00 | $4,380.00 |
| Enable Batch API (50% discount)Use async batch endpoint. Suitable for offline / non-realtime workloads. | $406.25 | $4,875.00 |
| Enable Prompt Cache (60% hit)Fix system prompt at context start and reuse across requests. | $643.75 | $7,725.00 |
| Current baselineGPT-5, cache 0%, batch offcurrent | $812.50 | $9,750.00 |
Pricing sources
Last verified 2026-06-30 · openai.com/api/pricing openai.com/api/pricing · platform.claude.com/docs/about-claude/pricing platform.claude.com/docs/about-claude/pricing · ai.google.dev/gemini-api/docs/pricing ai.google.dev/gemini-api/docs/pricing
Trends & comparison
Trend
Comparison (monthly vs. yearly)
The three LLM cost levers — ranked by effort vs. impact
Prompt Cache is the highest-ROI, lowest-effort change: one line of code to fix your system prompt position, and input costs drop 90% for cached tokens. Batch API requires minimal code changes for async workloads and saves 50% across all models. Model switching has the highest absolute saving potential but requires quality evaluation — budget time for evals before deploying in production.
Why LLM costs compound faster than expected
Unlike fixed infrastructure, LLM costs scale with every user interaction. A product with 1,000 DAU each making 5 API calls per day is 150,000 requests/month. At GPT-5 pricing without optimisation, that can easily reach $15,000–$50,000/month. The same workload with Prompt Cache, Batch API, and a model switch to GPT-5 Mini can cost under $2,000/month.
Frequently asked questions
How do I reduce my OpenAI API costs?▾
Three main levers: (1) Prompt Cache — fix your system prompt at context start, reuse it across requests, and input costs drop 90% on cached tokens. (2) Batch API — for async workloads (reports, data processing), it gives 50% off all models. (3) Model downgrade — GPT-5 Mini is 80% cheaper than GPT-5 with comparable quality for most tasks. This calculator runs all three against your workload and ranks them by annual savings.
How much does OpenAI Prompt Cache actually save?▾
For GPT-5, cached input tokens cost $0.125/MTok instead of $1.25/MTok — a 90% reduction. With a 60% cache hit ratio, a workload costing $2,000/month drops to about $920/month, saving $12,960/year. The saving scales linearly with your input token volume and cache hit ratio.
GPT-5 vs GPT-5 Mini — which is right for my use case?▾
GPT-5 Mini ($0.25/$2.00 per MTok) handles 85–95% of product-level tasks at 80% lower cost. Rule of thumb: use GPT-5 for complex multi-step reasoning, code generation, and analysis where quality is business-critical. Use GPT-5 Mini for customer-facing chat, RAG pipelines, summarisation, and content generation. Always run evals on a representative sample before switching.
Can I combine Prompt Cache and Batch API?▾
Yes, and they stack multiplicatively. Prompt Cache reduces input costs by up to 90%, then Batch API applies a 50% discount on the remaining cost. A workload costing $5,000/month could drop to under $250/month with both enabled — a 95%+ reduction.