OpenAI Cost Calculator
Estimate OpenAI GPT-5 API spend per call, day, month and year — then compare against Claude and Gemini and see how much switching could save.
Inputs
Choose the model tier you plan to use in production.
Average prompt size per API call — system prompt + context + user message. Typical ranges: simple chat 500–2,000 · RAG chatbot 2,000–8,000 · document analysis 10,000–100,000. A page of text ≈ 750 tokens.
Average response length per API call. Typical ranges: short answer 200–500 · paragraph 500–1,000 · long-form 1,000–4,000 tokens.
Total API calls per month across all users and jobs. 10 users × 3 sessions/day × 10 calls/session × 30 days = 9,000 requests/month.
Fraction of input tokens served from prompt cache (costs ~10× less). 0 = no caching.
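The request-volume and page-size rules of thumb above can be sketched as two small helpers. The function names and the 30-day default are illustrative assumptions, not part of the calculator itself.

```python
TOKENS_PER_PAGE = 750  # rule of thumb from the input hints above


def monthly_requests(users: int, sessions_per_day: int,
                     calls_per_session: int, days: int = 30) -> int:
    """Total API calls per month across all users (hypothetical helper)."""
    return users * sessions_per_day * calls_per_session * days


def tokens_for_pages(pages: float) -> int:
    """Rough token count for a given number of text pages."""
    return round(pages * TOKENS_PER_PAGE)


print(monthly_requests(users=10, sessions_per_day=3, calls_per_session=10))  # 9000
print(tokens_for_pages(4))  # 3000
```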
Yearly spend = (45,161.29 × $1.25 + 19,354.84 × $10) / 1M × 10,000 req × 12 = $30,000.00
Enabling Prompt Cache with a 60% hit ratio could save $3,658.06/yr (12% reduction). Keep your system prompt identical at the start of every request so the cached prefix is reused across calls.
Switching to GPT-5 Mini cuts cost by 80% — saving $24,000.00/yr. Run an A/B test against your eval set first; most tasks retain 85–95% quality at the budget tier.
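As a rough check on the two savings figures above, the sketch below recomputes them for the example workload. It assumes cached input tokens are billed at 10% of the normal input price (the "~10× less" figure quoted in the inputs); the function names are hypothetical.

```python
def yearly_cost(input_tokens: float, output_tokens: float, requests_per_month: int,
                input_price: float, output_price: float) -> float:
    """Yearly spend in dollars; prices are dollars per million tokens."""
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_month * 12


def yearly_cache_saving(input_tokens: float, requests_per_month: int, input_price: float,
                        hit_ratio: float, cached_discount: float = 0.90) -> float:
    """Saving from prompt caching; only the cached share of input tokens is discounted."""
    yearly_input = input_tokens * input_price / 1_000_000 * requests_per_month * 12
    return yearly_input * hit_ratio * cached_discount


# Example workload from the calculator: tokens per request and requests per month.
IN_TOK, OUT_TOK, REQ = 45_161.29, 19_354.84, 10_000

baseline = yearly_cost(IN_TOK, OUT_TOK, REQ, 1.25, 10.00)   # GPT-5
mini = yearly_cost(IN_TOK, OUT_TOK, REQ, 0.25, 2.00)        # GPT-5 Mini
cache = yearly_cache_saving(IN_TOK, REQ, 1.25, hit_ratio=0.60)

print(f"Cache saving: ${cache:,.2f}/yr ({cache / baseline:.0%})")
print(f"Mini saving:  ${baseline - mini:,.2f}/yr ({(baseline - mini) / baseline:.0%})")
```

The cache saving is small relative to the tier switch because, for this workload, output tokens account for most of the bill and caching does not touch them.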
Cost breakdown
| Item | Monthly | Yearly |
|---|---|---|
| Input spend (45,161.29 tok/req × 10,000 req × $1.25/MTok) | $564.52 | $6,774.19 |
| Output spend (19,354.84 tok/req × 10,000 req × $10/MTok) | $1,935.48 | $23,225.81 |
| Total spend | $2,500.00 | $30,000.00 |
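The breakdown table follows directly from the per-token prices. A minimal sketch that reproduces its rows, assuming nothing beyond the figures shown in the table (the `Breakdown` type and function name are illustrative):

```python
from dataclasses import dataclass


@dataclass
class Breakdown:
    input_monthly: float   # dollars per month spent on input tokens
    output_monthly: float  # dollars per month spent on output tokens

    @property
    def total_monthly(self) -> float:
        return self.input_monthly + self.output_monthly

    @property
    def total_yearly(self) -> float:
        return 12 * self.total_monthly


def cost_breakdown(input_tokens: float, output_tokens: float, requests: int,
                   input_price: float, output_price: float) -> Breakdown:
    """Split monthly spend into input and output; prices are dollars per MTok."""
    return Breakdown(
        input_monthly=input_tokens * requests * input_price / 1_000_000,
        output_monthly=output_tokens * requests * output_price / 1_000_000,
    )


b = cost_breakdown(45_161.29, 19_354.84, 10_000, input_price=1.25, output_price=10.00)
print(f"Input:  ${b.input_monthly:,.2f}/mo")
print(f"Output: ${b.output_monthly:,.2f}/mo")
print(f"Total:  ${b.total_monthly:,.2f}/mo, ${b.total_yearly:,.2f}/yr")
```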
Comparison
| Option | Monthly | Yearly |
|---|---|---|
| OpenAI GPT-5 (current, cheapest) | $2,500.00 | $30,000.00 |
| Claude Sonnet 4.6 | $4,258.06 | $51,096.78 |
| Gemini 2.5 Pro | $2,500.00 | $30,000.00 |
Pricing sources
Last verified 2026-06-30 · openai.com/api/pricing · platform.claude.com/docs/about-claude/pricing · ai.google.dev/gemini-api/docs/pricing
OpenAI API pricing in 2026
OpenAI charges per million tokens (MTok) of input and output. GPT-5 is $1.25/MTok input and $10.00/MTok output; GPT-5 Mini is $0.25/$2.00; GPT-5 Nano is $0.05/$0.40. Prompt Cache reduces input costs by 90% for reused prefixes. The Batch API halves all costs for async workloads. This calculator applies all discounts automatically.
GPT-5 vs GPT-5 Mini vs GPT-5 Nano — when to use which
Use GPT-5 for complex reasoning, coding, and multi-step tasks where quality is critical. GPT-5 Mini handles most product-facing conversational tasks at 80% lower cost — ideal for customer-facing chatbots and RAG pipelines. GPT-5 Nano is the cheapest tier for classification, entity extraction, and high-volume simple completions.
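To make the tier trade-off concrete, the sketch below prices one workload on all three tiers using the per-MTok rates quoted above. The workload (1,000 input tokens, 500 output tokens, 10,000 requests per month) and the dictionary keys are illustrative assumptions, not official model identifiers.

```python
# (input price, output price) in dollars per million tokens, as quoted above.
TIER_PRICES = {
    "GPT-5": (1.25, 10.00),
    "GPT-5 Mini": (0.25, 2.00),
    "GPT-5 Nano": (0.05, 0.40),
}


def monthly_cost(tier: str, input_tokens: int, output_tokens: int, requests: int) -> float:
    """Monthly spend in dollars for a given tier and per-request token counts."""
    input_price, output_price = TIER_PRICES[tier]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests


costs = {tier: monthly_cost(tier, 1_000, 500, 10_000) for tier in TIER_PRICES}
for tier, cost in costs.items():
    print(f"{tier:<11} ${cost:,.2f}/month")
```

Price is only half of the decision: the cheaper tiers are worth it only if they pass your own evaluation set for the task.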
Frequently asked questions
How much does the OpenAI API cost per month?
Monthly cost = (input tokens per request ÷ 1,000,000 × input price + output tokens per request ÷ 1,000,000 × output price) × monthly requests. For GPT-5 at $1.25/MTok input and $10.00/MTok output, 10,000 requests with 1,000 input and 500 output tokens each cost $62.50/month. This calculator gives you the exact figure for your workload.
How much does OpenAI Prompt Cache save?
Prompt Cache reduces input token costs by up to 90%, from $1.25/MTok to $0.125/MTok for GPT-5. The discount applies only to cached input tokens; output tokens are billed at the normal rate. If 60% of your input tokens are cached (fixed system prompt + shared context), input spend falls by 54% (60% × 90%), so $1,000/month of input spend drops to roughly $460/month, a $6,480/year saving.
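The caching arithmetic can be expressed as a blended input price. This is a sketch under the assumption that cached tokens cost 10% of the normal input rate and that output tokens are unaffected; the function names are hypothetical.

```python
def effective_input_price(base_price: float, hit_ratio: float,
                          cached_discount: float = 0.90) -> float:
    """Blended dollars-per-MTok input price when a share of tokens hits the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return base_price * (1 - hit_ratio * cached_discount)


def monthly_bill_with_cache(input_spend: float, output_spend: float,
                            hit_ratio: float, cached_discount: float = 0.90) -> float:
    """Monthly bill after caching; only the input part of the bill shrinks."""
    return input_spend * (1 - hit_ratio * cached_discount) + output_spend


print(f"{effective_input_price(1.25, 0.60):.3f}")                 # blended GPT-5 input price
print(f"{monthly_bill_with_cache(1_000.0, 0.0, 0.60):.2f}")       # all-input bill
print(f"{monthly_bill_with_cache(564.52, 1_935.48, 0.60):.2f}")   # mixed input/output bill
```

The last line shows why the saving on a real bill is usually smaller than 54%: the larger the output share, the less caching can remove.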
Is GPT-5 Mini good enough for production?
GPT-5 Mini ($0.25/MTok input, $2.00/MTok output) is 80% cheaper than GPT-5 on both input and output, and most product-level tasks retain 85–95% quality. The cost difference is significant: a workload that costs $1,000/month on GPT-5 costs about $200/month on GPT-5 Mini. Validate it against your own eval set before switching.
When should I use the OpenAI Batch API?
The Batch API gives a 50% discount on all models for async workloads where a 24-hour response window is acceptable: report generation, data extraction, document analysis, and embeddings. It is not suitable for real-time conversations.