LLM Economics

Token Budget Calculator

Work backwards from your monthly AI budget to find the maximum tokens per request your product can afford. Compare models and scenarios to get the most from every dollar.

Inputs

Monthly budget (USD): Your total monthly budget for API calls.

Model: The model you plan to use.

Scenario: Selects a typical input:output token ratio for your scenario.

Requests per month: How many API calls your application makes per month.

Cache hit rate: Fraction of input tokens served from Prompt Cache. A higher rate reduces the effective input cost.

Tokens per request (RAG Chat)
64,516.13 tokens
Use 45,161.29 input + 19,354.84 output tokens
GPT-5 Mini allows 64,516.13 tokens per request at $500.00/month across 10,000 requests. In the RAG Chat scenario, the input covers the system prompt, the retrieved context chunks, and the user message.
Cache would unlock more tokens within budget.

Enabling a 60% Prompt Cache hit rate on this RAG Chat workload would allow roughly 8,959 additional tokens per request (about 73,475 in total) within the same budget, assuming cached input is billed at $0.025/MTok.
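A minimal Python sketch of how a cache hit rate feeds into the per-request ceiling. It assumes the calculator keeps the 70:30 ratio fixed and that cached input is billed at $0.025/MTok; `CACHED_INPUT_PRICE` is a hypothetical rate, not one quoted on this page, so check the provider's pricing before relying on it. The sketch also computes the ceiling you would reach if input were free, because output tokens still have to be paid for.

```python
# Sketch: prompt-cache hit rate -> effective input price -> max tokens per request.
# ASSUMPTION: CACHED_INPUT_PRICE is a hypothetical cached-input rate, not taken from this page.

BUDGET_PER_REQUEST = 500.00 / 10_000    # $500/month over 10,000 requests = $0.05
INPUT_SHARE = 0.70                      # RAG Chat, 70:30 input:output
INPUT_PRICE, OUTPUT_PRICE = 0.25, 2.00  # GPT-5 Mini, $/MTok
CACHED_INPUT_PRICE = 0.025              # hypothetical, $/MTok


def effective_input_price(hit_rate: float) -> float:
    """Blend full-price and cached input by the cache hit rate ($/MTok)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) * INPUT_PRICE + hit_rate * CACHED_INPUT_PRICE


def max_tokens(input_price: float) -> float:
    """Total tokens per request that fit the budget at a fixed input share."""
    blended = INPUT_SHARE * input_price + (1.0 - INPUT_SHARE) * OUTPUT_PRICE
    return BUDGET_PER_REQUEST * 1_000_000 / blended


no_cache = max_tokens(INPUT_PRICE)
with_cache = max_tokens(effective_input_price(0.60))
ceiling = max_tokens(0.0)  # input free: only the output share is billed

print(f"no cache:       {no_cache:,.0f} tokens/request")
print(f"60% cache:      {with_cache:,.0f} tokens/request (+{with_cache - no_cache:,.0f})")
print(f"free-input cap: {ceiling:,.0f} tokens/request")
```

Under these assumptions the three lines come out near 64,516, 73,475 (+8,959), and 83,333 tokens per request. Because output is 30% of every request at $2.00/MTok, no input discount can push a 70:30 request past about 83,333 tokens at $0.05.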

Input tokens / request
45,161.29
Output tokens / request
19,354.84
Monthly token budget
645,161K
Effective monthly spend
$500.00
Cost per request
$0.050000
Scenario
RAG Chat

Cost breakdown

Item | Monthly | Yearly
Input tokens | 451,612.9K | 5,419,354.8K
Output tokens | 193,548.4K | 2,322,580.8K
Total tokens | 645,161.3K | 7,741,935.6K
Spend (USD) | $500.00 | $6,000.00
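The breakdown is the per-request split scaled by volume. A minimal Python sketch using the figures in the results panel (45,161.29 input and 19,354.84 output tokens per request, 10,000 requests, $500/month); token rows are in thousands of tokens and only the spend row is in dollars. The function name `breakdown` is illustrative.

```python
# Sketch: scale per-request tokens to the monthly and yearly breakdown rows.
# Token rows are in thousands (K) of tokens; only the spend row is in USD.

def breakdown(input_per_request: float, output_per_request: float,
              requests_per_month: int, monthly_spend: float) -> dict[str, tuple[float, float]]:
    """Return {row name: (monthly, yearly)} with yearly = 12 x monthly."""
    monthly = {
        "Input tokens (K)": input_per_request * requests_per_month / 1_000,
        "Output tokens (K)": output_per_request * requests_per_month / 1_000,
    }
    monthly["Total tokens (K)"] = sum(monthly.values())
    monthly["Spend (USD)"] = monthly_spend
    return {row: (value, value * 12) for row, value in monthly.items()}


table = breakdown(45_161.29, 19_354.84, 10_000, 500.00)
for row, (per_month, per_year) in table.items():
    print(f"{row:<18} {per_month:>13,.1f} {per_year:>15,.1f}")
```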

Comparison

Model | Input tokens/req | Output tokens/req | Total tokens/req
GPT-5 Mini (current) | 45,161 | 19,355 | 64,516
GPT-5 Nano | 225,806 | 96,774 | 322,581
Gemini Flash-Lite | 184,211 | 78,947 | 263,158
Gemini Flash | 36,458 | 15,625 | 52,083
Claude Haiku | 15,909 | 6,818 | 22,727
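A Python sketch that reproduces three rows of the comparison from list prices, assuming the same $0.05 budget per request and the 70:30 RAG Chat split. Only models whose prices this page quotes are included (GPT-5 Mini, GPT-5 Nano, Gemini Flash-Lite); the Gemini Flash and Claude Haiku rows would need their current list prices added to `PRICES`.

```python
# Sketch: per-request token allowance for several models at one budget and ratio.
# Prices in $/MTok (input, output) as quoted on this page; other models omitted.

PRICES: dict[str, tuple[float, float]] = {
    "GPT-5 Mini": (0.25, 2.00),
    "GPT-5 Nano": (0.05, 0.40),
    "Gemini Flash-Lite": (0.10, 0.40),
}
BUDGET_PER_REQUEST = 500.00 / 10_000  # $500/month over 10,000 requests
INPUT_SHARE = 0.70                    # RAG Chat


def split_tokens(input_price: float, output_price: float) -> tuple[float, float, float]:
    """Return (input, output, total) tokens per request that fit the budget."""
    blended = INPUT_SHARE * input_price + (1 - INPUT_SHARE) * output_price  # $/MTok
    total = BUDGET_PER_REQUEST * 1_000_000 / blended
    return total * INPUT_SHARE, total * (1 - INPUT_SHARE), total


results = {model: split_tokens(*prices) for model, prices in PRICES.items()}
for model, (inp, out, total) in results.items():
    print(f"{model:<18} {inp:>9,.0f} in + {out:>8,.0f} out = {total:>9,.0f} tokens/request")
```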

Pricing sources

Last verified 2026-06-30 · openai.com/api/pricing · platform.claude.com/docs/about-claude/pricing · ai.google.dev/gemini-api/docs/pricing

Reverse-budgeting: the right way to plan LLM product costs

Most developers start by picking a model and calculating what it costs. But product teams need the inverse: given a monthly infrastructure budget and expected request volume, how much context can each request afford? This matters for RAG chunk counts, conversation history length, system prompt complexity, and tool-call result sizes.

Token ratios by scenario — what to expect

RAG Chat typically has a 70:30 input:output ratio — long context (system prompt + retrieved chunks) with a short answer. Code Assistant runs 60:40 — files and diffs in, a meaningful code response out. Document Analysis is 80:20 — most tokens are the document itself, the extraction is short. Simple Q&A is 55:45 — relatively balanced. AI Agent is 65:35 — multi-turn context with tool outputs accumulates in the input.
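Because input is usually the cheaper side of a price sheet, the scenario ratio alone changes how far a budget stretches. A small Python sketch, assuming GPT-5 Mini prices ($0.25/$2.00 per MTok) and the $0.05-per-request budget from the calculator example, using the ratios listed in this section.

```python
# Sketch: one budget, one model, different input:output ratios.
INPUT_PRICE, OUTPUT_PRICE = 0.25, 2.00  # GPT-5 Mini, $/MTok
BUDGET_PER_REQUEST = 0.05               # $500/month over 10,000 requests

# Input share of total tokens for each scenario ratio described in this section.
SCENARIO_INPUT_SHARE = {
    "RAG Chat": 0.70,
    "Code Assistant": 0.60,
    "Document Analysis": 0.80,
    "Simple Q&A": 0.55,
    "AI Agent": 0.65,
}


def total_tokens(input_share: float) -> float:
    blended = input_share * INPUT_PRICE + (1 - input_share) * OUTPUT_PRICE  # $/MTok
    return BUDGET_PER_REQUEST * 1_000_000 / blended


allowance = {name: total_tokens(share) for name, share in SCENARIO_INPUT_SHARE.items()}
for name, tokens in sorted(allowance.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<18} {tokens:>8,.0f} tokens/request")
```

With these prices, Document Analysis (80:20) fits about 83,333 tokens per request while Simple Q&A (55:45) fits about 48,193, since each point of share moved from output to input swaps a $2.00/MTok token for a $0.25/MTok one.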

Frequently asked questions

How many tokens per dollar does GPT-5 give you?

GPT-5 charges $1.25/MTok for input and $10.00/MTok for output. For a RAG chat scenario (70% input, 30% output), the blended price is 0.7 × $1.25 + 0.3 × $10.00 = $3.875/MTok, so $1 buys roughly 258,000 total tokens: about 180,600 input and 77,400 output tokens. That is about 25 requests of 10,000 tokens each. GPT-5 Mini at $0.25/$2.00 gives 5× more tokens for the same dollar.
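A minimal Python sketch of the tokens-per-dollar arithmetic, using only the list prices quoted in this answer and the 70:30 RAG Chat split.

```python
# Sketch: tokens per dollar at a fixed input:output split.

def tokens_per_dollar(input_share: float, input_price: float, output_price: float) -> float:
    """Prices in $/MTok; returns the total tokens that $1 buys at this split."""
    blended = input_share * input_price + (1 - input_share) * output_price  # $ per MTok
    return 1_000_000 / blended


gpt5 = tokens_per_dollar(0.70, 1.25, 10.00)
gpt5_mini = tokens_per_dollar(0.70, 0.25, 2.00)

print(f"GPT-5: {gpt5:,.0f} tokens per $1 "
      f"({gpt5 * 0.70:,.0f} input + {gpt5 * 0.30:,.0f} output)")
print(f"10,000-token requests per $1: {int(gpt5 // 10_000)}")
print(f"GPT-5 Mini vs GPT-5: {gpt5_mini / gpt5:.1f}x tokens per dollar")
```

The 5× ratio falls out exactly because both GPT-5 Mini prices are one fifth of GPT-5's, so the blended price shrinks by the same factor at any split.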

How do I calculate maximum tokens per request from a monthly budget?

Step 1: Divide your monthly budget by monthly request count to get budget per request. Step 2: Solve for total tokens: tokens = (budget per request × 1,000,000) ÷ (input_share × input_price + output_share × output_price). Step 3: Split by your input:output ratio for your scenario. This calculator does it all automatically.
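The three steps map onto a small function. A Python sketch; the names `max_tokens_per_request` and `TokenBudget` are illustrative, not the calculator's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBudget:
    budget_per_request: float  # USD
    total_tokens: float
    input_tokens: float
    output_tokens: float


def max_tokens_per_request(monthly_budget: float, requests_per_month: int,
                           input_share: float, input_price: float,
                           output_price: float) -> TokenBudget:
    """Reverse-budget one request; prices are in USD per million tokens (MTok)."""
    if requests_per_month <= 0:
        raise ValueError("requests_per_month must be positive")
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")

    # Step 1: budget available to each request.
    budget_per_request = monthly_budget / requests_per_month
    # Step 2: divide by the ratio-weighted (blended) price per token.
    blended_price = input_share * input_price + (1.0 - input_share) * output_price
    if blended_price <= 0:
        raise ValueError("prices must be positive")
    total = budget_per_request * 1_000_000 / blended_price
    # Step 3: split the total by the scenario's input:output ratio.
    return TokenBudget(budget_per_request, total,
                       total * input_share, total * (1.0 - input_share))


# $500/month, 10,000 requests, RAG Chat (70:30) on GPT-5 Mini ($0.25/$2.00 per MTok).
budget = max_tokens_per_request(500.00, 10_000, 0.70, 0.25, 2.00)
print(f"${budget.budget_per_request:.2f}/request -> {budget.total_tokens:,.0f} tokens "
      f"({budget.input_tokens:,.0f} input + {budget.output_tokens:,.0f} output)")
```

For the $500/month GPT-5 Mini case this prints $0.05/request and about 64,516 tokens (45,161 input + 19,355 output), the same figures as the calculator result at the top of the page.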

What is a realistic token budget for a $500/month AI product?

At $500/month with 10,000 monthly requests on GPT-5 Mini ($0.25/$2.00, RAG Chat scenario): budget per request = $0.05, which allows about 64,516 tokens per request, roughly 45,161 input tokens and 19,355 output tokens. That is enough for a substantial context window with several retrieved chunks.

Which AI model gives the most tokens per dollar?

In 2026, Gemini 2.5 Flash-Lite ($0.10/MTok input, $0.40/MTok output) and GPT-5 Nano ($0.05/$0.40) are the most token-efficient models. For a 70:30 input:output scenario, Flash-Lite gives about 5.3M tokens per dollar and GPT-5 Nano about 6.5M, roughly 20× and 25× more than GPT-5. The tradeoff is quality: Flash-Lite is best for high-volume simple tasks.
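Once the ratio is fixed, ranking models by tokens per dollar is a one-line formula. A Python sketch using only the prices quoted in these answers and a 70:30 split; swap in current list prices before relying on the multiples.

```python
# Sketch: rank models by tokens per dollar at a 70:30 input:output split.
PRICES = {  # (input, output) in $/MTok, as quoted on this page
    "GPT-5": (1.25, 10.00),
    "GPT-5 Mini": (0.25, 2.00),
    "GPT-5 Nano": (0.05, 0.40),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}
INPUT_SHARE = 0.70


def tokens_per_dollar(input_price: float, output_price: float) -> float:
    return 1_000_000 / (INPUT_SHARE * input_price + (1 - INPUT_SHARE) * output_price)


per_dollar = {model: tokens_per_dollar(*prices) for model, prices in PRICES.items()}
baseline = per_dollar["GPT-5"]
for model, tpd in sorted(per_dollar.items(), key=lambda item: item[1], reverse=True):
    print(f"{model:<22} {tpd / 1e6:5.2f}M tokens/$  {tpd / baseline:5.1f}x GPT-5")
```

At these prices GPT-5 Nano comes out ahead on raw tokens per dollar (about 6.5M versus 5.3M for Flash-Lite): both share the $0.40 output price, and Nano's input is cheaper.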
