LLM Token Pricing Explained: How to Calculate and Cut Your API Costs
If you're building on top of LLM APIs, token pricing is the line item that quietly eats your budget. Understanding how it actually works — not just the headline rates, but the mechanics of input vs. output billing, caching, and batch discounts — gives you real leverage over your monthly spend. Here's the full picture.
What Exactly Is a Token?
A token is the smallest unit of text a model processes. Think of it as a word-fragment: not quite a character, not quite a word, but a chunk determined by the model's tokenizer.
For English, 1 token ≈ 4 characters or about 0.75 words. "ChatGPT is great" clocks in at 4–5 tokens, depending on the tokenizer.
For CJK languages (Chinese, Japanese, Korean), each character typically maps to 1–2 tokens, because tokenizer vocabularies are trained mostly on English and often split CJK characters into multiple byte-level tokens. The same meaning expressed in Chinese can consume 30%–80% more tokens than its English equivalent:
- "Hello, how are you?" ≈ 6 tokens
- "你好,你最近怎么样?" ≈ 9–11 tokens
This gap matters. If your user base is primarily non-English, your cost projections need to account for it from day one.
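You can check these counts yourself with a tokenizer library. A minimal sketch using OpenAI's tiktoken (counts vary slightly by provider; Anthropic, for instance, exposes its own token-counting endpoint, so treat these as estimates):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer behind several OpenAI models. Other providers'
# tokenizers differ slightly, but the English-vs-CJK ratio holds broadly.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, how are you?", "你好，你最近怎么样？"]:
    n = len(enc.encode(text))
    print(f"{text!r}: {n} tokens for {len(text)} characters")
```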
Input Tokens vs. Output Tokens
API billing splits into two buckets with different price tags:
Input tokens cover everything you send to the model — system prompts, conversation history, user messages, injected context.
Output tokens cover what the model generates. Because text generation is computationally heavier, output tokens typically cost 3–5× more than input tokens.
Here's what the pricing landscape looks like per million tokens:
| Model Tier | Input Price | Output Price |
|---|---|---|
| Lightweight (e.g., Claude Haiku) | $0.25 – $0.80 | $1.00 – $4.00 |
| Mid-range (e.g., Claude Sonnet) | $3.00 | $15.00 |
| Flagship (e.g., Claude Opus) | $15.00 | $75.00 |
The spread between tiers is enormous — up to 60×. Picking the right tier for each task is the single biggest cost lever you have.
Estimating Token Usage
Before you get surprised by an invoice, build a rough model of your per-request consumption:
Total tokens = system prompt + conversation history + user input + model output
Take a customer support bot as an example:
- System prompt (role definition, response rules): ~500 tokens
- Recent conversation history (5 turns): ~1,000 tokens
- Current user message: ~100 tokens
- Model response: ~300 tokens
That's ~1,900 tokens per request — 1,600 input, 300 output.
On Claude Sonnet pricing:
Input: 1,600 / 1,000,000 × $3.00 = $0.0048
Output: 300 / 1,000,000 × $15.00 = $0.0045
Total per request ≈ $0.0093
Looks cheap. But at 10,000 conversations per day, you're at ~$2,790/month. The numbers compound fast.
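It's worth scripting this arithmetic so your estimates stay honest as prompt sizes drift. A minimal sketch using the Sonnet-tier rates from the table above (treat the prices as illustrative and check your provider's current rate card):

```python
# Per-request cost model using the Sonnet-tier rates quoted above.
# Prices are dollars per million tokens; adjust to your provider's rate card.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# Support-bot example: 1,600 input tokens (system prompt + history + message),
# 300 output tokens.
per_request = request_cost(1_600, 300)
print(f"Per request: ${per_request:.4f}")                   # $0.0093
print(f"Per month:   ${per_request * 10_000 * 30:,.0f}")    # ~$2,790 at 10k requests/day
```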
Prompt Caching: The Discount Most Teams Overlook
Prompt caching is one of the most impactful cost features available today. The idea is straightforward: when consecutive requests share the same prefix (like a system prompt), the provider reuses prior computation instead of reprocessing it. Cached input tokens are billed at roughly 10% of the standard rate. (Writing to the cache usually carries a modest premium, around 25% over the base input rate on Anthropic, but reads dominate once traffic is steady.)
Back to the support bot example:
- 500 cached tokens (system prompt) at $0.30/M: $0.00015
- 1,100 non-cached input tokens at $3.00/M: $0.0033
- 300 output tokens at $15.00/M: $0.0045
Per-request cost drops to ~$0.0080 — a 14% reduction. That's with a short system prompt. If your prefix includes a knowledge base or lengthy instructions (2,000–5,000 tokens), caching can cut input costs by 30%–50%.
The key to high cache hit rates: put all static content at the front of your message array, and all variable content (user input, latest context) at the end. Order matters.
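As a concrete example, Anthropic's Messages API takes a `cache_control` marker on static content blocks. The sketch below assumes that API shape and a model alias that was current at the time of writing; verify both against the provider's docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Static content, byte-identical on every call, so the cache can hit.
SYSTEM_PROMPT = "You are a support agent for Acme Co. ..."

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # alias current at time of writing
    max_tokens=300,
    # Static prefix first, marked cacheable. Subsequent requests that share
    # this exact prefix bill it at the reduced cached-input rate.
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Variable content (history, latest user message) goes after the prefix.
    messages=[
        {"role": "user", "content": "My order hasn't arrived yet."},
    ],
)
print(response.content[0].text)
```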
Six Tactics to Lower Your API Bill
1. Trim Your Prompts
Bloated system prompts are the most common source of wasted tokens. Audit yours. Remove redundant instructions, excessive examples, and over-specified formatting rules. A well-crafted 200-token prompt often performs as well as an 800-token one.
2. Route by Task Complexity
Not every request needs your most powerful model. Set up a routing layer:
- Classification, extraction, formatting → lightweight tier
- Summarization, Q&A → mid-range tier
- Complex reasoning, creative generation → flagship tier
This alone can cut per-request costs by a factor of 10–60 on the traffic you route down-tier.
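The router doesn't need to be clever to pay for itself. A minimal sketch, where the task labels and model ids are placeholder assumptions to be replaced with your own taxonomy and your provider's current models:

```python
# Map task categories to model tiers. Model ids are illustrative placeholders.
MODEL_BY_TASK = {
    "classification": "lightweight-model",
    "extraction":     "lightweight-model",
    "formatting":     "lightweight-model",
    "summarization":  "mid-range-model",
    "qa":             "mid-range-model",
    "reasoning":      "flagship-model",
    "creative":       "flagship-model",
}

def route(task_type: str) -> str:
    """Pick the cheapest model tier that can handle the task.

    Defaults to the flagship tier for unrecognized task types, trading
    cost for safety on anything the router hasn't seen before.
    """
    return MODEL_BY_TASK.get(task_type, "flagship-model")

print(route("extraction"))  # lightweight-model
print(route("reasoning"))   # flagship-model
```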
3. Maximize Cache Hits
Structure every request so the static prefix is identical across calls. For multi-turn conversations, keep the message array prefix stable.
4. Use Batch APIs for Async Work
If you're running bulk translation, summarization, or analysis, batch endpoints typically offer ~50% off. Turnaround extends to 24 hours, but for offline pipelines, that's a non-issue.
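As one concrete shape, Anthropic's Message Batches API accepts a list of tagged requests and processes them asynchronously. This sketch assumes the current Python SDK interface and placeholder documents; confirm the details against the provider's docs:

```python
import anthropic

client = anthropic.Anthropic()

documents = ["First document text...", "Second document text..."]  # placeholders

# Each entry carries a custom_id so results can be matched back after processing.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-haiku-latest",  # alias current at time of writing
                "max_tokens": 500,
                "messages": [
                    {"role": "user", "content": f"Summarize:\n\n{doc}"}
                ],
            },
        }
        for i, doc in enumerate(documents)
    ]
)
print(batch.id, batch.processing_status)  # poll until it ends, then fetch results
```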
5. Cap Output Length
Set `max_tokens` to prevent the model from generating unnecessarily long responses. Pair it with a prompt instruction like "respond concisely" to attack output token spend from both sides.
6. Consider Prepaid Credits
Most providers offer prepaid tiers with 5%–20% discounts. The better platforms don't expire your balance or reset it monthly — you draw it down at your own pace. For teams spending $500+/month consistently, prepaid almost always wins over pay-as-you-go.
Real-World Monthly Cost Estimates
All figures below assume Claude Sonnet-tier pricing with caching optimizations applied.
Customer support bot
- 5,000 conversations/day
- ~1,500 input tokens, ~300 output tokens per conversation
- 60% cache hit rate
- Monthly estimate: $950 – $1,200
Content generation platform
- 1,000 generation tasks/day
- ~800 input tokens, ~1,500 output tokens per task
- 30% cache hit rate
- Monthly estimate: $700 – $750
Document analysis and summarization
- 200 documents/day
- ~3,000 input tokens, ~500 output tokens per document
- Batch API (50% discount)
- Monthly estimate: $45 – $60
With the right combination of model routing, caching, and batch processing, actual costs typically land 30%–60% below naive estimates.
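The scenario figures above are straightforward to reproduce. A sketch of the estimator behind them, assuming Sonnet-tier rates, cached input billed at 10% of base, and the cache hit rate applied uniformly across input tokens:

```python
# Monthly cost estimator for the scenarios above. Rates are the Sonnet-tier
# figures quoted earlier, in dollars per million tokens.
INPUT, OUTPUT, CACHED_INPUT = 3.00, 15.00, 0.30

def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 cache_hit_rate=0.0, batch_discount=0.0, days=30):
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    per_request = (cached * CACHED_INPUT + fresh * INPUT
                   + output_tokens * OUTPUT) / 1_000_000
    # Simplification: the batch discount is applied to the whole request.
    return per_request * (1 - batch_discount) * requests_per_day * days

# Support bot: 5,000 conversations/day, 60% of input tokens served from cache.
print(f"${monthly_cost(5_000, 1_500, 300, cache_hit_rate=0.6):,.0f}")  # ~$986

# Document analysis: 200 docs/day via the batch API at 50% off.
print(f"${monthly_cost(200, 3_000, 500, batch_discount=0.5):,.0f}")    # ~$50
```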
The Bottom Line
Token pricing rewards teams that pay attention to the details. Four things matter most:
- Know your token breakdown — understand where spend concentrates.
- Match model tier to task — don't pay flagship rates for simple jobs.
- Exploit caching — make repeated prefixes nearly free.
- Use volume to your advantage — batch discounts and prepaid credits compound over time.
When evaluating API providers, look past the headline per-token rate. The maturity of their caching implementation, the depth of batch discounts, and whether prepaid credits expire — these details determine your real long-term cost.