LLM Token Pricing Explained: How to Calculate and Cut Your API Costs
If you're building on top of LLM APIs, token pricing is the line item that quietly eats your budget. Understanding how it actually works — not just the headline rates, but the mechanics of input vs. output billing, caching, and batch discounts — gives you real leverage over your monthly spend. Here's the full picture.
What Exactly Is a Token?
A token is the smallest unit of text a model processes. Think of it as a word-fragment: not quite a character, not quite a word, but a chunk determined by the model's tokenizer.
For English, 1 token ≈ 4 characters or about 0.75 words. "ChatGPT is great" clocks in at 4–5 tokens, depending on the tokenizer.
For CJK languages (Chinese, Japanese, Korean), each character typically maps to 1–2 tokens, because tokenizer vocabularies are trained mostly on English and often split CJK characters into multiple byte-level tokens. The same meaning expressed in Chinese can consume 30%–80% more tokens than its English equivalent:
- "Hello, how are you?" ≈ 6 tokens
- "你好,你最近怎么样?" ≈ 9–11 tokens
This gap matters. If your user base is primarily non-English, your cost projections need to account for it from day one.
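You can check these counts yourself with a tokenizer library. A minimal sketch using OpenAI's tiktoken (counts vary slightly by provider; Anthropic, for instance, exposes its own token-counting endpoint, so treat these as estimates):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer behind several OpenAI models. Other providers'
# tokenizers differ slightly, but the English-vs-CJK ratio holds broadly.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello, how are you?", "你好，你最近怎么样？"]:
    n = len(enc.encode(text))
    print(f"{text!r}: {n} tokens for {len(text)} characters")
```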
Input Tokens vs. Output Tokens
API billing splits into two buckets with different price tags:
Input tokens cover everything you send to the model — system prompts, conversation history, user messages, injected context.
Output tokens cover what the model generates. Because text generation is computationally heavier, output tokens typically cost 3–5× more than input tokens.
Here's what the pricing landscape looks like per million tokens:
| Model Tier | Input Price | Output Price |
|---|---|---|
| Lightweight (e.g., Claude Haiku) | $0.25 – $0.80 | $1.00 – $4.00 |
| Mid-range (e.g., Claude Sonnet) | $3.00 | $15.00 |
| Flagship (e.g., Claude Opus) | $15.00 | $75.00 |
The spread between tiers is enormous — up to 60×. Picking the right tier for each task is the single biggest cost lever you have.
Estimating Token Usage
Before you get surprised by an invoice, build a rough model of your per-request consumption:
Total tokens = system prompt + conversation history + user input + model output
Take a customer support bot as an example:
- System prompt (role definition, response rules): ~500 tokens
- Recent conversation history (5 turns): ~1,000 tokens
- Current user message: ~100 tokens
- Model response: ~300 tokens
That's ~1,900 tokens per request — 1,600 input, 300 output.
On Claude Sonnet pricing:
Input: 1,600 / 1,000,000 × $3.00 = $0.0048
Output: 300 / 1,000,000 × $15.00 = $0.0045
Total per request ≈ $0.0093
Looks cheap. But at 10,000 conversations per day, you're at ~$2,790/month. The numbers compound fast.
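It's worth scripting this arithmetic so your estimates stay honest as prompt sizes drift. A minimal sketch using the Sonnet-tier rates from the table above (treat the prices as illustrative and check your provider's current rate card):

```python
# Per-request cost model using the Sonnet-tier rates quoted above.
# Prices are dollars per million tokens; adjust to your provider's rate card.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000

# Support-bot example: 1,600 input tokens (system prompt + history + message),
# 300 output tokens.
per_request = request_cost(1_600, 300)
print(f"Per request: ${per_request:.4f}")                   # $0.0093
print(f"Per month:   ${per_request * 10_000 * 30:,.0f}")    # ~$2,790 at 10k requests/day
```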
Prompt Caching: The Discount Most Teams Overlook
Prompt caching is one of the most impactful cost features available today. The idea is straightforward: when consecutive requests share the same prefix (like a system prompt), the provider reuses prior computation instead of reprocessing it. Cached input tokens are billed at roughly 10% of the standard rate. (Writing to the cache usually carries a modest premium, around 25% over the base input rate on Anthropic, but reads dominate once traffic is steady.)
Back to the support bot example:
- 500 cached tokens (system prompt) at $0.30/M: $0.00015
- 1,100 non-cached input tokens at $3.00/M: $0.0033
- 300 output tokens at $15.00/M: $0.0045
Per-request cost drops to ~$0.0080 — a 14% reduction. That's with a short system prompt. If your prefix includes a knowledge base or lengthy instructions (2,000–5,000 tokens), caching can cut input costs by 30%–50%.
The key to high cache hit rates: put all static content at the front of your message array, and all variable content (user input, latest context) at the end. Order matters.
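As a concrete example, Anthropic's Messages API takes a `cache_control` marker on static content blocks. The sketch below assumes that API shape and a model alias that was current at the time of writing; verify both against the provider's docs:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Static content, byte-identical on every call, so the cache can hit.
SYSTEM_PROMPT = "You are a support agent for Acme Co. ..."

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # alias current at time of writing
    max_tokens=300,
    # Static prefix first, marked cacheable. Subsequent requests that share
    # this exact prefix bill it at the reduced cached-input rate.
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Variable content (history, latest user message) goes after the prefix.
    messages=[
        {"role": "user", "content": "My order hasn't arrived yet."},
    ],
)
print(response.content[0].text)
```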
Six Tactics to Lower Your API Bill
1. Trim Your Prompts
Bloated system prompts are the most common source of wasted tokens. Audit yours. Remove redundant instructions, excessive examples, and over-specified formatting rules. A well-crafted 200-token prompt often performs as well as an 800-token one.
2. Route by Task Complexity
Not every request needs your most powerful model. Set up a routing layer:
- Classification, extraction, formatting → lightweight tier
- Summarization, Q&A → mid-range tier
- Complex reasoning, creative generation → flagship tier
This alone can cut per-request costs by a factor of 10–60 on the traffic you route down-tier.
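The router doesn't need to be clever to pay for itself. A minimal sketch, where the task labels and model ids are placeholder assumptions to be replaced with your own taxonomy and your provider's current models:

```python
# Map task categories to model tiers. Model ids are illustrative placeholders.
MODEL_BY_TASK = {
    "classification": "lightweight-model",
    "extraction":     "lightweight-model",
    "formatting":     "lightweight-model",
    "summarization":  "mid-range-model",
    "qa":             "mid-range-model",
    "reasoning":      "flagship-model",
    "creative":       "flagship-model",
}

def route(task_type: str) -> str:
    """Pick the cheapest model tier that can handle the task.

    Defaults to the flagship tier for unrecognized task types, trading
    cost for safety on anything the router hasn't seen before.
    """
    return MODEL_BY_TASK.get(task_type, "flagship-model")

print(route("extraction"))  # lightweight-model
print(route("reasoning"))   # flagship-model
```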
3. Maximize Cache Hits
Structure every request so the static prefix is identical across calls. For multi-turn conversations, keep the message array prefix stable.
4. Use Batch APIs for Async Work
If you're running bulk translation, summarization, or analysis, batch endpoints typically offer ~50% off. Turnaround extends to 24 hours, but for offline pipelines, that's a non-issue.
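As one concrete shape, Anthropic's Message Batches API accepts a list of tagged requests and processes them asynchronously. This sketch assumes the current Python SDK interface and placeholder documents; confirm the details against the provider's docs:

```python
import anthropic

client = anthropic.Anthropic()

documents = ["First document text...", "Second document text..."]  # placeholders

# Each entry carries a custom_id so results can be matched back after processing.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-haiku-latest",  # alias current at time of writing
                "max_tokens": 500,
                "messages": [
                    {"role": "user", "content": f"Summarize:\n\n{doc}"}
                ],
            },
        }
        for i, doc in enumerate(documents)
    ]
)
print(batch.id, batch.processing_status)  # poll until it ends, then fetch results
```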
5. Cap Output Length
Set `max_tokens` to prevent the model from generating unnecessarily long responses. Pair it with a prompt instruction like "respond concisely" to attack output token spend from both sides.
6. Consider Prepaid Credits
Most providers offer prepaid tiers with 5%–20% discounts. The better platforms don't expire your balance or reset it monthly — you draw it down at your own pace. For teams spending $500+/month consistently, prepaid almost always wins over pay-as-you-go.
Real-World Monthly Cost Estimates
All figures below assume Claude Sonnet-tier pricing with caching optimizations applied.
Customer support bot
- 5,000 conversations/day
- ~1,500 input tokens, ~300 output tokens per conversation
- 60% cache hit rate
- Monthly estimate: $950 – $1,200
Content generation platform
- 1,000 generation tasks/day
- ~800 input tokens, ~1,500 output tokens per task
- 30% cache hit rate
- Monthly estimate: $700 – $750
Document analysis and summarization
- 200 documents/day
- ~3,000 input tokens, ~500 output tokens per document
- Batch API (50% discount)
- Monthly estimate: $45 – $60
With the right combination of model routing, caching, and batch processing, actual costs typically land 30%–60% below naive estimates.
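The scenario figures above are straightforward to reproduce. A sketch of the estimator behind them, assuming Sonnet-tier rates, cached input billed at 10% of base, and the cache hit rate applied uniformly across input tokens:

```python
# Monthly cost estimator for the scenarios above. Rates are the Sonnet-tier
# figures quoted earlier, in dollars per million tokens.
INPUT, OUTPUT, CACHED_INPUT = 3.00, 15.00, 0.30

def monthly_cost(requests_per_day, input_tokens, output_tokens,
                 cache_hit_rate=0.0, batch_discount=0.0, days=30):
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    per_request = (cached * CACHED_INPUT + fresh * INPUT
                   + output_tokens * OUTPUT) / 1_000_000
    # Simplification: the batch discount is applied to the whole request.
    return per_request * (1 - batch_discount) * requests_per_day * days

# Support bot: 5,000 conversations/day, 60% of input tokens served from cache.
print(f"${monthly_cost(5_000, 1_500, 300, cache_hit_rate=0.6):,.0f}")  # ~$986

# Document analysis: 200 docs/day via the batch API at 50% off.
print(f"${monthly_cost(200, 3_000, 500, batch_discount=0.5):,.0f}")    # ~$50
```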
The Bottom Line
Token pricing rewards teams that pay attention to the details. Four things matter most:
- Know your token breakdown — understand where spend concentrates.
- Match model tier to task — don't pay flagship rates for simple jobs.
- Exploit caching — make repeated prefixes nearly free.
- Use volume to your advantage — batch discounts and prepaid credits compound over time.
When evaluating API providers, look past the headline per-token rate. The maturity of their caching implementation, the depth of batch discounts, and whether prepaid credits expire — these details determine your real long-term cost.