AI Token Counter + Prompt Cost Calculator
Estimate tokens and calculate costs for GPT-5, Claude 4.6, and Gemini 2.5. All counting happens in your browser — no prompts uploaded.
Cost Breakdown
| Model | Input tokens | Output tokens | Cost/call | Monthly cost |
|---|---|---|---|---|
How to Use
- Paste your prompt into the text area (or drop a .txt/.md file)
- Read the estimates: word count, character count, estimated tokens (±10%)
- Check the cost table: per-call and monthly costs across 8 models
- Adjust the output slider: set expected output length as a % of input
- Open Advanced for cache-hit %, agent/RAG presets, and batch mode
- Copy or share: markdown table or permalink URL
Why We Built This
Token counting tools exist, but none combine multi-model cost comparison, cache-hit modeling, and agent workload presets in a single free tool. We built this because developers planning AI budgets need to compare GPT-5 vs. Claude 4.6 vs. Gemini 2.5 costs quickly — without signing up, without uploading prompts, and without manually checking three different pricing pages.
The default estimate uses empirical character-to-token ratios (chars/3.6 for GPT, chars/3.5 for Claude, chars/3.8 for Gemini) validated against official tokenizers. It is within ±10% for English text on 90%+ of test samples. For exact counts, use the vendor's tokenizer API directly — but for cost planning and model comparison, ±10% is more than sufficient.
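The ratio-based estimate described above can be sketched in a few lines. This is an illustration rather than the tool's actual source: the function names are hypothetical, and rounding up to a whole token is an assumption (the page does not say how fractions are handled).

```python
import math

# Characters-per-token ratios quoted in the paragraph above.
CHARS_PER_TOKEN = {"gpt": 3.6, "claude": 3.5, "gemini": 3.8}


def estimate_tokens(text: str, family: str) -> int:
    """Estimate tokens from character count. Rounding up is an assumption."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN[family])


def estimate_range(text: str, family: str, tolerance: float = 0.10):
    """Return the (low, high) band implied by the stated ±10% tolerance."""
    est = estimate_tokens(text, family)
    return math.floor(est * (1 - tolerance)), math.ceil(est * (1 + tolerance))


sample = "a" * 1000
for family in CHARS_PER_TOKEN:
    print(family, estimate_tokens(sample, family), estimate_range(sample, family))
```

Treating the result as a band rather than a single number makes it easier to budget against the upper end.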
Frequently Asked Questions
How accurate is the default token estimate?
Within ±10% for English text on 90%+ of our test samples, across all supported model families. The estimate uses empirical character-to-token ratios validated against official tokenizers. For exact counts, use the official vendor tokenizer APIs.
Which models are supported?
GPT-5 family (GPT-5, GPT-5 Mini, GPT-5 Nano), Claude 4.6 family (Opus, Sonnet, Haiku), and Gemini 2.5 family (Pro, Flash). Pricing is updated monthly — see the date in the footer.
Why don't you use the exact tokenizer by default?
Because the OpenAI tokenizer alone is 600KB. We keep the page under 50KB on first load so it works instantly on mobile. The ±10% estimate is accurate enough for cost planning.
Are my prompts sent anywhere?
No. All counting and cost math happens in your browser. No prompts are uploaded. You can verify this in the Network tab of your browser's DevTools.
How do you handle prompt caching costs?
Set the cache-hit slider in the Advanced section. The calculator bills that share of your input tokens at the cached-input rate (typically about 10% of the standard input rate) and the remainder at the standard rate.
What's the "agent workload" preset?
It models multi-turn agent scenarios — multiple tool calls, repeated context, retrieval. The "Agent (10 tool calls)" preset adds 500 input + 200 output tokens per tool call to your base prompt.
Can I compare multiple prompts at once?
Yes. Paste prompts separated by --- and toggle "Batch mode" in the advanced section. Costs are summed across all prompts.
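As a rough sketch of how batch mode could work, the snippet below splits the pasted text on `---` and sums the per-prompt estimates. It assumes the separator sits on its own line and uses the chars/3.6 GPT ratio from this page; the function names are hypothetical and the tool's real parsing rules may differ.

```python
import math
import re

CHARS_PER_TOKEN = 3.6  # GPT ratio quoted on this page


def split_batch(raw: str) -> list[str]:
    """Split on lines containing only '---' (assumed format); drop empty prompts."""
    parts = re.split(r"^[ \t]*---[ \t]*$", raw, flags=re.MULTILINE)
    return [p.strip() for p in parts if p.strip()]


def batch_input_tokens(raw: str) -> int:
    """Sum the estimated input tokens across every prompt in the batch."""
    return sum(math.ceil(len(p) / CHARS_PER_TOKEN) for p in split_batch(raw))


raw = "Hello world\n---\nSecond prompt\n---\n"
print(split_batch(raw))
print(batch_input_tokens(raw))
```

Each prompt is rounded separately here, so the batch total can be slightly higher than an estimate made on the concatenated text.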
How do you decide which model is "cheapest"?
For your specific token count + output length, we compute cost across every model and highlight the lowest. Adjust the daily-calls slider to see how it scales monthly.
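The comparison can be illustrated as below. The model names and per-million-token rates are placeholders, not real vendor pricing, and the 30-day month is an assumption; only the overall shape (cost per call for every model, pick the minimum, scale by daily calls) comes from the answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


# PLACEHOLDER rates for illustration only; not real vendor pricing.
RATES = {
    "model-a": Rate(1.00, 8.00),
    "model-b": Rate(3.00, 15.00),
    "model-c": Rate(0.50, 4.00),
}


def cost_per_call(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call in USD: tokens times the per-million rate."""
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / 1_000_000


def cheapest(input_tokens: int, output_pct: float, daily_calls: int,
             days_per_month: int = 30):
    """Return (model, cost per call, monthly cost) for the lowest-cost model."""
    output_tokens = round(input_tokens * output_pct / 100)
    per_call = {name: cost_per_call(rate, input_tokens, output_tokens)
                for name, rate in RATES.items()}
    best = min(per_call, key=per_call.get)
    return best, per_call[best], per_call[best] * daily_calls * days_per_month


print(cheapest(input_tokens=2000, output_pct=50, daily_calls=100))
```

Because input and output rates differ per model, the cheapest option can change as the output percentage moves, which is why the comparison is recomputed for each setting.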
Are Gemini token counts as accurate as OpenAI/Claude?
Gemini has no public official tokenizer for browser use, so we use the empirical chars/3.8 ratio. For workloads where ±10% matters, validate against the Gemini API's countTokens endpoint.
How often is pricing updated?
Monthly, manually, with a PR diff in the public repo. The footer shows "Prices as of YYYY-MM-DD." If you spot a stale rate, file an issue on GitHub.
About Token Counting
Large language models process text as tokens — subword units that typically represent 3-4 characters of English text. Every API call is billed by token count: input tokens (your prompt) and output tokens (the model's response). Understanding token costs is essential for budgeting AI applications.
Different model families use different tokenizers. OpenAI's GPT-5 uses the o200k_base encoding (~3.6 chars per token for English). Anthropic's Claude 4.6 uses a similar BPE tokenizer (~3.5 chars per token). Google's Gemini 2.5 averages ~3.8 characters per token. These ratios hold well for English prose but vary for code, non-Latin scripts, and structured data.
Prompt caching is a significant cost optimization. When a large system prompt or context window is reused across calls, vendors charge a reduced rate (typically 10% of the standard input rate) for the cached portion. Our cache-hit slider models this: at 80% cache hit, 80% of your input tokens are billed at the cached rate and 20% at the standard rate.
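The cache-hit arithmetic above can be written out as a small function. This is a sketch under stated assumptions: the $2.00 per million input rate is a placeholder, the 10% cached multiplier is the "typical" figure from the text, and any cache-write surcharge a vendor may apply is ignored.

```python
def blended_input_cost(input_tokens: int, cache_hit_pct: float,
                       input_rate_per_mtok: float,
                       cached_multiplier: float = 0.10) -> float:
    """Input cost in USD when cache_hit_pct of tokens bill at the cached rate."""
    cached = input_tokens * cache_hit_pct / 100
    uncached = input_tokens - cached
    cached_rate = input_rate_per_mtok * cached_multiplier
    return (cached * cached_rate + uncached * input_rate_per_mtok) / 1_000_000


# 10,000 input tokens at a placeholder $2.00 per 1M tokens.
no_cache = blended_input_cost(10_000, 0, 2.00)
with_cache = blended_input_cost(10_000, 80, 2.00)
print(no_cache, with_cache, with_cache / no_cache)
```

With these inputs the cached case costs about $0.0056 against $0.02 uncached, or roughly 28% of the original input cost (1 - 0.8 × 0.9).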
Agent workloads multiply costs non-linearly. A 10-tool-call agent run doesn't just cost 10x a single call — each tool call adds context (previous results, function schemas) to the growing conversation. Our presets model this accumulation: the "Agent (10 tool calls)" preset adds 500 input + 200 output tokens per tool call, simulating real-world agent runs measured across GPT-5 and Claude 4.6 workflows.
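Read literally, the "Agent (10 tool calls)" preset is a fixed per-call overhead added to the base prompt. The sketch below shows only that arithmetic; the function name and the example base sizes are hypothetical, and how the tool applies the preset internally is not specified here.

```python
def agent_preset_tokens(base_input: int, base_output: int, tool_calls: int = 10,
                        input_per_call: int = 500, output_per_call: int = 200):
    """Return (input, output) token totals after adding the per-tool-call overhead."""
    total_input = base_input + tool_calls * input_per_call
    total_output = base_output + tool_calls * output_per_call
    return total_input, total_output


# A 1,000-token prompt with a 300-token answer, run through the 10-call preset.
print(agent_preset_tokens(base_input=1000, base_output=300))
```

In this example the preset adds 5,000 input and 2,000 output tokens, so the overhead dwarfs the original prompt; that is the effect the preset is meant to surface in the cost table.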