Canonical Scenarios
Opinionated, real-world AI workloads designed to help teams reason about cost, risk, and scale before surprises happen.
Customer Support Chatbot (≈50k MAU)
A customer-facing support assistant handling FAQs, account issues, and basic troubleshooting at scale.
Recommended setup
Model: GPT-5 Mini
Strong balance of quality and cost for customer-facing text, with predictable latency under sustained load.
Monthly cost (directional)
Expected
$12,000–$15,000 / mo
Spiky / peak usage
$22,000–$26,000 / mo
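For intuition, here is how a directional figure in that range can be assembled. A minimal back-of-envelope sketch follows; every input (sessions per user, tokens per turn, blended rate) is a hypothetical assumption for illustration, not a quoted price or a measured usage profile.

```python
# Back-of-envelope monthly cost. All inputs below are illustrative assumptions;
# only the 50k MAU figure comes from the scenario itself.
mau = 50_000                  # monthly active users (from the scenario title)
sessions_per_user = 2         # assumed support sessions per user per month
turns_per_session = 5         # assumed user turns per session
tokens_per_turn = 1_200       # assumed blended prompt + completion tokens
usd_per_1k_tokens = 0.02      # assumed blended rate across input and output

monthly_tokens = mau * sessions_per_user * turns_per_session * tokens_per_turn
monthly_cost = monthly_tokens / 1_000 * usd_per_1k_tokens
print(f"~${monthly_cost:,.0f} / mo")   # ~$12,000 / mo at these assumptions
```

Retries, longer conversations, and prompt growth push the real number toward the peak band, which is why the failure modes below matter.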
Typical failure modes
- Context waste grows silently as conversations extend.
- Retries amplify cost during incidents.
- Output length creep increases spend over time.
When this breaks
Becomes inefficient when conversations exceed ~8–10 turns or require large knowledge injections.
How experienced teams mitigate this
- Cap conversation length and escalate to humans early
- Trim system and instruction prompts aggressively
- Treat retries as a cost multiplier and budget them explicitly (see the sketch below)
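Two of these guardrails, the turn cap and the retry budget, fit in a few lines. A minimal sketch, assuming a placeholder model call and a placeholder human handoff; the limits and names are illustrative, not recommendations.

```python
# Illustrative guardrails: a hard turn cap with human escalation, and an
# explicit retry budget. All names and thresholds here are assumptions.

class TransientError(Exception):
    """Stands in for a timeout or 5xx from the model provider."""

MAX_TURNS = 8       # past ~8-10 turns this workload turns inefficient: hand off
RETRY_BUDGET = 2    # every retry re-sends (and re-bills) the entire prompt

def call_model(messages: list[dict]) -> str:
    """Placeholder for a real LLM API call."""
    return "model reply"

def escalate_to_human(conversation: list[dict]) -> str:
    """Placeholder handoff to a human agent."""
    return "escalated"

def handle_turn(conversation: list[dict], user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    user_turns = sum(1 for m in conversation if m["role"] == "user")
    if user_turns > MAX_TURNS:
        return escalate_to_human(conversation)      # cap hit: stop spending
    for _ in range(1 + RETRY_BUDGET):
        try:
            return call_model(conversation)
        except TransientError:
            continue    # each retry pays for the full prompt again
    return escalate_to_human(conversation)          # retry budget exhausted
```

The point of the explicit loop is that every retry re-sends the whole conversation, so the retry budget is also a spend budget.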
Internal Copilot (≈500 employees)
An internal AI assistant used for document lookup, writing, and reasoning over company knowledge.
Recommended setup
Model: GPT-5
Higher-quality reasoning and writing justify the cost for knowledge work.
Monthly cost (directional)
Expected
$9,000–$12,000 / mo
Spiky / peak usage
$16,000–$20,000 / mo
Typical failure modes
- Context windows fill quickly with large documents.
- Usage spikes during launches and incidents.
- Latency expectations rise as reliance grows.
When this breaks
Struggles when large documents are injected wholesale or sub-second latency becomes mandatory.
How experienced teams mitigate this
- Chunk and rank documents instead of injecting full files
- Route simple queries to cheaper models (see the routing sketch after this list)
- Set explicit expectations around response time
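Routing is often a one-function change at the call site. A rough sketch, assuming a keyword heuristic and illustrative thresholds; production routers frequently use a classifier or measured token counts instead, and every cutoff here is made up.

```python
# Crude heuristic router: cheap model by default, strong model only when the
# query looks like reasoning work or carries heavy context. All thresholds,
# hint words, and tier choices below are assumptions for illustration.

CHEAP_MODEL = "gpt-5-mini"    # assumed cheaper tier
STRONG_MODEL = "gpt-5"        # assumed stronger tier recommended above

REASONING_HINTS = ("summarize", "compare", "draft", "analyze", "why")

def pick_model(query: str, context_tokens: int) -> str:
    needs_reasoning = any(hint in query.lower() for hint in REASONING_HINTS)
    if needs_reasoning or context_tokens > 4_000:   # illustrative cutoff
        return STRONG_MODEL
    return CHEAP_MODEL    # lookups and short answers skip the expensive model

print(pick_model("where is the expense policy?", context_tokens=800))       # gpt-5-mini
print(pick_model("compare Q3 and Q4 churn drivers", context_tokens=6_500))  # gpt-5
```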
AI Search with RAG (High Context)
Semantic search over large document collections using retrieval-augmented generation (RAG), with heavy context injection per query.
Recommended setup
Model: GPT-5
Robust reasoning across long context windows, where retrieval quality matters more than raw throughput.
Monthly cost (directional)
Expected
$18,000–$22,000 / mo
Spiky / peak usage
$30,000–$38,000 / mo
Typical failure modes
- Input context, not output tokens, dominates total cost.
- Retrieved chunks the model never uses still burn input tokens.
- Usage spikes during reindexing or discovery events.
When this breaks
Breaks when context exceeds ~15k tokens per query or when retrieved results are poorly ranked for relevance.
How experienced teams mitigate this
- Aggressively cap retrieved chunks
- Rank and filter context before injection
- Cache common queries and embeddings (see the sketch below)
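The first two mitigations reduce to one selection function, and the third to a cache in front of the pipeline. A sketch under assumed data shapes; the 15k budget echoes the ceiling noted above, and the helper names are hypothetical.

```python
# Rank, filter, and cap retrieved context, then cache whole answers for
# repeat queries. Data shapes, thresholds, and helpers are assumptions.
from functools import lru_cache

CONTEXT_BUDGET_TOKENS = 15_000   # the ~15k-per-query ceiling noted above
MIN_RELEVANCE = 0.5              # assumed cutoff: drop weakly related chunks

def select_context(chunks: list[dict]) -> list[dict]:
    """chunks look like {'text': str, 'score': float, 'tokens': int}."""
    ranked = sorted(
        (c for c in chunks if c["score"] >= MIN_RELEVANCE),
        key=lambda c: c["score"],
        reverse=True,
    )
    selected, used = [], 0
    for chunk in ranked:
        if used + chunk["tokens"] > CONTEXT_BUDGET_TOKENS:
            break                # hard cap: input context drives cost here
        selected.append(chunk)
        used += chunk["tokens"]
    return selected

def answer_uncached(query: str) -> str:
    # Placeholder for the real retrieve -> select_context -> generate pipeline.
    return "..."

@lru_cache(maxsize=10_000)
def answer(normalized_query: str) -> str:
    """Repeat queries skip retrieval and generation entirely."""
    return answer_uncached(normalized_query)
```

Normalizing queries before the cache lookup (lowercasing, trimming whitespace) raises hit rates; caching embeddings works the same way one level down.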