Canonical Scenarios
Opinionated, real-world AI workloads designed to help teams reason about cost, risk, and scale before surprises happen.
Customer Support Chatbot (≈50k MAU)
A customer-facing support assistant handling FAQs, account issues, and basic troubleshooting at scale.
Recommended setup
Model: GPT-5 Mini
Strong balance of quality and cost for customer-facing text, with predictable latency under sustained load.
Monthly cost (directional)
Expected
$12,000–$15,000 / mo
Spiky / peak usage
$22,000–$26,000 / mo
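For intuition, here is how a directional figure in that range can be assembled. A minimal back-of-envelope sketch follows; every input (sessions per user, tokens per turn, blended rate) is a hypothetical assumption for illustration, not a quoted price or a measured usage profile.

```python
# Back-of-envelope monthly cost. All inputs below are illustrative assumptions;
# only the 50k MAU figure comes from the scenario itself.
mau = 50_000                  # monthly active users (from the scenario title)
sessions_per_user = 2         # assumed support sessions per user per month
turns_per_session = 5         # assumed user turns per session
tokens_per_turn = 1_200       # assumed blended prompt + completion tokens
usd_per_1k_tokens = 0.02      # assumed blended rate across input and output

monthly_tokens = mau * sessions_per_user * turns_per_session * tokens_per_turn
monthly_cost = monthly_tokens / 1_000 * usd_per_1k_tokens
print(f"~${monthly_cost:,.0f} / mo")   # ~$12,000 / mo at these assumptions
```

Retries, longer conversations, and prompt growth push the real number toward the peak band, which is why the failure modes below matter.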
Typical failure modes
- Context waste grows silently as conversations extend.
- Retries amplify cost during incidents.
- Output length creep increases spend over time.
When this breaks
Becomes inefficient when conversations exceed ~8–10 turns or require large knowledge injections.
How experienced teams mitigate this
- Cap conversation length and escalate to humans early
- Trim system and instruction prompts aggressively
- Treat retries as a cost multiplier and budget them explicitly (see the sketch below)
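Two of these guardrails, the turn cap and the retry budget, fit in a few lines. A minimal sketch, assuming a placeholder model call and a placeholder human handoff; the limits and names are illustrative, not recommendations.

```python
# Illustrative guardrails: a hard turn cap with human escalation, and an
# explicit retry budget. All names and thresholds here are assumptions.

class TransientError(Exception):
    """Stands in for a timeout or 5xx from the model provider."""

MAX_TURNS = 8       # past ~8-10 turns this workload turns inefficient: hand off
RETRY_BUDGET = 2    # every retry re-sends (and re-bills) the entire prompt

def call_model(messages: list[dict]) -> str:
    """Placeholder for a real LLM API call."""
    return "model reply"

def escalate_to_human(conversation: list[dict]) -> str:
    """Placeholder handoff to a human agent."""
    return "escalated"

def handle_turn(conversation: list[dict], user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    user_turns = sum(1 for m in conversation if m["role"] == "user")
    if user_turns > MAX_TURNS:
        return escalate_to_human(conversation)      # cap hit: stop spending
    for _ in range(1 + RETRY_BUDGET):
        try:
            return call_model(conversation)
        except TransientError:
            continue    # each retry pays for the full prompt again
    return escalate_to_human(conversation)          # retry budget exhausted
```

The point of the explicit loop is that every retry re-sends the whole conversation, so the retry budget is also a spend budget.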
Internal Copilot (≈500 employees)
An internal AI assistant used for document lookup, writing, and reasoning over company knowledge.
Recommended setup
Model: GPT-5
Higher-quality reasoning and writing justify the cost for knowledge work.
Monthly cost (directional)
Expected
$9,000–$12,000 / mo
Spiky / peak usage
$16,000–$20,000 / mo
Typical failure modes
- Context windows fill quickly with large documents.
- Usage spikes during launches and incidents.
- Latency expectations rise as reliance grows.
When this breaks
Struggles when large documents are injected wholesale or sub-second latency becomes mandatory.
How experienced teams mitigate this
- Chunk and rank documents instead of injecting full files
- Route simple queries to cheaper models (see the routing sketch after this list)
- Set explicit expectations around response time
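Routing is often a one-function change at the call site. A rough sketch, assuming a keyword heuristic and illustrative thresholds; production routers frequently use a classifier or measured token counts instead, and every cutoff here is made up.

```python
# Crude heuristic router: cheap model by default, strong model only when the
# query looks like reasoning work or carries heavy context. All thresholds,
# hint words, and tier choices below are assumptions for illustration.

CHEAP_MODEL = "gpt-5-mini"    # assumed cheaper tier
STRONG_MODEL = "gpt-5"        # assumed stronger tier recommended above

REASONING_HINTS = ("summarize", "compare", "draft", "analyze", "why")

def pick_model(query: str, context_tokens: int) -> str:
    needs_reasoning = any(hint in query.lower() for hint in REASONING_HINTS)
    if needs_reasoning or context_tokens > 4_000:   # illustrative cutoff
        return STRONG_MODEL
    return CHEAP_MODEL    # lookups and short answers skip the expensive model

print(pick_model("where is the expense policy?", context_tokens=800))       # gpt-5-mini
print(pick_model("compare Q3 and Q4 churn drivers", context_tokens=6_500))  # gpt-5
```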
AI Search with RAG (High Context)
Semantic search over large document collections using retrieval-augmented generation (RAG), with heavy context injection per query.
Recommended setup
Model: GPT-5
Robust reasoning across long context windows, where retrieval quality matters more than raw throughput.
Monthly cost (directional)
Expected
$18,000–$22,000 / mo
Spiky / peak usage
$30,000–$38,000 / mo
Typical failure modes
- Input context, not output tokens, dominates total cost.
- Retrieved chunks the model never uses still burn input tokens.
- Usage spikes during reindexing or discovery events.
When this breaks
Breaks when context exceeds ~15k tokens per query or when retrieved results are poorly ranked for relevance.
How experienced teams mitigate this
- Aggressively cap retrieved chunks
- Rank and filter context before injection
- Cache common queries and embeddings (see the sketch below)
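The first two mitigations reduce to one selection function, and the third to a cache in front of the pipeline. A sketch under assumed data shapes; the 15k budget echoes the ceiling noted above, and the helper names are hypothetical.

```python
# Rank, filter, and cap retrieved context, then cache whole answers for
# repeat queries. Data shapes, thresholds, and helpers are assumptions.
from functools import lru_cache

CONTEXT_BUDGET_TOKENS = 15_000   # the ~15k-per-query ceiling noted above
MIN_RELEVANCE = 0.5              # assumed cutoff: drop weakly related chunks

def select_context(chunks: list[dict]) -> list[dict]:
    """chunks look like {'text': str, 'score': float, 'tokens': int}."""
    ranked = sorted(
        (c for c in chunks if c["score"] >= MIN_RELEVANCE),
        key=lambda c: c["score"],
        reverse=True,
    )
    selected, used = [], 0
    for chunk in ranked:
        if used + chunk["tokens"] > CONTEXT_BUDGET_TOKENS:
            break                # hard cap: input context drives cost here
        selected.append(chunk)
        used += chunk["tokens"]
    return selected

def answer_uncached(query: str) -> str:
    # Placeholder for the real retrieve -> select_context -> generate pipeline.
    return "..."

@lru_cache(maxsize=10_000)
def answer(normalized_query: str) -> str:
    """Repeat queries skip retrieval and generation entirely."""
    return answer_uncached(normalized_query)
```

Normalizing queries before the cache lookup (lowercasing, trimming whitespace) raises hit rates; caching embeddings works the same way one level down.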