Canonical Scenarios

Opinionated, real-world AI workloads designed to help teams reason about cost, risk, and scale before surprises happen.

Customer Support Chatbot (≈50k MAU)

A customer-facing support assistant handling FAQs, account issues, and basic troubleshooting at scale.

Recommended setup

Model: GPT-5 Mini

Strong balance of quality and cost for customer-facing text, with predictable latency under sustained load.

Monthly cost (directional)

Expected
$12,000–$15,000 / mo
Spiky / peak usage
$22,000–$26,000 / mo
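Ranges like these can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses entirely hypothetical traffic and pricing assumptions (sessions per user, tokens per turn, per-million-token rates are illustrative, not actual GPT-5 Mini pricing) to show how a directional monthly figure falls out.

```python
# Directional monthly cost estimate for a ~50k MAU support chatbot.
# ALL numbers below are illustrative assumptions, not real pricing.

MAU = 50_000                 # monthly active users
SESSIONS_PER_USER = 4        # assumed support sessions per user per month
TURNS_PER_SESSION = 10       # assumed exchanges per session
INPUT_TOKENS_PER_TURN = 5_000   # system prompt + growing history + context
OUTPUT_TOKENS_PER_TURN = 400

PRICE_IN_PER_M = 1.00        # hypothetical $ per 1M input tokens
PRICE_OUT_PER_M = 4.00       # hypothetical $ per 1M output tokens

def monthly_cost(peak_multiplier: float = 1.0) -> float:
    turns = MAU * SESSIONS_PER_USER * TURNS_PER_SESSION * peak_multiplier
    cost_in = turns * INPUT_TOKENS_PER_TURN / 1e6 * PRICE_IN_PER_M
    cost_out = turns * OUTPUT_TOKENS_PER_TURN / 1e6 * PRICE_OUT_PER_M
    return cost_in + cost_out

print(f"expected: ${monthly_cost():,.0f}/mo")     # ~$13,200/mo
print(f"peak:     ${monthly_cost(1.8):,.0f}/mo")  # ~$23,760/mo
```

Note that input tokens dominate: history re-sent on every turn accounts for most of the spend, which is why the mitigations below focus on trimming context rather than shortening answers.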

Typical failure modes

  • Context waste grows silently as conversations extend.
  • Retries amplify cost during incidents.
  • Output length creep increases spend over time.

When this breaks

Becomes inefficient when conversations exceed ~8–10 turns or require large knowledge injections.

How experienced teams mitigate this

  • Cap conversation length and escalate to humans early
  • Trim system and instruction prompts aggressively
  • Treat retries as a cost multiplier
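The retry multiplier in particular is easy to underestimate. A minimal sketch (failure rates are hypothetical) of how a retry policy that re-sends the full prompt inflates effective spend:

```python
# Expected cost multiplier from retries, assuming each retry
# resends the full prompt. Rates are illustrative.

def retry_multiplier(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per logical call with up to max_retries
    re-sends on failure (truncated geometric series)."""
    m, p = 1.0, failure_rate
    for _ in range(max_retries):
        m += p              # expected extra attempt at this depth
        p *= failure_rate   # probability the next retry also fires
    return m

# Healthy steady state: 1% failures, 3 retries -> ~1.01x spend
print(round(retry_multiplier(0.01, 3), 3))
# During an incident: 30% failures, 3 retries -> ~1.42x spend
print(round(retry_multiplier(0.30, 3), 3))
```

A 30% failure rate during an incident turns every dollar of planned spend into roughly $1.42, which is why incident-time retry budgets deserve their own alerting.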

Internal Copilot (≈500 employees)

An internal AI assistant used for document lookup, writing, and reasoning over company knowledge.

Recommended setup

Model: GPT-5

Higher-quality reasoning and writing justify the cost for knowledge work.

Monthly cost (directional)

Expected
$9,000–$12,000 / mo
Spiky / peak usage
$16,000–$20,000 / mo

Typical failure modes

  • Context windows fill quickly with large documents.
  • Usage spikes during launches and incidents.
  • Latency expectations rise as reliance grows.

When this breaks

Struggles when large documents are injected wholesale or sub-second latency becomes mandatory.

How experienced teams mitigate this

  • Chunk and rank documents instead of injecting full files
  • Route simple queries to cheaper models
  • Set explicit expectations around response time
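Routing simple queries to a cheaper model can be as simple as a heuristic gate in front of the call. A sketch, with model identifiers, keyword hints, and thresholds that are all illustrative placeholders:

```python
# Heuristic model router: send short, simple lookups to a cheap
# model and reserve the stronger model for heavy reasoning.
# Model names, hints, and thresholds are illustrative placeholders.

CHEAP_MODEL = "gpt-5-mini"
STRONG_MODEL = "gpt-5"

REASONING_HINTS = ("why", "compare", "analyze", "summarize", "explain")

def route(query: str, context_tokens: int = 0) -> str:
    needs_reasoning = any(h in query.lower() for h in REASONING_HINTS)
    heavy_context = context_tokens > 2_000   # large document injected
    long_query = len(query.split()) > 40
    if needs_reasoning or heavy_context or long_query:
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("Where is the expense policy?"))            # gpt-5-mini
print(route("Compare Q3 and Q4 churn drivers", 5_000))  # gpt-5
```

Even a crude gate like this typically diverts the bulk of lookup-style traffic, since most internal-copilot queries are retrieval, not reasoning.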

AI Search with RAG (High Context)

Semantic search over large document collections with heavy context injection per query.

Recommended setup

Model: GPT-5

Robust reasoning across long context windows; suited to workloads where retrieval quality matters more than raw throughput.

Monthly cost (directional)

Expected
$18,000–$22,000 / mo
Spiky / peak usage
$30,000–$38,000 / mo

Typical failure modes

  • Input context, not output tokens, dominates total cost.
  • Retrieved chunks that go unused waste spend silently.
  • Costs spike during reindexing or discovery events.

When this breaks

Breaks when context exceeds ~15k tokens per query or when retrieval relevance is poorly ranked.

How experienced teams mitigate this

  • Aggressively cap retrieved chunks
  • Rank and filter context before injection
  • Cache common queries and embeddings
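Capping and ranking can be combined in one pass: score retrieved chunks, then greedily pack the highest-scoring ones into a fixed token budget. The sketch below assumes relevance scores already exist (e.g. from the retriever); all names and numbers are illustrative.

```python
# Keep only the best retrieved chunks that fit a hard token budget.
# Chunk scores are assumed to come from the retriever; all names
# and numbers here are illustrative.

def select_context(chunks: list[tuple[str, float, int]],
                   budget_tokens: int = 4_000) -> list[str]:
    """chunks: (text, relevance_score, token_count). Greedily pack
    the highest-scoring chunks under the token budget."""
    selected, used = [], 0
    for text, _score, tokens in sorted(chunks, key=lambda c: -c[1]):
        if used + tokens <= budget_tokens:
            selected.append(text)
            used += tokens
    return selected

chunks = [("pricing table", 0.92, 1_500),
          ("old changelog", 0.35, 2_800),
          ("API reference", 0.88, 2_000)]
print(select_context(chunks))  # low-score chunk dropped to stay in budget
```

The hard budget is the point: it converts "context exceeds ~15k tokens per query" from a silent cost failure into an explicit ranking decision, and cached results for common queries can bypass this step entirely.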