AI Search (RAG)
Canonical scenario page for retrieval-augmented generation workloads dominated by context injection.
AI Search with RAG (High Context)
Variable usage pattern: semantic search over large document collections with heavy context injection per query.
Recommended setup
Model: GPT-5
Chosen for robust reasoning across long context windows, in workloads where retrieval quality matters more than raw throughput.
Monthly cost (directional)
Expected
$18,000–$22,000 / mo
Spiky / peak usage
$30,000–$38,000 / mo
Typical failure modes
- Context tokens, not output tokens, dominate total cost.
- Retrieved chunks the model never uses still incur input-token cost, silently inflating spend.
- Cost spikes during reindexing or discovery events.
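The first failure mode is easiest to see with back-of-envelope arithmetic. Every rate and volume below is an illustrative assumption, not quoted GPT-5 pricing:

```python
# Why context, not output, dominates RAG cost.
# All prices and volumes are illustrative assumptions, not quoted rates.
INPUT_PRICE_PER_1K = 0.005    # $/1k input (context) tokens -- assumed
OUTPUT_PRICE_PER_1K = 0.015   # $/1k output tokens          -- assumed

queries_per_month = 500_000   # assumed query volume
context_tokens = 12_000       # retrieved chunks + prompt, per query
output_tokens = 400           # typical answer length, per query

context_cost = queries_per_month * context_tokens / 1_000 * INPUT_PRICE_PER_1K
output_cost = queries_per_month * output_tokens / 1_000 * OUTPUT_PRICE_PER_1K

print(f"context: ${context_cost:,.0f}/mo  output: ${output_cost:,.0f}/mo")
# Under these assumptions context costs ~10x output, even though
# output tokens are priced 3x higher per token.
```

Under these assumptions, halving the context budget saves far more than trimming answer length, which is why the mitigations focus on what gets injected.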
When this breaks
This setup breaks down when injected context exceeds ~15k tokens per query, or when retrieved chunks are poorly ranked for relevance.
How experienced teams mitigate this
- Aggressively cap retrieved chunks
- Rank and filter context before injection
- Cache common queries and embeddings
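The three mitigations above can be sketched in a few lines. The thresholds, function names, and the `retrieve`/`generate` callables are hypothetical stand-ins under stated assumptions, not any specific library's API:

```python
# Sketch of the three mitigations: cap injected chunks, rank/filter
# context before injection, and cache answers for repeated queries.
MAX_CHUNKS = 6    # hard cap on chunks injected per query (assumed budget)
MIN_SCORE = 0.75  # relevance threshold below which chunks are dropped (assumed)

def select_context(scored_chunks, max_chunks=MAX_CHUNKS, min_score=MIN_SCORE):
    """Rank (text, score) pairs by relevance, drop low scorers, cap the count."""
    ranked = sorted(scored_chunks, key=lambda c: c[1], reverse=True)
    return [text for text, score in ranked if score >= min_score][:max_chunks]

_cache: dict[str, str] = {}  # in-process answer cache keyed by normalized query

def answer(query, retrieve, generate):
    """retrieve(query) -> [(text, score)] and generate(query, context) -> str
    are hypothetical callables standing in for the vector store and the model."""
    key = query.strip().lower()  # naive normalization; real systems often key on embeddings
    if key in _cache:
        return _cache[key]       # cache hit: no retrieval, no context tokens spent
    context = select_context(retrieve(query))
    _cache[key] = generate(query, context)
    return _cache[key]
```

A cache hit skips both retrieval and generation entirely, so for query distributions with repeated head queries the savings compound with the per-query chunk cap.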