
AI Search (RAG)

Canonical scenario page for retrieval-augmented generation workloads dominated by context injection.

AI Search with RAG (High Context)

Variable usage pattern

Semantic search over large document collections with heavy context injection per query.

Recommended setup

Model: GPT-5

Robust reasoning across long context windows, where retrieval quality matters more than raw throughput.

Monthly cost (directional)

Expected
$18,000–$22,000 / mo
Spiky / peak usage
$30,000–$38,000 / mo
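Ranges like the ones above fall out of a simple token-volume model. The sketch below is purely illustrative: the query volume, per-query token counts, and per-token prices are assumptions chosen to land inside the expected band, not measured ModelIndex data.

```python
# Back-of-envelope monthly cost model for a context-heavy RAG workload.
# Every figure here is an assumption for illustration, not measured data.
queries_per_day = 10_000
input_tokens_per_query = 12_000   # injected context dominates input
output_tokens_per_query = 500
price_in_per_1k = 0.005           # assumed $ per 1k input tokens
price_out_per_1k = 0.015          # assumed $ per 1k output tokens

cost_per_query = (
    input_tokens_per_query / 1000 * price_in_per_1k
    + output_tokens_per_query / 1000 * price_out_per_1k
)
monthly_cost = cost_per_query * queries_per_day * 30
input_share = (input_tokens_per_query / 1000 * price_in_per_1k) / cost_per_query

print(f"${monthly_cost:,.0f}/mo")          # → $20,250/mo
print(f"input share: {input_share:.0%}")   # → input share: 89%
```

Note that even with a modest 500 output tokens per answer, roughly nine-tenths of the spend in this model comes from injected context, which is why the failure modes below center on input tokens.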

Typical failure modes

  • Context, not output tokens, dominates total cost: every injected chunk is billed as input on every query.
  • Low context utilization: retrieved chunks the model never draws on are still billed, silently wasting spend.
  • Cost spikes during reindexing or discovery events.

When this breaks

Breaks when context exceeds roughly 15k tokens per query, or when retrieved results are poorly ranked for relevance.
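A cheap guard against the ~15k-token ceiling is to estimate the assembled context size before injection. The sketch below uses a crude 4-characters-per-token heuristic as an assumption; a real tokenizer (e.g. tiktoken for OpenAI models) would be more accurate.

```python
# Rough pre-flight check against the ~15k-token context ceiling.
# Assumes ~4 characters per token, a coarse English-text heuristic.
CONTEXT_TOKEN_BUDGET = 15_000

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 chars per token (assumption)."""
    return len(text) // 4

def within_budget(chunks: list[str]) -> bool:
    """True if the combined retrieved context fits the budget."""
    return sum(estimate_tokens(c) for c in chunks) <= CONTEXT_TOKEN_BUDGET

# 100 chunks of ~100 estimated tokens each ≈ 10k tokens → fits
print(within_budget(["x" * 400] * 100))   # → True
# 20 chunks of ~1000 estimated tokens each ≈ 20k tokens → breaks
print(within_budget(["x" * 4000] * 20))   # → False
```

Queries that fail the check are candidates for tighter retrieval before they ever reach the model.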

How experienced teams mitigate this

  • Aggressively cap retrieved chunks
  • Rank and filter context before injection
  • Cache common queries and embeddings
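The three mitigations above can be combined in one small selection layer. This is a minimal sketch under stated assumptions: the retriever is assumed to return `(score, text, token_count)` tuples sorted by descending relevance, and the thresholds, cap, and the toy `cached_embedding` helper are all hypothetical, not part of any specific retrieval library.

```python
from functools import lru_cache

TOKEN_BUDGET = 6_000   # assumed per-query context budget
MIN_SCORE = 0.75       # assumed relevance cutoff for injection
MAX_CHUNKS = 8         # hard cap on injected chunks

def select_context(ranked_chunks):
    """Filter by relevance score, then cap by chunk count and token budget.

    `ranked_chunks`: iterable of (score, text, token_count), sorted
    descending by score — the assumed shape of a retriever's output.
    """
    picked, used = [], 0
    for score, text, tokens in ranked_chunks:
        if score < MIN_SCORE or len(picked) >= MAX_CHUNKS:
            break                      # rank-ordered, so we can stop here
        if used + tokens > TOKEN_BUDGET:
            continue                   # skip chunks that overflow the budget
        picked.append(text)
        used += tokens
    return picked

@lru_cache(maxsize=10_000)
def cached_embedding(query: str):
    """Stand-in for a real embedding call (hypothetical helper).

    lru_cache means repeated queries are embedded once per process —
    the 'cache common queries and embeddings' mitigation in miniature.
    """
    return tuple((hash((query, i)) % 1000) / 1000 for i in range(4))

chunks = [(0.92, "pricing overview", 3000),
          (0.81, "usage caps", 2500),
          (0.70, "changelog", 100)]    # below MIN_SCORE → dropped
print(select_context(chunks))          # → ['pricing overview', 'usage caps']
```

The design choice worth noting: because chunks arrive rank-ordered, a low-scoring chunk ends the scan entirely, while an over-budget chunk is merely skipped so that smaller high-relevance chunks further down can still fit.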