QueryExpander vs. raw query retrieval
Context
QueryExpander uses Claude Haiku to rewrite a single query into 3–5 alternative phrasings before retrieval. It targets the core recall problem in documentation search: a user's wording rarely matches the exact terms in the indexed documents. By exposing synonyms, feature names, tool categories, and developer jargon, the expander increases the chance that at least one phrasing overlaps with the target content.
Expansion is opt-in and fail-safe. If the Claude API call fails, retrieval continues with the original query unchanged.
Feature comparison
| Capability | Raw query retrieval | QueryExpander |
|---|---|---|
| Query variants sent to retrieval | 1 | 3–5 |
| Handles vocabulary mismatch | No | Yes — generates synonyms and developer jargon |
| LLM dependency | None | Claude Haiku (claude-haiku-4-5-20251001) |
| Latency added | None | One Haiku API call per query |
| Response caching | N/A | Yes (cache=True by default) |
| Async support | Depends on retriever | Yes — expand_async() |
| API failure behavior | N/A | Falls back to original query automatically |
| Output format | — | JSON array of strings |
Tradeoffs
Where QueryExpander wins
- Recall on low-overlap queries. When a user asks "how do I hook into the build pipeline" and your docs say "CI integration", raw keyword retrieval misses entirely. The expander surfaces phrasings like "CI integration", "build hook", and "pipeline plugin" in a single call.
- No schema changes required. Expansion happens before retrieval; your index and retrieval layer stay unchanged.
- Cached by default. Repeated identical queries hit the cache rather than the API, so the latency cost is paid only once per unique query.
Where raw retrieval wins
- Latency-sensitive paths. Every non-cached query requires a round trip to the Claude API. If your retrieval SLA is tight and your query vocabulary is consistent with your document vocabulary, the overhead is not justified.
- Deterministic output. LLM-generated phrasings vary. If you need reproducible retrieval results (for testing or auditing), raw retrieval is more predictable.
- Offline or air-gapped environments.
QueryExpanderrequires an active Anthropic API connection. Raw retrieval has no external dependency.
Use QueryExpander when…
- Your users phrase queries differently from how your documentation is written — common in developer tools where users say "token limit" but docs say "context window".
- You are seeing low retrieval recall on integration, workflow, or conceptual queries.
- You can tolerate one additional API call per unique query (or you expect cache hits to absorb most of the cost).
- You want async retrieval pipelines —
expand_async()makes this straightforward.
Stick with raw query retrieval when…
- Your query and document vocabularies are already well-aligned and recall is acceptable.
- You are running in an environment without Anthropic API access.
- You need fully deterministic retrieval results.
- You are writing a quick exploratory script — wiring up
QueryExpanderadds an API key dependency that is not worth it for a one-off.
Source files
src/attune_rag/expander.py
Tags: expander, hybrid-retrieval, recall, claude, haiku