Comparison: Pipeline orchestration approaches

Context

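RagPipeline wires together four concerns (corpus retrieval, optional query expansion and reranking, prompt assembly, and LLM generation) and exposes them through two callable entry points: run(), which handles retrieval, optional expansion and reranking, and prompt assembly but leaves the LLM call to you, and run_and_generate(), which also calls the LLM and returns the generated text.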

The sections below compare these two approaches and describe the narrower alternatives they replace.


Feature comparison

| Capability                                               | run()             | run_and_generate() |
|----------------------------------------------------------|-------------------|--------------------|
| Retrieves top-k documents from corpus                    | ✓                 | ✓                  |
| Expands query via QueryExpander                          | ✓ (if configured) | ✓ (if configured)  |
| Reranks hits via LLMReranker                             | ✓ (if configured) | ✓ (if configured)  |
| Assembles augmented prompt                               | ✓                 | ✓                  |
| Returns CitationRecord provenance                        | ✓                 | ✓                  |
| Returns per-claim citations (claim_citations)            |                   |                    |
| Records elapsed time (elapsed_ms)                        |                   |                    |
| Signals fallback when no grounding found (fallback_used) | ✓                 | ✓                  |
| Calls an LLM and returns generated text                  | ✗                 | ✓                  |
| Supports native citation mode (use_native_citations)     |                   |                    |
| Lets you choose your own LLM call site                   | ✓                 | ✗                  |
| Requires an LLMProvider                                  |                   | ✓                  |

Fallback behaviour

When no grounding context is found in the corpus, both entry points substitute a fallback prompt that instructs the model to answer honestly and avoid inventing APIs, workflow names, or CLI commands. RagResult.fallback_used is set to True so you can detect this condition downstream.
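
As a rough illustration of the downstream check described above, the sketch below branches on fallback_used. The RagResult class here is a stand-in defined only for this example; the prompt field name and the describe_grounding helper are assumptions, not the pipeline's actual API.

```python
from dataclasses import dataclass


@dataclass
class RagResult:
    """Stand-in result type for this sketch; only the fields used here."""
    prompt: str                  # hypothetical field name for the assembled prompt
    fallback_used: bool = False  # flag named in the text above


def describe_grounding(result: RagResult) -> str:
    """Return a label a caller could log or display next to the answer."""
    if result.fallback_used:
        # No grounding context was found, so the fallback prompt was used.
        return "ungrounded"
    return "grounded"


grounded = RagResult(prompt="Context: ...\nQuestion: ...")
fallback = RagResult(prompt="Answer honestly ...", fallback_used=True)
print(describe_grounding(grounded))
print(describe_grounding(fallback))
```

A caller might use such a label to show a "no sources found" notice or to route the query to a human instead of trusting the generated answer.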


When NOT to use RagPipeline directly


Use X when…

Use run() when you want full RAG orchestration — retrieval, reranking, prompt assembly, and provenance — but need to control the LLM call yourself (to apply streaming, token budgets, retry logic, or a provider not yet supported by LLMProvider).
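
The pattern of owning the LLM call can be sketched as follows, using retry logic as the example. Everything here is a stand-in so the snippet runs without the library: StubPipeline replaces RagPipeline, the prompt attribute on the result is an assumed name, and flaky_llm simulates a caller-owned client that fails once.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RagResult:
    prompt: str                  # hypothetical: the assembled, augmented prompt
    fallback_used: bool = False


class StubPipeline:
    """Stand-in for RagPipeline; the real run() signature may differ."""

    def run(self, query: str) -> RagResult:
        return RagResult(prompt=f"Context: (retrieved text)\nQuestion: {query}")


def generate_with_retry(prompt: str, call_llm: Callable[[str], str],
                        max_attempts: int = 3, delay_s: float = 0.0) -> str:
    """Invoke the caller-owned LLM function, retrying on any exception."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call_llm(prompt)
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"LLM call failed after {max_attempts} attempts") from last_error


calls = {"n": 0}


def flaky_llm(prompt: str) -> str:
    """Fake LLM client: times out on the first call, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated timeout")
    return f"answer based on {len(prompt)} prompt characters"


result = StubPipeline().run("How do I configure reranking?")
answer = generate_with_retry(result.prompt, flaky_llm)
print(answer)
```

The same shape would apply to streaming or token budgeting: the pipeline produces the prompt and provenance, and the wrapper around call_llm is where the caller's own policy lives.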

Use run_and_generate() when you want a single call that handles everything end-to-end and you are happy to delegate the LLM invocation to the pipeline. This is the right default for most production use cases where the built-in LLMProvider covers your target model.

Use a retriever directly (e.g. KeywordRetriever) when you don't need prompt assembly or citation tracking — for example, a search-results UI that displays raw document excerpts.
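
To make the retriever-only case concrete, here is a minimal keyword-overlap search written from scratch. It is not the library's KeywordRetriever, whose interface is not shown in this document; keyword_search and its scoring rule are assumptions chosen only to illustrate returning raw excerpts without prompt assembly or citations.

```python
from typing import List, Tuple


def keyword_search(query: str, documents: List[str],
                   top_k: int = 3) -> List[Tuple[int, str]]:
    """Rank documents by how many distinct query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for doc in documents:
        words = set(doc.lower().split())
        score = len(terms & words)
        if score > 0:
            scored.append((score, doc))
    # Highest score first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


corpus = [
    "Configure the reranker in the pipeline settings",
    "The pipeline records elapsed time for each query",
    "Unrelated release notes",
]
hits = keyword_search("pipeline reranker", corpus, top_k=2)
for score, excerpt in hits:
    print(score, excerpt)
```

A search-results UI could render each excerpt directly, since nothing downstream needs a prompt or provenance records.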


Source files

Tags: pipeline, orchestration, rag, result