Comparison: Pipeline orchestration approaches
Context
RagPipeline wires together four concerns — corpus retrieval, optional query expansion and reranking, prompt assembly, and LLM generation — into two callable entry points:
- run(): returns a RagResult with a fully assembled prompt and CitationRecord provenance, but does not call an LLM. You supply the LLM call yourself.
- run_and_generate(): does everything run() does, then calls an LLMProvider and returns the generated text alongside the RagResult.
The sections below compare these two approaches and describe the narrower alternatives they replace.
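The relationship between the two entry points can be illustrated with a toy model. This is a minimal sketch, not the attune_rag implementation: SketchPipeline, SketchResult, the substring-matching retrieval, the prompt wording, and the tuple returned by run_and_generate() are all assumptions made for illustration. Only the method names and the fallback_used field come from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SketchResult:
    """Hypothetical stand-in for RagResult (field names are assumed)."""
    prompt: str
    citations: List[str] = field(default_factory=list)
    fallback_used: bool = False


class SketchPipeline:
    """Toy model of the two entry points; not the real RagPipeline."""

    def __init__(self, corpus: Dict[str, str],
                 llm: Optional[Callable[[str], str]] = None) -> None:
        self.corpus = corpus
        self.llm = llm  # only needed by run_and_generate()

    def run(self, query: str) -> SketchResult:
        """Retrieve and assemble a prompt, but never call an LLM."""
        words = query.lower().split()
        hits = [
            (doc_id, text)
            for doc_id, text in self.corpus.items()
            if any(word in text.lower() for word in words)
        ]
        if not hits:
            # No grounding context: substitute a fallback prompt.
            return SketchResult(
                prompt=f"Answer honestly; no context was found.\n\nQuestion: {query}",
                fallback_used=True,
            )
        context = "\n".join(text for _, text in hits)
        return SketchResult(
            prompt=f"Context:\n{context}\n\nQuestion: {query}",
            citations=[doc_id for doc_id, _ in hits],
        )

    def run_and_generate(self, query: str) -> Tuple[str, SketchResult]:
        """Do everything run() does, then call the LLM on the prompt."""
        if self.llm is None:
            raise ValueError("run_and_generate() needs an LLM callable")
        result = self.run(query)
        return self.llm(result.prompt), result
```

In this sketch, run_and_generate() is a thin layer over run(), which is why both paths share the same retrieval, prompt, and provenance behaviour and differ only in who makes the LLM call.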
Feature comparison
| Capability | run() | run_and_generate() |
|---|---|---|
| Retrieves top-k documents from corpus | ✓ | ✓ |
| Expands query via QueryExpander | ✓ (if configured) | ✓ (if configured) |
| Reranks hits via LLMReranker | ✓ (if configured) | ✓ (if configured) |
| Assembles augmented prompt | ✓ | ✓ |
| Returns CitationRecord provenance | ✓ | ✓ |
| Returns per-claim citations (claim_citations) | ✓ | ✓ |
| Records elapsed time (elapsed_ms) | ✓ | ✓ |
| Signals fallback when no grounding found (fallback_used) | ✓ | ✓ |
| Calls an LLM and returns generated text | ✗ | ✓ |
| Supports native citation mode (use_native_citations) | ✗ | ✓ |
| Lets you choose your own LLM call site | ✓ | ✗ |
| Requires an LLMProvider | ✗ | ✓ |
Fallback behaviour
When no grounding context is found in the corpus, both entry points substitute a fallback prompt that instructs the model to answer honestly and avoid inventing APIs, workflow names, or CLI commands. RagResult.fallback_used is set to True so you can detect this condition downstream.
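One way to act on this flag downstream is to label ungrounded answers before showing them to a user. The sketch below assumes only that the result object exposes a boolean fallback_used attribute, as described above. ResultStub, present_answer, and the banner text are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResultStub:
    """Hypothetical stand-in exposing only the fallback_used flag."""
    fallback_used: bool


def present_answer(answer: str, result) -> str:
    """Prefix the answer with a notice when no grounding context was found."""
    if result.fallback_used:
        return "[No supporting documents found] " + answer
    return answer


grounded = present_answer("Use the retriever.", ResultStub(fallback_used=False))
ungrounded = present_answer("I am not sure.", ResultStub(fallback_used=True))
```

The same check could instead drive logging, metrics, or a decision to skip the LLM call entirely when using run().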
When NOT to use RagPipeline directly
- Single-step retrieval without prompt assembly. If you only need ranked document hits, use KeywordRetriever (or a custom RetrieverProtocol) directly. RagPipeline adds prompt-building overhead you won't use.
- Exploratory or one-off queries. A short script that calls a retriever and formats its own prompt is simpler to reason about than instantiating a full pipeline for a single run.
- Custom prompt logic that diverges from PROMPT_VARIANTS. If the built-in prompt variants don't fit your use case, building the augmented prompt yourself with build_augmented_prompt gives you more control without patching pipeline internals.
- Multi-pipeline orchestration. If your application chains multiple retrieval passes or fans out across corpora, coordinate that logic in a layer above RagPipeline rather than nesting pipeline calls.
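For the exploratory case in the list above, the "short script" alternative might look like the following. This is a sketch under stated assumptions: rank_by_keyword_overlap and format_prompt are hypothetical helpers written for this example, and they do not reproduce the behaviour of KeywordRetriever or build_augmented_prompt, whose signatures are not shown here.

```python
from typing import Dict, List, Tuple


def rank_by_keyword_overlap(
    query: str, corpus: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Score each document by the number of query terms it shares."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def format_prompt(
    query: str, corpus: Dict[str, str], hits: List[Tuple[str, int]]
) -> str:
    """Assemble a plain prompt from the ranked hits (hypothetical layout)."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"


corpus = {
    "a": "install the cli tool",
    "b": "configure the cli tool with yaml",
    "c": "unrelated text",
}
hits = rank_by_keyword_overlap("configure cli tool", corpus)
prompt = format_prompt("configure cli tool", corpus, hits)
```

A script like this has no provenance tracking or fallback handling, which is the trade-off for its simplicity.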
When to use each approach
Use run() when you want full RAG orchestration — retrieval, reranking, prompt assembly, and provenance — but need to control the LLM call yourself (to apply streaming, token budgets, retry logic, or a provider not yet supported by LLMProvider).
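As an illustration of controlling the call site, the sketch below wraps a caller-supplied LLM function in retry logic with exponential backoff. It assumes the prompt has already been obtained from the result of run(); call_with_retry and its parameters are hypothetical names for this example, and the injectable sleep argument exists only to make the helper easy to test.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    llm_call: Callable[[str], str],
    prompt: str,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call llm_call(prompt), retrying on any exception with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return llm_call(prompt)
        except Exception as exc:  # a real caller would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * (2 ** attempt))
    assert last_error is not None
    raise last_error
```

Keeping this logic outside the pipeline means it can be swapped for streaming or token-budget handling without touching retrieval or prompt assembly.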
Use run_and_generate() when you want a single call that handles everything end-to-end and you are happy to delegate the LLM invocation to the pipeline. This is the right default for most production use cases where the built-in LLMProvider covers your target model.
Use a retriever directly (e.g. KeywordRetriever) when you don't need prompt assembly or citation tracking — for example, a search-results UI that displays raw document excerpts.
Source files
- src/attune_rag/pipeline.py
- src/attune_rag/__init__.py
Tags: pipeline, orchestration, rag, result