Comparison: Provenance approaches

Context

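The provenance module gives you two distinct ways to attribute an answer back to its source documents: pipeline-level provenance (CitationRecord), which records each document the retriever returned, and claim-level provenance (ClaimCitation), which maps spans of the response to the document blocks that support them.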

Both paths share a common rendering layer (format_citations_markdown and format_claim_citations_markdown) that produces ready-to-display markdown.

Feature comparison

| Feature | Pipeline-level (CitationRecord) | Claim-level (ClaimCitation) |
| --- | --- | --- |
| Granularity | Per retrieved document | Per response span → document block |
| Primary data source | RetrievalHit objects from your retriever | Anthropic Citations API response |
| Entry point | build_citation_record() | Consume ClaimCitation objects directly |
| What it tracks | Query, retriever name, timestamp, score, optional excerpt | Response span (start, end), document index, cited text, block index |
| Rendering function | format_citations_markdown(record, base_url) | format_claim_citations_markdown(text, citations, base_url) |
| Output style | Markdown section listing all cited sources | Response text with inline footnote-style references |
| Requires Anthropic Citations API | No | Yes |
| Captures retrieval score | Yes (CitedSource.score) | No |
| Captures retrieval timestamp | Yes (CitationRecord.retrieved_at) | No |

Tradeoffs

Pipeline-level provenance is the right default for most RAG applications. build_citation_record() converts your existing RetrievalHit objects into a structured CitationRecord with minimal wiring. You get a complete audit trail — what was queried, which retriever ran, when retrieval happened, and how each source scored — all rendered to markdown via format_citations_markdown(). The weakness is granularity: citations point to whole documents (or short excerpts up to excerpt_chars characters), not to the exact sentence that supported a claim.
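
The pipeline-level flow described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the dataclass fields, the `_sketch` function signatures, the markdown layout, and the `docs.example.com` URL are all assumptions made for the example, and the real build_citation_record() and format_citations_markdown() may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RetrievalHit:
    # Stand-in for the retriever's hit type; field names are assumed.
    doc_id: str
    title: str
    text: str
    score: float


@dataclass
class CitedSource:
    doc_id: str
    title: str
    score: float
    excerpt: Optional[str] = None


@dataclass
class CitationRecord:
    query: str
    retriever: str
    retrieved_at: datetime
    sources: List[CitedSource] = field(default_factory=list)


def build_citation_record_sketch(query, retriever, hits, excerpt_chars=0):
    """Hypothetical stand-in for build_citation_record(); not the real signature."""
    sources = [
        CitedSource(
            doc_id=h.doc_id,
            title=h.title,
            score=h.score,
            # Keep at most excerpt_chars characters of the document text.
            excerpt=h.text[:excerpt_chars] if excerpt_chars > 0 else None,
        )
        for h in hits
    ]
    return CitationRecord(query, retriever, datetime.now(timezone.utc), sources)


def format_citations_markdown_sketch(record, base_url):
    """Hypothetical stand-in for format_citations_markdown()."""
    lines = ["## Sources", ""]
    for i, s in enumerate(record.sources, start=1):
        line = f"{i}. [{s.title}]({base_url}/{s.doc_id}) (score {s.score:.2f})"
        if s.excerpt:
            line += f': "{s.excerpt}"'
        lines.append(line)
    return "\n".join(lines)


hits = [
    RetrievalHit("doc-1", "Install guide", "Run the installer, then restart.", 0.91),
    RetrievalHit("doc-2", "FAQ", "Restarting is required after install.", 0.78),
]
record = build_citation_record_sketch("how to install", "bm25", hits, excerpt_chars=17)
markdown = format_citations_markdown_sketch(record, "https://docs.example.com")
print(markdown)
```

Note that every source is listed whether or not the model relied on it, which is the document-level granularity limit mentioned above.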

Claim-level provenance is more precise but narrower in scope. ClaimCitation pinpoints which character span of the model's response corresponds to which block in which document, making it suitable for applications where readers need to verify individual assertions. The tradeoff is a hard dependency on the Anthropic Citations API — you cannot produce ClaimCitation objects from arbitrary retrievers — and you lose retrieval metadata like scores and timestamps.
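
To make the span-to-block mapping concrete, here is a small sketch of how inline footnote-style references could be produced from ClaimCitation-like objects. The field names follow the comparison on this page, but the class definition, the `_sketch` renderer, the footnote layout, and the URL scheme are assumptions for illustration; the real format_claim_citations_markdown() may behave differently.

```python
from dataclasses import dataclass


@dataclass
class ClaimCitation:
    # Field names mirror this page's description; the real class may differ.
    start: int           # start offset of the cited span in the response text
    end: int             # end offset (exclusive) of the cited span
    document_index: int  # which source document supports the span
    cited_text: str      # the supporting text quoted from that document
    block_index: int     # which block inside the document


def format_claim_citations_sketch(text, citations, base_url):
    """Hypothetical stand-in for format_claim_citations_markdown()."""
    # Process spans in order of where they end so markers land after each claim.
    ordered = sorted(citations, key=lambda c: c.end)
    out, footnotes, cursor = [], [], 0
    for n, c in enumerate(ordered, start=1):
        out.append(text[cursor:c.end])  # response text up to the end of the span
        out.append(f"[^{n}]")           # inline footnote marker
        cursor = c.end
        footnotes.append(
            f'[^{n}]: {base_url}/{c.document_index}#block-{c.block_index}: "{c.cited_text}"'
        )
    out.append(text[cursor:])           # any trailing uncited text
    return "".join(out) + "\n\n" + "\n".join(footnotes)


first = "The installer needs a restart."
second = "It supports Linux."
response = first + " " + second
citations = [
    ClaimCitation(0, len(first), 0, "Restarting is required after install.", 2),
    ClaimCitation(len(first) + 1, len(response), 1, "Linux is supported.", 3),
]
rendered = format_claim_citations_sketch(response, citations, "https://docs.example.com")
print(rendered)
```

Unlike the pipeline-level record, nothing here carries a retrieval score or timestamp; the citation only knows which span maps to which block.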

When to use each approach

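Use CitationRecord / pipeline-level provenance when document-level attribution is enough, when you need the retrieval audit trail (query, retriever name, timestamp, and per-source score), or when you cannot depend on the Anthropic Citations API.

Use ClaimCitation / claim-level provenance when readers need to verify individual assertions against the exact cited text, you already call the Anthropic Citations API, and you can do without retrieval scores and timestamps.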

When in doubt, start with build_citation_record() and format_citations_markdown(). They cover the common case and require no external API dependency.

Source files

Tags: provenance, citations, traceability