Note: provenance
Context
The provenance module (src/attune_rag/provenance.py) records which corpus entries grounded each RAG pipeline answer and renders that attribution as formatted markdown. It covers two levels of granularity: whole-response citations and claim-level citations tied to specific spans in the response text.
Content
A single RAG pipeline run produces one CitationRecord, which holds the original query, the retriever name, the retrieval timestamp, and a tuple of CitedSource entries. Each CitedSource identifies a retrieved document by its template path and category, carries a relevance score, and optionally includes a short excerpt. When a record is built with build_citation_record(), each excerpt is truncated to excerpt_chars characters (200 by default).
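The record shapes described above can be sketched with frozen dataclasses. This is a minimal illustration, not the module's actual definitions: the field names (`template_path`, `category`, `score`, `excerpt`, `retrieved_at`, `sources`), the `RetrievalHit` stand-in, and the `sketch_build_citation_record` helper are assumptions chosen to mirror the prose. Only `excerpt_chars` and its default of 200 come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RetrievalHit:
    """Hypothetical stand-in for a raw retriever result."""
    template_path: str
    category: str
    score: float
    content: str


@dataclass(frozen=True)
class CitedSource:
    """One retrieved document (field names are assumed)."""
    template_path: str
    category: str
    score: float
    excerpt: Optional[str] = None


@dataclass(frozen=True)
class CitationRecord:
    """Attribution for one pipeline run (field names are assumed)."""
    query: str
    retriever_name: str
    retrieved_at: datetime
    sources: Tuple[CitedSource, ...]


def sketch_build_citation_record(
    query: str,
    retriever_name: str,
    hits: Sequence[RetrievalHit],
    excerpt_chars: int = 200,
) -> CitationRecord:
    """Illustrative builder: truncates each hit's content to excerpt_chars."""
    sources = tuple(
        CitedSource(
            template_path=hit.template_path,
            category=hit.category,
            score=hit.score,
            # Empty content yields no excerpt at all.
            excerpt=hit.content[:excerpt_chars] or None,
        )
        for hit in hits
    )
    return CitationRecord(
        query=query,
        retriever_name=retriever_name,
        retrieved_at=datetime.now(timezone.utc),
        sources=sources,
    )


record = sketch_build_citation_record(
    "how do I configure retries?",
    "keyword",
    [RetrievalHit("guides/retries.md", "guides", 0.87, "x" * 500)],
)
```

Frozen dataclasses and a tuple of sources keep the sketch immutable, which matches the description of a record that is written once per pipeline run.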
Claim-level attribution uses ClaimCitation, which is populated by the Anthropic Citations API. Each ClaimCitation maps a character span in the response (response_span) back to a specific document and block (document_index, cited_block_index) and records the exact text that was cited.
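To make the span mapping concrete, here is a small sketch of how a claim-level citation might be resolved back to its source block. The names `response_span`, `document_index`, and `cited_block_index` come from the description; the `cited_text` field name, the half-open `(start, end)` offset convention, the nested-list layout of `documents`, and the `resolve` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClaimCitation:
    """Claim-level attribution (a sketch; the real field types may differ)."""
    response_span: Tuple[int, int]  # assumed half-open [start, end) offsets
    document_index: int
    cited_block_index: int
    cited_text: str  # assumed name for the exact text that was cited


def resolve(
    citation: ClaimCitation,
    response: str,
    documents: List[List[str]],
) -> Tuple[str, str]:
    """Return (claim text in the response, source block it points to)."""
    start, end = citation.response_span
    claim = response[start:end]
    block = documents[citation.document_index][citation.cited_block_index]
    return claim, block


first_sentence = "Retries default to three attempts."
response = f"{first_sentence} Backoff is exponential."

# One document made of two blocks (hypothetical sample content).
documents = [
    ["Retry count defaults to 3.", "Backoff doubles after each failure."],
]

citation = ClaimCitation(
    response_span=(0, len(first_sentence)),
    document_index=0,
    cited_block_index=0,
    cited_text="Retry count defaults to 3.",
)

claim, block = resolve(citation, response, documents)
```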
The two rendering functions operate at matching granularities:
format_citations_markdown(record, base_url) renders a full CitationRecord as a markdown section, optionally linking sources to a base_url.
format_claim_citations_markdown(text, citations, base_url) renders response text with footnote-style markers for each ClaimCitation.
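The two granularities can be illustrated with a pair of stand-in renderers. These are sketches under stated assumptions, not the module's functions: the `sketch_` names, the tuple-based inputs, the `## Sources` heading, and the `[n]` marker style are all hypothetical, and the exact markdown the real functions emit is not specified here.

```python
from typing import Optional, Sequence, Tuple


def sketch_format_citations(
    sources: Sequence[Tuple[str, float]],
    base_url: Optional[str] = None,
) -> str:
    """Render (template_path, score) pairs as a markdown source list."""
    lines = ["## Sources"]
    for number, (path, score) in enumerate(sources, start=1):
        # Link the path only when a base URL is supplied.
        label = f"[{path}]({base_url.rstrip('/')}/{path})" if base_url else path
        lines.append(f"{number}. {label} (score {score:.2f})")
    return "\n".join(lines)


def sketch_claim_footnotes(
    text: str,
    citations: Sequence[Tuple[Tuple[int, int], str]],
) -> str:
    """Insert [n] after each cited span and append the cited text as notes."""
    # Number citations in reading order (by where each span ends).
    ordered = sorted(citations, key=lambda item: item[0][1])
    marked = text
    # Insert markers from the end backwards so earlier offsets stay valid.
    for number in range(len(ordered), 0, -1):
        (_, end), _ = ordered[number - 1]
        marked = f"{marked[:end]}[{number}]{marked[end:]}"
    notes = [f"[{n}] {cited}" for n, (_, cited) in enumerate(ordered, start=1)]
    return "\n".join([marked, "", *notes])


first = "Retries default to three attempts."
second = "Backoff is exponential."
text = f"{first} {second}"

# ((start, end), cited text) pairs, deliberately given out of order.
claim_citations = [
    ((len(first) + 1, len(text)), "Backoff doubles after each failure."),
    ((0, len(first)), "Retry count defaults to 3."),
]

section = sketch_format_citations(
    [("guides/retries.md", 0.87)], "https://example.com/docs/"
)
annotated = sketch_claim_footnotes(text, claim_citations)
```

Inserting markers from the last span backwards is a design choice of this sketch: each insertion lengthens the string, so working right to left avoids having to recompute the offsets of earlier spans.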
build_citation_record() is the standard way to construct a CitationRecord from raw RetrievalHit objects returned by a retriever.
Note:
ClaimCitation is produced by the Anthropic Citations API and is only populated when that API is in use. CitationRecord and CitedSource are populated for all retriever backends.
Source files
src/attune_rag/provenance.py
Tags: provenance, citations, traceability