Troubleshoot provenance
Before you start
The provenance module (src/attune_rag/provenance.py) records which corpus entries grounded each answer and formats that record for display. The core data flow is:
- build_citation_record() converts RetrievalHit objects into a CitationRecord (with CitedSource entries).
- format_citations_markdown() renders a CitationRecord as a markdown section.
- format_claim_citations_markdown() annotates response text with footnote-style citations from the Anthropic Citations API (ClaimCitation objects).
Keep this flow in mind as you work through the steps below.
Symptom table
| If you observe | Check |
|---|---|
| format_citations_markdown() returns empty or malformed markdown | Confirm the CitationRecord passed in has a non-empty hits tuple and that each CitedSource.score is a valid float |
| format_claim_citations_markdown() produces no footnotes | Verify the citations iterable is not empty and that each ClaimCitation.response_span falls within the bounds of text |
| build_citation_record() raises an exception | Check that every object in hits exposes the attributes build_citation_record() expects from a RetrievalHit; a duck-typing mismatch is the most common cause |
| CitedSource.excerpt is None when you expect text | build_citation_record() truncates excerpts to excerpt_chars (default 200); pass a larger value if the source text is being cut to zero |
| Citations reference the wrong document | Inspect ClaimCitation.document_index against the ordered list of documents you passed to the Citations API; the index is zero-based |
| Markdown links are missing or broken | Confirm you are passing a non-None base_url to format_citations_markdown() or format_claim_citations_markdown(); without it, links are not rendered |
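
The span and index checks in the table can be run before calling the formatter. The following is a minimal sketch and not part of attune_rag. It assumes that response_span is a (start, end) pair of character offsets with an exclusive end, and that document_index indexes the list of documents sent to the Citations API. The helper name find_citation_problems is hypothetical, and it takes plain tuples so that it does not depend on the real ClaimCitation class.

```python
from typing import Iterable, List, Sequence, Tuple


def find_citation_problems(
    text: str,
    documents: Sequence[str],
    citations: Iterable[Tuple[Tuple[int, int], int]],
) -> List[str]:
    """Return a readable message for each check a citation fails.

    Each citation is a ((start, end), document_index) pair. This layout is an
    assumption made for illustration; adapt it to the real ClaimCitation fields.
    """
    problems: List[str] = []
    for position, ((start, end), document_index) in enumerate(citations):
        # A span is usable only if 0 <= start <= end <= len(text).
        if not (0 <= start <= end <= len(text)):
            problems.append(
                f"citation {position}: span ({start}, {end}) is outside "
                f"text of length {len(text)}"
            )
        # document_index is zero-based, so the last valid value is len - 1.
        if not (0 <= document_index < len(documents)):
            problems.append(
                f"citation {position}: document_index {document_index} is "
                f"not a valid index into {len(documents)} document(s)"
            )
    return problems
```

An empty return value means every span fits inside text and every index points at a document you supplied. If the list is non-empty, the most likely cause is that text is not the exact string the spans were produced against.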
Step-by-step diagnosis
Work through these steps in order; they run from the cheapest check to the most expensive.
- Reproduce the failure in isolation. Strip the call down to its required arguments and confirm the failure still occurs outside the surrounding application context. For example:

      from datetime import datetime, timezone

      from attune_rag.provenance import build_citation_record, format_citations_markdown

      record = build_citation_record(
          query="test query",
          hits=[],  # replace with your actual hits
          retriever_name="my-retriever",
          retrieved_at=datetime.now(timezone.utc),
      )
      print(format_citations_markdown(record))

- Inspect the data at each stage. Print (or assert on) the intermediate values before they reach the formatting functions:

      print(record.hits)          # Are CitedSource entries present?
      print(record.retrieved_at)  # Is the datetime timezone-aware?
      for src in record.hits:
          print(src.template_path, src.score, src.excerpt)

  For claim citations, check the span and index values directly:

      for c in citations:
          print(c.response_span, c.document_index, c.cited_text)

- Run the provenance tests.

      pytest -k "provenance" -v

  If a test already exercises your failing path, its fixtures give you a known-good input to compare against.

- Enable DEBUG logging. If your application configures logging for the attune_rag namespace, set it to DEBUG and re-run:

      import logging

      logging.getLogger("attune_rag").setLevel(logging.DEBUG)

  Look for unexpected None values or short-circuit returns in the output.
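
When the hits come from a custom retriever, the inspection step can be made mechanical. The following is a minimal sketch that assumes you already know which attribute names build_citation_record() reads; provenance.py is the authority on that list. The helper missing_attributes is hypothetical, and the attribute names in the usage example (template_path, score, text) are placeholders, not a statement of the real RetrievalHit interface.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Sequence


def missing_attributes(
    hits: Iterable[Any], required: Sequence[str]
) -> Dict[int, List[str]]:
    """Map each hit's position to the required attributes it lacks.

    Hits that expose every required attribute are left out of the result,
    so an empty dict means all hits passed the check.
    """
    report: Dict[int, List[str]] = {}
    for position, hit in enumerate(hits):
        absent = [name for name in required if not hasattr(hit, name)]
        if absent:
            report[position] = absent
    return report


if __name__ == "__main__":
    # Placeholder attribute names; take the real list from provenance.py.
    required = ("template_path", "score", "text")
    hits = [
        SimpleNamespace(template_path="a.md", score=0.9, text="alpha"),
        SimpleNamespace(template_path="b.md"),  # lacks score and text
    ]
    print(missing_attributes(hits, required))
```

Running this check before build_citation_record() turns a late AttributeError into a report that names the offending hit and the attributes it lacks.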
Common fixes
- Empty hits tuple passed to build_citation_record(). If retrieval returned no results, CitationRecord.hits will be an empty tuple and the formatted output will be empty. Verify your retriever is returning results before calling build_citation_record().

- RetrievalHit attributes missing. build_citation_record() reads attributes from each object in hits. If you pass a custom object that is missing an expected attribute, you'll get an AttributeError. Confirm your hit objects match the expected interface.

- Truncated excerpts. The default excerpt_chars=200 in build_citation_record() may cut content too short. Increase it at the call site:

      record = build_citation_record(..., excerpt_chars=500)

- base_url omitted from formatting calls. Both format_citations_markdown() and format_claim_citations_markdown() accept an optional base_url. If you expect hyperlinked citations in the output, pass the base URL explicitly:

      md = format_citations_markdown(record, base_url="https://docs.example.com")

- ClaimCitation.response_span out of range. If response_span indices exceed the length of text, annotation will silently skip or misplace the footnote. Confirm the text argument you pass to format_claim_citations_markdown() is the same string the Citations API produced the spans against.

- Dependency version mismatch. A change in the Anthropic SDK can alter the shape of Citations API responses. Run:

      pip show anthropic

  and confirm the installed version matches what your project requires.
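
The version check in the last item can also be done from Python, which is convenient inside a test or a startup check. This sketch uses the standard library module importlib.metadata (Python 3.8 or later). The helper name installed_version is hypothetical, and comparing the result against your project's pinned version is left out because that requirement is project-specific.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("anthropic")
    if found is None:
        print("anthropic is not installed in this environment")
    else:
        print(f"anthropic {found} is installed")
```

Returning None instead of raising lets the caller tell "not installed" apart from "installed at the wrong version", which are different failures with different fixes.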
Source files
src/attune_rag/provenance.py
Tags: provenance, citations, traceability