Retrieval errors
Common error signatures
Most retrieval failures fall into one of three categories: a corpus object that doesn't satisfy `CorpusProtocol`, a query that reduces to zero tokens after stopword filtering, or a `k` value that the scorer can't handle. The errors typically surface from `KeywordRetriever.retrieve()` or from a custom retriever that doesn't fully implement `RetrieverProtocol`.
Concrete signatures to watch for:
- `AttributeError: 'XYZ' object has no attribute 'retrieve'`: The object passed as a retriever doesn't implement `RetrieverProtocol`. Any retriever must expose `retrieve(query: str, corpus: CorpusProtocol, k: int = 3) -> Iterable[RetrievalHit]`.
- `TypeError` on a `retrieve()` call: The arguments don't match the signature. A wrong number of arguments or an unknown keyword raises `TypeError` immediately. Python does not enforce the annotations, so a wrong-typed argument (an integer passed as `query`, or a plain list passed as `corpus` where a `CorpusProtocol` object is expected) fails wherever the value is first used inside the method, usually as a `TypeError` and sometimes as an `AttributeError`.
- Empty result list from `KeywordRetriever.retrieve()`: Not an exception, but a silent failure. The query most likely consists entirely of stopwords (for example, `"how do i"` or `"what is the"`), leaving no tokens to score against the corpus.
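Before digging into a traceback, a quick preflight check can rule out the first two signatures. The sketch below is hypothetical: `preflight` is not part of attune_rag, and it only checks what Python can observe at runtime (a callable `retrieve` attribute and the argument types), not full `RetrieverProtocol` conformance.

```python
from typing import Any


def preflight(retriever: Any, query: Any, corpus: Any) -> list[str]:
    """Hypothetical helper: list likely causes of AttributeError/TypeError
    before calling retriever.retrieve(query, corpus, k)."""
    problems: list[str] = []
    # AttributeError signature: the object has no callable retrieve().
    if not callable(getattr(retriever, "retrieve", None)):
        problems.append(f"{type(retriever).__name__} has no callable retrieve()")
    # TypeError signature: query must be a str, not an int or other type.
    if not isinstance(query, str):
        problems.append(f"query is {type(query).__name__}, expected str")
    # A bare list or tuple is a common stand-in for a real corpus object.
    if isinstance(corpus, (list, tuple)):
        problems.append("corpus is a plain sequence, expected a CorpusProtocol object")
    return problems


class NotARetriever:
    pass


for problem in preflight(NotARetriever(), 42, ["doc"]):
    print(problem)
```

An empty list means none of these shallow problems were found; it does not prove the retriever or corpus is correct.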
Where errors originate
Check the class that matches your symptom before walking the call stack further.
- `KeywordRetriever.retrieve(query, corpus, k)` in `src/attune_rag/retrieval.py`: The most common raise site. This method tokenizes the query, strips stopwords listed in `_STOPWORDS`, applies suffix stemming via `_STEM_SUFFIXES`, and scores each `RetrievalEntry` using weights on the path, summary, content, and related fields. Failures here are usually caused by a malformed corpus or a query that produces no usable tokens.
- `RetrieverProtocol` in `src/attune_rag/retrieval.py`: A structural protocol, not a concrete class. A runtime `isinstance` check against a protocol works only if the protocol is decorated with `@runtime_checkable`, and even then it only confirms that a `retrieve` attribute exists; it does not check parameters or the return type. If your retriever passes such a check but still raises, verify that its `retrieve()` returns `Iterable[RetrievalHit]` and that each `RetrievalHit` carries a valid `RetrievalEntry`, a `float` score, and a non-empty `match_reason` string.
- `RetrievalHit` construction in `src/attune_rag/retrieval.py`: A dataclass with three required fields: `entry: RetrievalEntry`, `score: float`, and `match_reason: str`. Omitting any field raises a `TypeError` at construction time. Unless the class validates its own fields, a plain dataclass does not check types at runtime, so a wrong-typed value is accepted at construction and fails later.
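To see why passing an `isinstance` check is not enough, here is a self-contained sketch with local stand-ins for `RetrievalEntry`, `RetrievalHit`, and `RetrieverProtocol`. The stand-ins are assumptions modeled on the names above: the `RetrievalEntry` field, the `@runtime_checkable` decorator, and the `check_hits` helper are illustrative and may not match the real definitions in `src/attune_rag/retrieval.py`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Protocol, runtime_checkable


@dataclass
class RetrievalEntry:  # hypothetical stand-in; real fields may differ
    path: str


@dataclass
class RetrievalHit:  # stand-in with the three documented fields
    entry: RetrievalEntry
    score: float
    match_reason: str


@runtime_checkable
class RetrieverProtocol(Protocol):  # stand-in; corpus typed as Any for brevity
    def retrieve(self, query: str, corpus: Any, k: int = 3) -> Iterable[RetrievalHit]: ...


class GoodRetriever:
    def retrieve(self, query: str, corpus: Any, k: int = 3) -> Iterable[RetrievalHit]:
        # Toy logic: wrap the first k entries as hits with a fixed score.
        return [RetrievalHit(entry=e, score=1.0, match_reason=f"toy match for {query!r}")
                for e in list(corpus)[:k]]


class BadRetriever:
    def retrieve(self, query, corpus, k=3):
        # Right method name, so isinstance() passes, but it returns strings.
        return ["not a hit"]


def check_hits(hits: Iterable[Any]) -> list[str]:
    """Report the first contract violation found in each hit."""
    problems = []
    for i, hit in enumerate(hits):
        if not isinstance(hit, RetrievalHit):
            problems.append(f"hit {i}: {type(hit).__name__}, expected RetrievalHit")
        elif not isinstance(hit.entry, RetrievalEntry):
            problems.append(f"hit {i}: entry is {type(hit.entry).__name__}")
        elif not isinstance(hit.score, float):
            problems.append(f"hit {i}: score is {type(hit.score).__name__}, expected float")
        elif not hit.match_reason:
            problems.append(f"hit {i}: empty match_reason")
    return problems


# A plain list stands in for a real CorpusProtocol corpus in this sketch only.
corpus = [RetrievalEntry(path="docs/a.md"), RetrievalEntry(path="docs/b.md")]
for r in (GoodRetriever(), BadRetriever()):
    print(type(r).__name__, isinstance(r, RetrieverProtocol), check_hits(r.retrieve("index", corpus)))
```

Both retrievers pass the `isinstance` check; only inspecting the returned hits exposes `BadRetriever`.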
How to diagnose
- Check whether the query survives stopword filtering. `KeywordRetriever` removes every token found in `_STOPWORDS` (function words such as articles, auxiliary verbs, pronouns, question words, and common prepositions; for example `a`, `the`, `how`, `do`, `is`, `for`). If every token in the query is a stopword, `retrieve()` returns an empty list rather than raising. Print the tokenized, filtered query before calling `retrieve()` to confirm that at least one content token remains.
- Verify that the corpus satisfies `CorpusProtocol`. A corpus object that lacks the attributes or iteration behavior the protocol expects causes an `AttributeError` or `TypeError` inside `KeywordRetriever.retrieve()`. Confirm that your corpus exposes every member `CorpusProtocol` declares before passing it to the retriever.
- Confirm that `k` is a positive integer. `KeywordRetriever.retrieve()` defaults to `k=3`. Passing `k=0` or a negative value may return an empty list, return a truncated list, or raise, depending on how the scorer slices results. Pass an explicit positive `k` to rule this out.
- Inspect `RetrievalHit.score` values when results are ranked unexpectedly. `KeywordRetriever` weights token overlap across the `path`, `summary`, `content`, and `related` fields of each `RetrievalEntry`. A hit with a `score` of `0.0` means no stemmed query token matched any weighted field: check that the `RetrievalEntry` fields are populated and that stemming via `_STEM_SUFFIXES` (`-ing`, `-ed`, `-tion`, `-er`, and others) reduces the query tokens and the field text to a shared root.
- Trace a `TypeError` back to `RetrievalHit` construction. If the traceback points to a dataclass instantiation inside `retrieval.py`, one of the three fields (`entry`, `score`, `match_reason`) is most likely missing, or an unexpected keyword was passed. Unless the class validates its fields, a plain dataclass does not enforce its type annotations, so also confirm that the value passed as `score` is a `float` and that `match_reason` is a non-empty string; a wrong type there tends to fail later, away from the construction site.
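The stopword and zero-score checks above can be reproduced outside the library with a toy pipeline. This is a minimal sketch, not the `KeywordRetriever` implementation: the stopword set, the suffix list (limited to the four suffixes named above), the minimum root length, and the per-field weights are all hypothetical values chosen for illustration, so real scores will differ.

```python
import re

# Hypothetical stand-ins for _STOPWORDS and _STEM_SUFFIXES; the real sets are larger.
STOPWORDS = {"a", "the", "how", "do", "i", "is", "for", "what"}
STEM_SUFFIXES = ("tion", "ing", "ed", "er")
# Hypothetical per-field weights; KeywordRetriever's actual weights may differ.
FIELD_WEIGHTS = {"path": 3.0, "summary": 2.0, "content": 1.0, "related": 0.5}


def stem(token: str) -> str:
    # Strip the first matching suffix, keeping at least three characters of root.
    for suffix in STEM_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def query_tokens(query: str) -> list[str]:
    # Lowercase, split on non-alphanumerics, drop stopwords, then stem.
    raw = re.findall(r"[a-z0-9]+", query.lower())
    return [stem(t) for t in raw if t not in STOPWORDS]


def score(query: str, entry: dict[str, str]) -> float:
    # Add a field's weight for every query stem that appears in that field.
    tokens = query_tokens(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = re.findall(r"[a-z0-9]+", entry.get(field, "").lower())
        field_stems = {stem(w) for w in words}
        total += weight * sum(1 for t in tokens if t in field_stems)
    return total


entry = {"path": "guides/indexing.md", "summary": "Indexed search setup", "content": "", "related": ""}
print(query_tokens("how do i"))         # → [] (every token is a stopword)
print(query_tokens("How is indexing"))  # → ['index']
print(score("indexer", entry))          # → 5.0 (root "index" in path and summary)
print(score("caching", entry))          # → 0.0 (no shared root)
```

Printing the equivalent of `query_tokens()` for a failing query shows at a glance whether the problem is filtering (empty list) or matching (tokens survive but score `0.0`).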
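For the corpus check, one generic approach is to compare an object against a protocol's declared methods and annotated attributes. The sketch uses a hypothetical `DemoCorpusProtocol` because the members of the real `CorpusProtocol` are not listed here; substitute the real protocol class when running it against your own code. On Python 3.13 and later, `typing.get_protocol_members()` returns the declared members directly.

```python
from typing import Iterator, Protocol


class DemoCorpusProtocol(Protocol):
    # Hypothetical members for illustration only; check the real CorpusProtocol.
    name: str

    def entries(self) -> Iterator[object]: ...


def protocol_members(proto: type) -> set[str]:
    # Annotated attributes plus public callables declared in the protocol body.
    members = set(getattr(proto, "__annotations__", {}))
    members |= {n for n, v in vars(proto).items() if callable(v) and not n.startswith("_")}
    return members


def missing_members(obj: object, proto: type) -> list[str]:
    # Names the protocol declares that the object does not provide.
    return sorted(m for m in protocol_members(proto) if not hasattr(obj, m))


class ListBackedCorpus:
    def __init__(self, items: list[object]) -> None:
        self.name = "demo"
        self._items = items

    def entries(self) -> Iterator[object]:
        return iter(self._items)


print(missing_members(ListBackedCorpus([]), DemoCorpusProtocol))  # → []
print(missing_members(["a", "b"], DemoCorpusProtocol))            # → ['entries', 'name']
```

This only checks that names exist, not that their signatures or behavior match, so treat an empty result as necessary rather than sufficient.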
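For the `k` check, the sketch below shows why a non-positive `k` is risky if results are truncated with a plain slice (an assumption about the scorer, not confirmed behavior): `k=0` silently yields nothing, and a negative `k` silently drops hits from the end. The hypothetical `require_positive_k` guard fails fast instead.

```python
def require_positive_k(k: object) -> int:
    # bool is a subclass of int, so reject it explicitly before the int check.
    if isinstance(k, bool) or not isinstance(k, int):
        raise TypeError(f"k must be an int, got {type(k).__name__}")
    if k <= 0:
        raise ValueError(f"k must be a positive integer, got {k}")
    return k


ranked = ["hit-1", "hit-2", "hit-3", "hit-4"]  # stand-in for scored, sorted hits

print(ranked[:3])   # k=3  → ['hit-1', 'hit-2', 'hit-3']
print(ranked[:0])   # k=0  → []
print(ranked[:-1])  # k=-1 → ['hit-1', 'hit-2', 'hit-3'] (drops the last hit)

for bad_k in (0, -1, 2.5):
    try:
        require_positive_k(bad_k)
    except (TypeError, ValueError) as exc:
        print(f"{bad_k!r}: {exc}")
```

Note that `k=-1` and `k=3` happen to return the same number of hits here, which is exactly why a bad `k` can go unnoticed.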
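For the `RetrievalHit` check, the sketch below uses a local stand-in with the three fields described above (`RetrievalEntry` is reduced to a hypothetical one-field class). It shows that a missing field raises `TypeError` immediately, while a wrong-typed value passes through a plain dataclass unnoticed; the hypothetical `ValidatedHit` subclass adds a `__post_init__` check so the bad value fails at construction instead. Whether the real class performs similar validation is not stated here.

```python
from dataclasses import dataclass


@dataclass
class RetrievalEntry:  # hypothetical minimal stand-in
    path: str


@dataclass
class RetrievalHit:  # stand-in with the three required fields
    entry: RetrievalEntry
    score: float
    match_reason: str


@dataclass  # re-decorate so the generated __init__ calls __post_init__
class ValidatedHit(RetrievalHit):
    def __post_init__(self) -> None:
        # Enforce the types that a plain dataclass only records as annotations.
        if not isinstance(self.entry, RetrievalEntry):
            raise TypeError(f"entry must be RetrievalEntry, got {type(self.entry).__name__}")
        if not isinstance(self.score, float):
            raise TypeError(f"score must be float, got {type(self.score).__name__}")
        if not isinstance(self.match_reason, str):
            raise TypeError(f"match_reason must be str, got {type(self.match_reason).__name__}")
        if not self.match_reason:
            raise ValueError("match_reason must be non-empty")


entry = RetrievalEntry(path="docs/retrieval.md")

try:
    RetrievalHit(entry=entry, score=0.5)  # match_reason omitted
except TypeError as exc:
    print("missing field:", exc)

loose = RetrievalHit(entry=entry, score="0.5", match_reason="path")  # accepted silently
print("plain dataclass accepted a score of type", type(loose.score).__name__)

try:
    ValidatedHit(entry=entry, score="0.5", match_reason="path")
except TypeError as exc:
    print("validated:", exc)
```

Running the same construction against both classes makes it clear whether a `TypeError` comes from a missing argument or from an explicit type check.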
Source files
src/attune_rag/retrieval.py
Tags: retrieval, keyword, scoring, ranking