Quickstart: retrieval
KeywordRetriever scores and ranks entries in a corpus against a query using token-overlap, stemming, and stopword filtering. The snippet below runs a retrieval and prints the top result.
from attune_rag.retrieval import KeywordRetriever
retriever = KeywordRetriever()
hits = retriever.retrieve(query="configure logging", corpus=my_corpus, k=3)
for hit in hits:
print(hit.score, hit.match_reason, hit.entry)
Expected output (values depend on your corpus):
0.87 keyword overlap: log, configur <RetrievalEntry path='docs/logging.md'>
0.61 keyword overlap: configur <RetrievalEntry path='docs/setup.md'>
0.44 keyword overlap: log <RetrievalEntry path='docs/debug.md'>
Prerequisites
- The project is cloned and installed locally.
- You have a corpus object that implements
CorpusProtocol.
Steps
-
Create a retriever. Instantiate
KeywordRetriever— no required arguments.from attune_rag.retrieval import KeywordRetriever retriever = KeywordRetriever() -
Run a query. Call
retrieve(query, corpus, k). Setkto the number of results you want (default3). Common stopwords such as"the","how", and"is"are filtered automatically; the retriever also stems tokens before scoring.hits = retriever.retrieve(query="configure logging", corpus=my_corpus, k=3) -
Inspect the hits. Each returned
RetrievalHitexposes three fields:entry(the matched corpus entry),score(float), andmatch_reason(a human-readable explanation of why the entry ranked).for hit in hits: print(hit.score, hit.match_reason, hit.entry)
Source files
src/attune_rag/retrieval.py
Next: Swap KeywordRetriever for your own retriever by implementing the RetrieverProtocol — any class with a retrieve(query, corpus, k) method that returns an iterable of RetrievalHit objects will work as a drop-in replacement.
Tags: retrieval, keyword, scoring, ranking