Quickstart: retrieval

KeywordRetriever scores and ranks the entries in a corpus against a query by token overlap, after filtering out stopwords and stemming the remaining tokens. The snippet below runs a retrieval and prints the top results.

from attune_rag.retrieval import KeywordRetriever

retriever = KeywordRetriever()
hits = retriever.retrieve(query="configure logging", corpus=my_corpus, k=3)

for hit in hits:
    # Two-decimal score, match reason padded to 30 characters, then the entry.
    print(f"{hit.score:.2f}  {hit.match_reason:<30}  {hit.entry}")

Example output (scores, match reasons, and entries depend on your corpus):

0.87  keyword overlap: log, configur  <RetrievalEntry path='docs/logging.md'>
0.61  keyword overlap: configur       <RetrievalEntry path='docs/setup.md'>
0.44  keyword overlap: log            <RetrievalEntry path='docs/debug.md'>
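The match reasons above list stemmed tokens ("configur", "log"), which reflects the pipeline KeywordRetriever follows: drop stopwords, stem what remains, then score by overlap. The sketch below illustrates that idea in plain Python; it is not KeywordRetriever's implementation. The stopword list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the hypothetical Entry type with path and text fields, and the score formula (fraction of query stems found in the entry) are all assumptions made for illustration.

```python
import heapq
import re
from collections.abc import Iterable
from dataclasses import dataclass

# Hypothetical stopword list; the real retriever's list may differ.
STOPWORDS = {"a", "an", "and", "do", "how", "i", "in", "is", "of", "the", "to"}

# Crude suffix stripping, standing in for a real stemmer such as Porter's.
SUFFIXES = ("ation", "ing", "ed", "es", "e", "s")


def crude_stem(token: str) -> str:
    for suffix in SUFFIXES:
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and len(stem) >= 3:
            # Undouble a trailing consonant: "logging" -> "logg" -> "log".
            if (suffix in ("ing", "ed") and stem[-1] == stem[-2]
                    and stem[-1] not in "aeioulsz"):
                stem = stem[:-1]
            return stem
    return token


def stems(text: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords, stem; keep first-seen order."""
    seen: dict[str, None] = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOPWORDS:
            seen.setdefault(crude_stem(token), None)
    return list(seen)


@dataclass
class Entry:
    """Hypothetical corpus entry: a path plus its text."""
    path: str
    text: str


@dataclass
class Hit:
    """Mirrors the three documented RetrievalHit fields."""
    entry: Entry
    score: float
    match_reason: str


def keyword_retrieve(query: str, corpus: Iterable[Entry], k: int = 3) -> list[Hit]:
    query_stems = stems(query)
    hits = []
    for entry in corpus:
        entry_stems = set(stems(entry.text))
        overlap = [s for s in query_stems if s in entry_stems]
        if overlap:
            # Fraction of query stems present in the entry, in [0, 1].
            score = len(overlap) / len(query_stems)
            hits.append(Hit(entry, score, "keyword overlap: " + ", ".join(overlap)))
    # Keep the k best-scoring hits, highest first (ties keep corpus order).
    return heapq.nlargest(k, hits, key=lambda h: h.score)
```

Because this sketch scores by the fraction of matched query stems, a two-word query can only yield 0.5 or 1.0; fractional scores such as 0.87 in the example output suggest the real retriever weights tokens in a way this sketch does not model.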

Prerequisites

Steps

  1. Create a retriever. Instantiate KeywordRetriever; it takes no required arguments.

    from attune_rag.retrieval import KeywordRetriever
    retriever = KeywordRetriever()
    
  2. Run a query. Call retrieve(query, corpus, k). Set k to the number of results you want (default 3). Common stopwords such as "the", "how", and "is" are filtered automatically; the retriever also stems tokens before scoring.

    hits = retriever.retrieve(query="configure logging", corpus=my_corpus, k=3)
    
  3. Inspect the hits. Each returned RetrievalHit exposes three fields: entry (the matched corpus entry), score (a float), and match_reason (a human-readable explanation of why the entry matched).

    for hit in hits:
        print(hit.score, hit.match_reason, hit.entry)
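Because each hit exposes entry, score, and match_reason, post-processing needs nothing beyond those three attributes. A minimal sketch, assuming you want to drop weak matches before using them: strong_hits and describe are hypothetical helpers, the 0.5 threshold is arbitrary, and SimpleNamespace objects stand in for real RetrievalHit instances.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def strong_hits(hits: Iterable, min_score: float = 0.5) -> list:
    """Keep hits whose score meets min_score, best first (hypothetical helper)."""
    kept = [hit for hit in hits if hit.score >= min_score]
    return sorted(kept, key=lambda hit: hit.score, reverse=True)


def describe(hit) -> str:
    """One line per hit: two-decimal score, the match reason, then the entry."""
    return f"{hit.score:.2f}  {hit.match_reason}  {hit.entry}"


# Stand-in hits carrying the three documented fields; in real use, pass the
# list returned by retriever.retrieve(...) instead.
sample = [
    SimpleNamespace(entry="docs/debug.md", score=0.44,
                    match_reason="keyword overlap: log"),
    SimpleNamespace(entry="docs/logging.md", score=0.87,
                    match_reason="keyword overlap: log, configur"),
]

for hit in strong_hits(sample):
    print(describe(hit))
```

The helpers rely only on attribute names, so they work unchanged on real hits or on hits from a custom retriever.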
    

Source files


Next: Swap KeywordRetriever for your own retriever by implementing the RetrieverProtocol — any class with a retrieve(query, corpus, k) method that returns an iterable of RetrievalHit objects will work as a drop-in replacement.
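A custom retriever only has to provide a compatible retrieve method. The sketch below assumes RetrieverProtocol is a structural typing.Protocol; its real definition is not shown in this quickstart, so RetrieverLike and Hit here are hypothetical local stand-ins, and SubstringRetriever is a toy example that assumes str(entry) yields an entry's text.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Hit:
    """Local stand-in for RetrievalHit with its three documented fields."""
    entry: object
    score: float
    match_reason: str


@runtime_checkable
class RetrieverLike(Protocol):
    """Hypothetical mirror of RetrieverProtocol: anything with this method fits."""

    def retrieve(self, query: str, corpus: Iterable, k: int = 3) -> Iterable[Hit]: ...


class SubstringRetriever:
    """Toy retriever: an entry matches if its text contains the whole query."""

    def retrieve(self, query: str, corpus: Iterable, k: int = 3) -> list[Hit]:
        needle = query.lower()
        hits = [
            Hit(entry=entry, score=1.0, match_reason=f"substring match: {query!r}")
            for entry in corpus
            if needle in str(entry).lower()  # assumption: str(entry) is its text
        ]
        return hits[:k]  # every match scores 1.0, so keep corpus order, cap at k


# Structural check: passes because SubstringRetriever defines retrieve().
assert isinstance(SubstringRetriever(), RetrieverLike)
```

With @runtime_checkable, isinstance only checks that a retrieve attribute exists; it does not validate parameters or return types, so a static type checker such as mypy is the stronger guard that a replacement really matches the protocol.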

Tags: retrieval, keyword, scoring, ranking