Retrieval reference
Score and rank corpus entries against a query using KeywordRetriever, a token-overlap retriever that applies stemming, stopword filtering, and per-field weights across path, summary, content, and related fields.
Classes
| Class |
Description |
RetrievalHit |
A single retrieval result holding the matched entry, its score, and the match reason. |
RetrieverProtocol |
Any object with a retrieve(query, corpus, k) method. |
KeywordRetriever |
Token-overlap retriever with path / summary / content / related weights. |
RetrievalHit
RetrievalHit is a dataclass representing one item returned by a retriever.
Fields
| Field |
Type |
Default |
entry |
RetrievalEntry |
|
score |
float |
|
match_reason |
str |
|
RetrieverProtocol
Structural protocol — any class that implements retrieve(query, corpus, k) satisfies it.
Methods
| Method |
Parameters |
Returns |
Description |
retrieve |
query: str, corpus: CorpusProtocol, k: int = 3 |
Iterable[RetrievalHit] |
Retrieve the top-k hits for query from corpus. |
KeywordRetriever
Token-overlap retriever that scores entries by stemmed keyword matches, weighted by field (path, summary, content, related). Entries below min_score are excluded when a threshold is set.
Constructor
| Parameters |
Type |
Default |
Description |
min_score |
float | None |
None |
Minimum score threshold; hits below this value are excluded from results. |
Methods
| Method |
Parameters |
Returns |
Description |
retrieve |
query: str, corpus: CorpusProtocol, k: int = 3 |
list[RetrievalHit] |
Return the top-k scored and ranked RetrievalHit objects for query. |
Constants
The following module-level constants control tokenization behavior during scoring.
Stopwords
| Constant |
Type |
Members |
_STOPWORDS |
frozenset |
'a', 'an', 'the', 'how', 'do', 'does', 'i', 'to', 'with', 'for', 'is', 'are', 'of', 'in', 'on', 'at', 'and', 'or', 'but', 'can', 'should', 'would', 'will', 'be', 'been', 'by', 'my', 'me', 'we', 'it', 'this', 'that', 'these', 'those' |
Stem suffixes
| Constant |
Type |
Members (in order) |
_STEM_SUFFIXES |
tuple |
'ations', 'ation', 'ators', 'ator', 'ates', 'ate', 'ings', 'ing', 'ions', 'ion', 'ities', 'ity', 'ies', 'ers', 'ed', 'er', 'es', 's' |
Source files
src/attune_rag/retrieval.py
Tags
retrieval, keyword, scoring, ranking