Tip: Plug in a custom retriever without subclassing KeywordRetriever

When you need different scoring logic, implement RetrieverProtocol, which requires only a single retrieve(query, corpus, k) method, instead of extending KeywordRetriever.

Why: KeywordRetriever bundles stopword filtering, suffix stemming, and four field weights (path, summary, content, related) into one concrete class. Subclassing it means inheriting all of that behavior and working around what you don't want. A fresh RetrieverProtocol implementation gives you a clean slate with the same interface.
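
Example: a minimal sketch of a from-scratch retriever that satisfies the interface structurally. Only the method name and the (query, corpus, k) parameters come from this tip. The Protocol definition here is a stand-in for the project's own RetrieverProtocol, and the corpus type (a sequence of strings), the return type (a list of strings), and the SubstringRetriever class are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class RetrieverProtocol(Protocol):
    """Stand-in for the project's protocol; the real types may differ."""

    def retrieve(self, query: str, corpus: Sequence[str], k: int) -> list[str]:
        ...


class SubstringRetriever:
    """Hypothetical retriever: ranks documents by how often the query occurs."""

    def retrieve(self, query: str, corpus: Sequence[str], k: int) -> list[str]:
        needle = query.lower()
        if not needle:
            return []
        # Score each document by case-insensitive occurrence count.
        scored = [(doc.lower().count(needle), doc) for doc in corpus]
        # Drop non-matches, then sort by descending count.
        # list.sort is stable, so ties keep their corpus order.
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[:k]]


corpus = ["alpha beta", "beta beta gamma", "delta"]
retriever: RetrieverProtocol = SubstringRetriever()
print(retriever.retrieve("beta", corpus, k=2))
# prints ['beta beta gamma', 'alpha beta']
```

SubstringRetriever does not inherit from anything: it matches the protocol by shape alone, so none of KeywordRetriever's stopword, stemming, or field-weight behavior comes along.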

Tradeoff: You give up the token-overlap scoring that KeywordRetriever provides for free. If you only need to adjust field weights, check whether KeywordRetriever already accepts weight configuration before writing a new retriever from scratch.
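
Example: one quick way to run that check is to list the constructor's parameters with the standard-library inspect module. The sketch below uses a hypothetical stand-in class, because KeywordRetriever's real constructor is not shown in this tip. The parameter names path_weight and summary_weight are invented for the demonstration and say nothing about the actual class.

```python
import inspect


def constructor_params(cls: type) -> list[str]:
    """Return the names of the parameters accepted when instantiating cls."""
    return list(inspect.signature(cls).parameters)


class StandInRetriever:
    """Hypothetical stand-in; NOT the real KeywordRetriever signature."""

    def __init__(self, path_weight: float = 1.0, summary_weight: float = 1.0) -> None:
        self.path_weight = path_weight
        self.summary_weight = summary_weight


print(constructor_params(StandInRetriever))
# prints ['path_weight', 'summary_weight']
```

In a real project, pass the imported KeywordRetriever class instead of the stand-in and look for anything weight-related in the result.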

Tags: retrieval, keyword, scoring, ranking