Note: corpus

Context

The corpus layer provides pluggable loaders for retrieval content. CorpusProtocol defines the interface; DirectoryCorpus loads markdown files from disk; AttuneHelpCorpus wraps the bundled attune-help templates.

Content

Any object that satisfies CorpusProtocol exposes three things: an iterable of RetrievalEntry values via entries(), a lookup by path via get(), and name and version properties that identify the corpus.

The two built-in implementations differ in where they source content:

Each loaded template is represented as a RetrievalEntry dataclass (src/attune_rag/corpus/base.py) with the following fields:

Field Type Notes
path str Corpus-relative path used as the lookup key
category str Template category
content str Full template text
summary str | None Optional short description
related tuple[str, ...] Paths of related entries
aliases tuple[str, ...] Alternative lookup keys
metadata dict[str, Any] Arbitrary extra data

Aliases are indexed globally across the corpus. If two templates declare the same alias string, DirectoryCorpus raises DuplicateAliasError, which carries the alias and both conflicting paths.

Source files

Tags: corpus, loader, markdown, attune-help