Work with a corpus
Use a corpus implementation when you need to load and query a collection of retrieval entries — either from a directory of Markdown files (DirectoryCorpus), from the bundled attune-help templates (AttuneHelpCorpus), or from a custom source that satisfies CorpusProtocol.
Prerequisites
- Read access to the project source under src/attune_rag/corpus/
- A working Python environment with the package installed
Choose a corpus implementation
The three concrete options map to distinct use cases:
| Class | Use when |
|---|---|
| DirectoryCorpus | You have a local directory of Markdown files to load as a corpus |
| AttuneHelpCorpus | You want to query the bundled attune-help templates directly |
| Custom CorpusProtocol | You need a corpus backed by a source other than the filesystem |
Load a DirectoryCorpus
- Import the class from the public API:

      from pathlib import Path

      from corpus import DirectoryCorpus

- Instantiate it with the path to your Markdown directory:

      corpus = DirectoryCorpus(root=Path("docs/"))

  Pass optional arguments to refine loading behavior:

  - summaries_file: path to a file that provides per-entry summaries
  - cross_links_file: path to a file that provides cross-link relationships
  - extra_aliases or extra_aliases_file: additional aliases beyond what templates declare
  - glob: override the default **/*.md pattern to match a different set of files
  - cache=False: disable caching if you need entries to reflect live file changes
  - warn_alias_overlap=False: suppress warnings when aliases collide across templates

- Iterate over entries to access all loaded templates:

      for entry in corpus.entries():
          print(entry.path, entry.category)

- Retrieve a single entry by path using get:

      entry = corpus.get("tasks/my-template.md")

  get returns None if no entry matches the path.

- Look up entries and aliases by index for fast access:

      entry = corpus.path_index["tasks/my-template.md"]
      alias_info = corpus.alias_index["my-alias"]
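
To see what the default **/*.md pattern selects, and what an overridden glob would change, the matching can be reproduced with the standard library alone. This is a minimal sketch, assuming DirectoryCorpus applies the pattern relative to root in the same way pathlib does. The helper match_files and the directory layout are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def match_files(root: Path, pattern: str = "**/*.md") -> list[str]:
    """Return the sorted, root-relative POSIX paths of files matching pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: two nested Markdown files, one at the top level,
    # and one file that is not Markdown.
    for rel in (
        "tasks/my-template.md",
        "concepts/some-concept.md",
        "README.md",
        "notes.txt",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# placeholder\n", encoding="utf-8")

    default_matches = match_files(root)           # recursive, Markdown only
    tasks_only = match_files(root, "tasks/*.md")  # narrower, non-recursive pattern

print(default_matches)
print(tasks_only)
```

The recursive pattern picks up Markdown files at any depth, including the top level, while a narrower pattern limits the corpus to a single subdirectory.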
Load an AttuneHelpCorpus
- Import and instantiate using the class method:

      from corpus import AttuneHelpCorpus

      corpus = AttuneHelpCorpus.from_attune_help()

- Query entries the same way as DirectoryCorpus; both implement CorpusProtocol:

      for entry in corpus.entries():
          print(entry.path)

      entry = corpus.get("concepts/some-concept.md")
Implement a custom CorpusProtocol
- Import the protocol:

      from corpus import CorpusProtocol, RetrievalEntry

- Define a class that implements the four required members:

      from collections.abc import Iterable

      class MyCorpus:
          def entries(self) -> Iterable[RetrievalEntry]: ...

          def get(self, path: str) -> RetrievalEntry | None: ...

          @property
          def name(self) -> str: ...

          @property
          def version(self) -> str: ...

  Your class does not need to subclass CorpusProtocol; any object that satisfies the interface is accepted wherever a CorpusProtocol is expected.

- Construct RetrievalEntry objects for each item your corpus exposes. The required fields are path, category, and content:

      entry = RetrievalEntry(
          path="custom/my-entry.md",
          category="tasks",
          content="# My entry\n\nContent here.",
          summary="A short description",
          aliases=("my-entry", "entry-alias"),
      )
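
The steps above can be sketched end to end as a dictionary-backed corpus. So that the example runs without the package installed, it defines local stand-ins for RetrievalEntry and CorpusProtocol whose shapes are assumed from the fields and members named in this section; the real classes may differ. InMemoryCorpus and its name and version values are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Stand-in for the package's RetrievalEntry (assumed shape, not the real class).
@dataclass(frozen=True)
class RetrievalEntry:
    path: str
    category: str
    content: str
    summary: str | None = None
    aliases: tuple[str, ...] = ()


# Stand-in for the package's CorpusProtocol: the four members listed above.
@runtime_checkable
class CorpusProtocol(Protocol):
    def entries(self) -> Iterable[RetrievalEntry]: ...
    def get(self, path: str) -> RetrievalEntry | None: ...
    @property
    def name(self) -> str: ...
    @property
    def version(self) -> str: ...


class InMemoryCorpus:
    """Hypothetical corpus backed by a dict instead of the filesystem."""

    def __init__(self, items: Iterable[RetrievalEntry]) -> None:
        self._by_path = {entry.path: entry for entry in items}

    def entries(self) -> Iterable[RetrievalEntry]:
        return self._by_path.values()

    def get(self, path: str) -> RetrievalEntry | None:
        # Mirrors the documented contract: None when no entry matches.
        return self._by_path.get(path)

    @property
    def name(self) -> str:
        return "in-memory"

    @property
    def version(self) -> str:
        return "0.1"


corpus = InMemoryCorpus(
    [RetrievalEntry(path="custom/my-entry.md", category="tasks", content="# My entry")]
)
# Structural typing: no subclassing, yet the instance satisfies the protocol.
print(isinstance(corpus, CorpusProtocol))
print(corpus.get("custom/my-entry.md").category, corpus.get("missing.md"))
```

The isinstance check only confirms that the four members are present; it does not validate their signatures or return types, so a static type checker remains the stronger guarantee.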
Handle DuplicateAliasError
When two templates in the same corpus declare the same alias, DirectoryCorpus raises DuplicateAliasError (unless warn_alias_overlap=False is set, in which case it warns instead). Catch it if you load corpora programmatically:

    from pathlib import Path

    from corpus import DirectoryCorpus, DuplicateAliasError

    try:
        corpus = DirectoryCorpus(root=Path("docs/"))
    except DuplicateAliasError as e:
        print(f"Alias '{e.alias}' is claimed by both {e.first_path} and {e.second_path}")

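
To make the collision condition concrete, the sketch below builds an alias index by hand and raises on the first alias claimed by two different paths. It is an illustration only, not the package's implementation: the exception here is a local stand-in that carries the alias, first_path, and second_path attributes used above, its ValueError base class is an assumption, and build_alias_index is a hypothetical helper.

```python
from __future__ import annotations


class DuplicateAliasError(ValueError):
    """Local stand-in carrying the three attributes used in the handler above."""

    def __init__(self, alias: str, first_path: str, second_path: str) -> None:
        super().__init__(f"alias {alias!r} declared by {first_path} and {second_path}")
        self.alias = alias
        self.first_path = first_path
        self.second_path = second_path


def build_alias_index(aliases_by_path: dict[str, tuple[str, ...]]) -> dict[str, str]:
    """Map each alias to the path that declares it, raising on the first collision."""
    index: dict[str, str] = {}
    for path, aliases in aliases_by_path.items():
        for alias in aliases:
            owner = index.get(alias)
            if owner is not None and owner != path:
                raise DuplicateAliasError(alias, owner, path)
            index[alias] = path
    return index


# Hypothetical templates: both declare the alias "install".
declared = {
    "tasks/a.md": ("setup", "install"),
    "tasks/b.md": ("install",),
}

try:
    build_alias_index(declared)
except DuplicateAliasError as e:
    message = f"Alias '{e.alias}' is claimed by both {e.first_path} and {e.second_path}"
    print(message)
```

Renaming or removing one of the two declarations resolves the conflict and lets the index build cleanly.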
Verify the result
Run the corpus test suite to confirm everything loads correctly:

    pytest -k "corpus"

A passing run confirms that all entries load without alias conflicts, get resolves paths correctly, and entries() returns the expected RetrievalEntry objects. You can also do a quick sanity check in a Python session:

    from pathlib import Path

    from corpus import DirectoryCorpus

    corpus = DirectoryCorpus(root=Path("docs/"))
    assert corpus.name  # non-empty string
    assert any(True for _ in corpus.entries())  # at least one entry loaded
    print(f"Loaded corpus '{corpus.name}' at version {corpus.version}")

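
Because every implementation exposes the same four members, the sanity check can be wrapped in a helper that works on any corpus. The sketch below is one possible shape for such a helper. check_corpus and StubCorpus are hypothetical names, the stub stands in for a real corpus so the example runs on its own, and the round-trip check assumes that get returns the entry whose path was passed in.

```python
from types import SimpleNamespace


def check_corpus(corpus) -> int:
    """Run basic sanity checks and return the number of entries seen."""
    assert corpus.name, "corpus.name should be a non-empty string"
    count = 0
    for entry in corpus.entries():
        # Assumed contract: each entry is retrievable by its own path.
        assert corpus.get(entry.path) == entry, f"get() failed for {entry.path}"
        count += 1
    assert count > 0, "expected at least one entry"
    # Documented contract: get returns None for an unknown path.
    assert corpus.get("does/not/exist.md") is None
    return count


class StubCorpus:
    """Hypothetical stand-in for DirectoryCorpus or AttuneHelpCorpus."""

    name = "stub"
    version = "0"

    def __init__(self) -> None:
        entry = SimpleNamespace(path="tasks/a.md", category="tasks", content="# A")
        self._items = {entry.path: entry}

    def entries(self):
        return self._items.values()

    def get(self, path):
        return self._items.get(path)


loaded = check_corpus(StubCorpus())
print(f"Checked {loaded} entries")
```

In a real session, the same helper could be called with a DirectoryCorpus or AttuneHelpCorpus instance in place of the stub.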
Unresolved references
Auto-generated by attune-author fact-check. Review and either fix the source code, fix this doc, or add an override.
| Location | Severity | Issue |
|---|---|---|
| Line 138 (code fence) | error | from corpus import … — module not importable |