Comparison: Corpus implementations

Context

The corpus feature provides a pluggable loading layer built around a single interface, CorpusProtocol. Two concrete implementations ship with the library: DirectoryCorpus, which loads arbitrary markdown files from disk, and AttuneHelpCorpus, which wraps the bundled attune-help templates. Choosing between them comes down to where your content lives and how much control you need over the loading pipeline.

Feature comparison

| Capability | DirectoryCorpus | AttuneHelpCorpus |
| --- | --- | --- |
| Content source | Any directory of .md files (configurable glob) | Bundled attune-help templates only |
| Entry point | DirectoryCorpus(root, ...) | AttuneHelpCorpus.from_attune_help() (class method) |
| Summaries | Optional external summaries_file or extra_summaries dict | Managed by the bundled adapter |
| Cross-links | Optional external cross_links_file | Managed by the bundled adapter |
| Alias indexing | Yes: alias_index property; raises DuplicateAliasError on collision | Inherited via CorpusProtocol; index managed by adapter |
| Path index | Yes: path_index property (rel_path → RetrievalEntry) | get(path) lookup only |
| Corpus version | Stable SHA-256 fingerprint of loaded content | Delegated to HelpCorpusAdapter |
| Caching | Configurable (cache=True by default) | Handled by adapter |
| Custom glob | Yes: override DEFAULT_GLOB (**/*.md) | No |
| Typical setup complexity | Medium: requires a root path and optional sidecar files | Low: single class-method call |

Both implementations satisfy CorpusProtocol (entries(), get(), name, version), so any code that types against the protocol works with either.
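To illustrate what typing against the protocol buys you, here is a minimal sketch. It assumes the four members listed above, but the exact signatures and return types are guesses: CorpusLike, InMemoryCorpus, and describe are hypothetical names invented for this example and are not part of the library.

```python
from typing import Iterable, Optional, Protocol


class CorpusLike(Protocol):
    """Hypothetical stand-in for CorpusProtocol; real signatures may differ."""

    name: str
    version: str

    def entries(self) -> Iterable[object]: ...

    def get(self, path: str) -> Optional[object]: ...


class InMemoryCorpus:
    """Toy implementation used only to exercise the protocol shape."""

    def __init__(self, name: str, docs: dict) -> None:
        self.name = name
        self.version = "v0"  # placeholder, not a real content fingerprint
        self._docs = dict(docs)

    def entries(self) -> Iterable[object]:
        return list(self._docs.values())

    def get(self, path: str) -> Optional[object]:
        return self._docs.get(path)


def describe(corpus: CorpusLike) -> str:
    """Works with any object that has the four protocol members."""
    count = len(list(corpus.entries()))
    return f"{corpus.name}@{corpus.version}: {count} entries"


if __name__ == "__main__":
    print(describe(InMemoryCorpus("demo", {"a.md": "A", "b.md": "B"})))
```

Because describe depends only on the protocol members, swapping in a different implementation should not require changes to the calling code.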

Tradeoffs

DirectoryCorpus gives you control at the cost of configuration. You provide the root path and, optionally, sidecar files for summaries and cross-links. In return you get a writable alias index, a full path index for O(1) lookups, SHA-256 content fingerprinting for cache invalidation, and the ability to target any subset of files via a custom glob pattern. If two templates declare the same alias, DuplicateAliasError surfaces the conflict immediately (with both paths), so integrity problems do not silently corrupt retrieval.
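The two integrity mechanisms described above can be sketched in plain Python. This is not the library's implementation: how DirectoryCorpus actually computes its fingerprint or builds its alias index is not documented here. The DuplicateAliasError class below is a local stand-in that only shares the name, and build_alias_index and content_fingerprint are hypothetical helpers.

```python
import hashlib


class DuplicateAliasError(ValueError):
    """Local stand-in for the library's error of the same name."""

    def __init__(self, alias: str, first_path: str, second_path: str) -> None:
        super().__init__(
            f"alias {alias!r} is declared by both {first_path} and {second_path}"
        )
        self.alias = alias
        self.paths = (first_path, second_path)


def build_alias_index(aliases_by_path: dict) -> dict:
    """Map each alias to its path, failing fast when two paths share an alias."""
    index = {}
    for path in sorted(aliases_by_path):
        for alias in aliases_by_path[path]:
            if alias in index and index[alias] != path:
                raise DuplicateAliasError(alias, index[alias], path)
            index[alias] = path
    return index


def content_fingerprint(files: dict) -> str:
    """Hash paths and contents in sorted order so the result is stable."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot blur together
        digest.update(files[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()
```

Sorting the paths before hashing means the fingerprint depends only on what was loaded, not on the order in which the filesystem returned the files. That is the property a version string needs if it is used as a cache key.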

AttuneHelpCorpus is intentionally narrow. It is a thin adapter over the bundled attune-help templates — you cannot point it at a different directory, and its indexing behaviour is delegated to HelpCorpusAdapter. The upside is zero configuration: AttuneHelpCorpus.from_attune_help() is a single call with no required arguments.
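The shape of a zero-argument class-method entry point over a delegating adapter can be sketched as follows. Every name here (FakeAdapter, ThinCorpus, from_bundled, the sample template) is hypothetical and chosen for illustration. The real AttuneHelpCorpus and HelpCorpusAdapter may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeAdapter:
    """Hypothetical stand-in for an adapter that owns the bundled content."""

    templates: dict
    version: str = "bundled-0"

    def get(self, path: str):
        return self.templates.get(path)


class ThinCorpus:
    """Thin wrapper: every lookup and the version are delegated to the adapter."""

    name = "bundled"

    def __init__(self, adapter: FakeAdapter) -> None:
        self._adapter = adapter

    @classmethod
    def from_bundled(cls) -> "ThinCorpus":
        # The content source is fixed, so the caller supplies no arguments.
        return cls(FakeAdapter({"intro.md": "Welcome"}))

    @property
    def version(self) -> str:
        return self._adapter.version

    def entries(self):
        return list(self._adapter.templates.values())

    def get(self, path: str):
        return self._adapter.get(path)


if __name__ == "__main__":
    corpus = ThinCorpus.from_bundled()
    print(corpus.name, corpus.version, corpus.get("intro.md"))
```

The design trade is visible in the constructor path: because the factory hard-codes its source, there is nothing to configure and nothing to misconfigure, but also no way to redirect it to other content.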

When NOT to use these implementations directly

Use X when…

Use AttuneHelpCorpus when you need to retrieve from the bundled attune-help templates and have no reason to customise the corpus. It is the right default for code that works exclusively with attune-help content.

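Use DirectoryCorpus when your content lives in your own directory of markdown files, or when you need any of the following: a custom glob, sidecar files for summaries or cross-links, direct access to the alias and path indexes, or a content-derived version for cache invalidation.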

DirectoryCorpus is the more capable implementation for the majority of real-world use cases. AttuneHelpCorpus wins only when your content is exclusively the bundled attune-help templates and you want the simplest possible setup.

Source files

Tags: corpus, loader, markdown, attune-help