Troubleshoot corpus
Before you start
The corpus module has three main moving parts:
- CorpusProtocol: the interface that all corpus implementations satisfy. It exposes entries(), get(path), name, and version.
- DirectoryCorpus: loads .md files from disk using a glob pattern (default: **/*.md). It builds a path_index and an alias_index at load time, and optionally caches results.
- AttuneHelpCorpus: a thin adapter over the bundled attune-help templates. Construct it directly via AttuneHelpCorpus(adapter) or through the from_attune_help() class method.
Identify which implementation is involved before you start diagnosing.
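When it is not obvious which implementation a caller is holding, a small helper can report the concrete class and flag any protocol member that is missing. The sketch below relies only on the four members listed above; describe_corpus and FakeCorpus are hypothetical names used for illustration and are not part of attune_rag.

```python
# Members that every corpus implementation is described as exposing.
REQUIRED_MEMBERS = ("entries", "get", "name", "version")


def describe_corpus(corpus: object) -> dict:
    """Report the concrete class of a corpus and any missing members."""
    return {
        "implementation": type(corpus).__name__,
        "missing": [m for m in REQUIRED_MEMBERS if not hasattr(corpus, m)],
    }


class FakeCorpus:
    """Hypothetical stand-in so the sketch runs without attune_rag."""

    name = "fake"
    version = "0"

    def entries(self):
        return []

    def get(self, path):
        return None


if __name__ == "__main__":
    print(describe_corpus(FakeCorpus()))
```

In a real session you would pass your actual corpus object instead of FakeCorpus; a non-empty "missing" list suggests the object is not a corpus at all.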
Symptom table
| If you observe | Check |
|---|---|
| DuplicateAliasError at load time | Two templates declare the same alias. The error exposes alias, first_path, and second_path; open both files and remove or rename the duplicate alias. |
| get(path) returns None unexpectedly | Confirm the path matches a key in DirectoryCorpus.path_index. Keys are relative paths; a leading / or wrong separator will cause a miss. |
| entries() returns an empty iterable | For DirectoryCorpus, verify root exists and contains files matching the glob (default **/*.md). Print list(corpus.entries()) to confirm. |
| AttuneHelpCorpus raises on construction | Check that the HelpCorpusAdapter passed to __init__ is valid; prefer AttuneHelpCorpus.from_attune_help() to let the class build its own adapter. |
| version changes between runs unexpectedly | DirectoryCorpus.version is a SHA-256 fingerprint of the loaded content. A change means the files on disk changed; check for unintended writes or a stale working directory. |
| Aliases not resolving | Inspect DirectoryCorpus.alias_index. Each key is an alias string; its value is an AliasInfo object pointing back to the source template. A missing entry means the template's frontmatter alias was not parsed. |
| Slow first load | DirectoryCorpus walks the directory and hashes content on first access. Pass cache=True (the default) and confirm you are not instantiating a new DirectoryCorpus on every request. |
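To confirm that an unexpected version change really comes from the files on disk, you can compute an independent fingerprint of the template directory before and after a run. This is a minimal sketch using only the standard library. It is not the hashing scheme DirectoryCorpus uses internally (that scheme is not described here), so its output will not equal corpus.version. The function name fingerprint is hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path, glob: str = "**/*.md") -> str:
    """Return a SHA-256 hex digest over the matching files under root.

    Files are visited in sorted order so the result does not depend on
    filesystem iteration order. Both the relative path and the bytes of
    each file feed the digest, so renames and edits both change it.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in root.glob(glob) if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

Record the digest alongside corpus.version in your logs. If both change together, the disk content changed; if only corpus.version changes, look at how the corpus is being constructed instead.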
Step-by-step diagnosis
- Reproduce the failure in isolation. Reduce the call to its minimum required arguments. For DirectoryCorpus, that is just root:

      from pathlib import Path
      from attune_rag.corpus import DirectoryCorpus

      corpus = DirectoryCorpus(root=Path("path/to/templates"))
      print(list(corpus.entries()))
      print(corpus.version)

  For AttuneHelpCorpus:

      from attune_rag.corpus import AttuneHelpCorpus

      corpus = AttuneHelpCorpus.from_attune_help()
      print(corpus.name, corpus.version)

  Confirm the failure occurs before adding complexity back.
- Inspect the indexes. If get() or alias resolution is misbehaving, print the internal indexes before assuming the content is wrong:

      # DirectoryCorpus only
      print(corpus.path_index.keys())   # all loaded relative paths
      print(corpus.alias_index.keys())  # all declared aliases

  A missing key here means the file was not loaded or its frontmatter was not parsed correctly.
- Check for duplicate aliases. If you see DuplicateAliasError, the exception message includes the alias string, first_path, and second_path. Open both files and deduplicate:

      DuplicateAliasError: alias='foo', first_path='a/one.md', second_path='b/two.md'

  Remove or rename the alias in one of the two templates.
- Verify the glob pattern. DirectoryCorpus defaults to **/*.md. If your templates use a different extension or live in an unexpected subdirectory, override the glob:

      corpus = DirectoryCorpus(root=Path("templates"), glob="**/*.markdown")

  Run list(Path("templates").glob("**/*.md")) directly to confirm which files Python finds.
- Run the corpus tests. Before modifying any code, run the existing test suite to establish a baseline:

      pytest -k "corpus" -v

  A failing test that exercises your exact path gives you a reproducible fixture to work against.
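The glob check in the steps above can be scripted so that several candidate patterns are compared in one pass. The following is a sketch that uses only pathlib and does not touch attune_rag. The helper name matches_by_glob and the second pattern are illustrative assumptions.

```python
from pathlib import Path


def matches_by_glob(root: Path, patterns=("**/*.md", "**/*.markdown")) -> dict:
    """Map each glob pattern to the sorted relative paths it matches."""
    return {
        pattern: sorted(
            p.relative_to(root).as_posix()
            for p in root.glob(pattern)
            if p.is_file()
        )
        for pattern in patterns
    }


if __name__ == "__main__":
    # Replace "templates" with the root you pass to DirectoryCorpus.
    for pattern, files in matches_by_glob(Path("templates")).items():
        print(f"{pattern}: {len(files)} file(s)")
        for name in files:
            print(f"  {name}")
```

A pattern that matches nothing here will also yield an empty corpus, so this narrows the problem to the root or glob arguments before you look at frontmatter parsing.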
Common fixes
- Duplicate alias. Edit the offending template identified by DuplicateAliasError.first_path or second_path and remove the conflicting alias from its frontmatter.
- Path mismatch in get(). Normalize the path you pass to get() to match the relative-path keys in path_index:

      entry = corpus.get("how-to/deploy.md")   # correct: relative, no leading slash
      entry = corpus.get("/how-to/deploy.md")  # wrong: leading slash causes None

- Empty corpus from wrong root. If entries() is empty, verify the root argument is the directory that contains the markdown files, not a parent directory:

      corpus = DirectoryCorpus(root=Path("src/attune_rag/corpus/templates"))

- Stale cached state. If the corpus was cached at module import time and templates have changed on disk, reinstantiate DirectoryCorpus or restart the process. There is no explicit cache-invalidation API; the version SHA-256 fingerprint tells you whether disk content has drifted.
- Wrong adapter for AttuneHelpCorpus. If constructing AttuneHelpCorpus directly raises, switch to the factory method, which handles adapter construction internally:

      corpus = AttuneHelpCorpus.from_attune_help()

- Dependency or environment drift. If the corpus loaded correctly previously but no longer does, run:

      pip show attune-rag

  and confirm the installed version matches your expectations. A version upgrade may have changed the bundled template paths that AttuneHelpCorpus relies on.
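For the path-mismatch fix, a small normalizer applied before every get() call removes the most common causes of a miss. This sketch assumes that path_index keys are relative and use forward slashes, as in the how-to/deploy.md example; check the actual keys in your corpus before relying on it. The name normalize_key is hypothetical.

```python
from pathlib import PurePosixPath


def normalize_key(path: str) -> str:
    """Convert a user-supplied path into a relative, forward-slash key.

    Assumption: index keys look like "how-to/deploy.md" (relative,
    POSIX separators). Backslashes are converted, leading slashes are
    dropped, and redundant "./" or "//" segments are collapsed.
    """
    cleaned = path.replace("\\", "/").lstrip("/")
    if not cleaned:
        return ""
    return PurePosixPath(cleaned).as_posix()


if __name__ == "__main__":
    for raw in ("/how-to/deploy.md", "how-to\\deploy.md", "./how-to/deploy.md"):
        print(raw, "->", normalize_key(raw))
```

You would then call corpus.get(normalize_key(user_path)). If the lookup still returns None, compare the normalized value against the keys printed from path_index.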
Source files
- src/attune_rag/corpus/__init__.py
- src/attune_rag/corpus/base.py
- src/attune_rag/corpus/directory.py
- src/attune_rag/corpus/attune_help.py
Tags: corpus, loader, markdown, attune-help