Troubleshoot corpus

Before you start

The corpus module has three main moving parts:

Identify which implementation is involved before you start diagnosing.

Symptom table

| If you observe | Check |
| --- | --- |
| DuplicateAliasError at load time | Two templates declare the same alias. The error exposes alias, first_path, and second_path. Open both files and remove or rename the duplicate alias. |
| get(path) returns None unexpectedly | Confirm the path matches a key in DirectoryCorpus.path_index. Keys are relative paths; a leading / or the wrong separator will cause a miss. |
| entries() returns an empty iterable | For DirectoryCorpus, verify that root exists and contains files matching the glob (default **/*.md). Print list(corpus.entries()) to confirm. |
| AttuneHelpCorpus raises on construction | Check that the HelpCorpusAdapter passed to __init__ is valid; prefer AttuneHelpCorpus.from_attune_help() to let the class build its own adapter. |
| version changes between runs unexpectedly | DirectoryCorpus.version is a SHA-256 fingerprint of the loaded content. A change means the files on disk changed, so check for unintended writes or a stale working directory. |
| Aliases not resolving | Inspect DirectoryCorpus.alias_index. Each key is an alias string; its value is an AliasInfo object pointing back to the source template. A missing entry means the template's frontmatter alias was not parsed. |
| Slow first load | DirectoryCorpus walks the directory and hashes content on first access. Pass cache=True (the default) and confirm you are not instantiating a new DirectoryCorpus on every request. |
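
When version changes between runs and you need to know which files are responsible, a per-file digest snapshot can narrow it down. The following is a minimal sketch that uses only the standard library; snapshot and diff_snapshots are hypothetical helper names, not part of attune_rag, and the hashing here is not claimed to match how DirectoryCorpus computes version internally. It only assumes the same default glob (**/*.md) and relative-path keys described in the table.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path, pattern: str = "**/*.md") -> dict[str, str]:
    """Map each matching file's relative POSIX path to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.glob(pattern)):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which relative paths were added, removed, or changed."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Take one snapshot per run (for example, dump it to JSON), then pass the two dictionaries to diff_snapshots. An empty result alongside a changed version suggests the difference lies outside the file contents, such as a different root or glob.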

Step-by-step diagnosis

  1. Reproduce the failure in isolation. Reduce the call to its minimum required arguments. For DirectoryCorpus, that is just root:

    from pathlib import Path
    from attune_rag.corpus import DirectoryCorpus
    
    corpus = DirectoryCorpus(root=Path("path/to/templates"))
    print(list(corpus.entries()))
    print(corpus.version)
    

    For AttuneHelpCorpus:

    from attune_rag.corpus import AttuneHelpCorpus
    
    corpus = AttuneHelpCorpus.from_attune_help()
    print(corpus.name, corpus.version)
    

    Confirm the failure occurs before adding complexity back.

  2. Inspect the indexes. If get() or alias resolution is misbehaving, print the internal indexes before assuming the content is wrong:

    # DirectoryCorpus only
    print(corpus.path_index.keys())   # all loaded relative paths
    print(corpus.alias_index.keys())  # all declared aliases
    

    A missing key here means the file was not loaded or its frontmatter was not parsed correctly.

  3. Check for duplicate aliases. If you see DuplicateAliasError, the exception message includes the alias string, first_path, and second_path. Open both files and deduplicate:

    DuplicateAliasError: alias='foo', first_path='a/one.md', second_path='b/two.md'
    

    Remove or rename the alias in one of the two templates.

  4. Verify the glob pattern. DirectoryCorpus defaults to **/*.md. If your templates use a different extension or live in an unexpected subdirectory, override the glob:

    corpus = DirectoryCorpus(root=Path("templates"), glob="**/*.markdown")
    

    To confirm which files Python finds, run list(Path("templates").glob(pattern)) directly, using the same pattern you pass as glob (for example "**/*.md" for the default).

  5. Run the corpus tests. Before modifying any code, run the existing test suite to establish a baseline:

    pytest -k "corpus" -v
    

    A failing test that exercises your exact path gives you a reproducible fixture to work against.
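
For the glob check in step 4, it can help to see what the pattern leaves out rather than only what it matches. The sketch below uses only pathlib and collections; count_by_suffix and unmatched_files are hypothetical helper names, not part of attune_rag, and the default pattern simply mirrors the **/*.md default mentioned above.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root: Path) -> Counter:
    """Count regular files under root by lowercase file extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def unmatched_files(root: Path, pattern: str = "**/*.md") -> list[str]:
    """List files under root (relative POSIX paths) that the pattern does not select."""
    matched = {p for p in root.glob(pattern) if p.is_file()}
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p not in matched
    )
```

If count_by_suffix shows a large number of .markdown or .txt files, or unmatched_files lists templates you expected to load, the glob is the likely cause of an empty or incomplete entries() result.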

Common fixes

Source files

Tags: corpus, loader, markdown, attune-help