Benchmark errors

Common error signatures

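These errors occur when the benchmark runner fails to complete a retrieval or faithfulness evaluation. Common failure points are an unmet threshold (reported as a non-zero exit code), a file access problem (raised as OSError), and a configuration or input validation problem (raised as ValueError).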

Where errors originate

All errors originate in main() (src/attune_rag/benchmark.py). Because main() is the sole entry point, the exit code and any raised exception come directly from this function.
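
The two outcomes described above, a returned exit code or a raised exception, can be told apart with a small wrapper. This is a minimal sketch, not the project's own code: run_entry_point, passing_main, and failing_main are hypothetical names, and the real main() in src/attune_rag/benchmark.py may take arguments that are not shown here.

```python
from typing import Callable, Optional, Tuple


def run_entry_point(
    entry: Callable[[], int],
) -> Tuple[Optional[int], Optional[BaseException]]:
    """Call an entry point and report how it ended.

    Returns (exit_code, None) if the callable returned normally, or
    (None, exception) if it raised.
    """
    try:
        code = entry()
    except Exception as exc:  # deliberately broad: this is a diagnostic wrapper
        return None, exc
    return code, None


# Hypothetical stand-ins for main(); the real signature is an assumption.
def passing_main() -> int:
    return 0


def failing_main() -> int:
    raise ValueError("stand-in for an input validation error")


print(run_entry_point(passing_main))
print(run_entry_point(failing_main))
```

Keeping the two outcomes separate matters because they call for different follow-up: an exit code is compared against thresholds, while an exception is read from its traceback.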

How to diagnose

  1. Check the exit code first. main() returns 0 on success. A non-zero exit code in CI means a threshold was not met. Compare your configured precision, recall, or faithfulness thresholds against the scores reported in the output.

  2. Read the full traceback. If the process raises an exception rather than returning an exit code, the traceback names the exception type and the line in benchmark.py where it was raised. An OSError points to a file access problem (query file, output path); a ValueError points to a configuration or input validation problem.

  3. Isolate faithfulness scoring. If the failure only occurs with --with-faithfulness, re-run without that flag. If the run succeeds, the problem is specific to the faithfulness scoring path rather than retrieval evaluation.

  4. Enable DEBUG logging. If the exception message alone is not enough, re-run with logging set to DEBUG. Log output emitted just before the failure typically identifies the query, threshold value, or file path that caused the error.
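
The four steps above can be sketched as one triage helper. This is an illustration under stated assumptions, not the benchmark's API: classify, triage, and fake_run are hypothetical names, the run callable stands in for however you invoke the benchmark, and the sketch assumes the benchmark logs through Python's standard logging module. Only the --with-faithfulness flag comes from the steps above.

```python
import logging
from typing import Callable, List

# Step 4. Assumption: the benchmark uses the standard logging module, so
# raising the root level to DEBUG exposes its messages.
logging.basicConfig(level=logging.DEBUG, force=True)


def classify(exc: BaseException) -> str:
    """Step 2: map an exception type to its likely cause."""
    if isinstance(exc, OSError):
        return "file access problem (query file, output path)"
    if isinstance(exc, ValueError):
        return "configuration or input validation problem"
    return "unclassified: read the full traceback"


def triage(run: Callable[[List[str]], int], args: List[str]) -> str:
    """Apply steps 1 to 3 to one invocation and return a diagnosis."""
    try:
        code = run(args)
    except Exception as exc:
        diagnosis = classify(exc)
    else:
        # Step 1: a zero exit code means success.
        if code == 0:
            return "success"
        diagnosis = "threshold not met: compare thresholds with reported scores"

    # Step 3: retry without --with-faithfulness to isolate the scoring path.
    if "--with-faithfulness" in args:
        reduced = [a for a in args if a != "--with-faithfulness"]
        try:
            if run(reduced) == 0:
                return diagnosis + "; specific to the faithfulness scoring path"
        except Exception:
            pass  # The retry also failed, so the cause is not faithfulness-specific.
    return diagnosis


def fake_run(args: List[str]) -> int:
    # Hypothetical stand-in that fails only when faithfulness scoring is requested.
    if "--with-faithfulness" in args:
        raise ValueError("stand-in faithfulness failure")
    return 0


print(triage(fake_run, ["--with-faithfulness"]))
```

Passing the runner in as a callable keeps the sketch independent of how the benchmark is actually launched, so the same logic works whether you call main() directly or wrap a subprocess.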

Source files

  src/attune_rag/benchmark.py

Tags: benchmark, ci, precision, recall, quality