Note: benchmark
Context
The benchmark module (src/attune_rag/benchmark.py) is a retrieval and optional faithfulness benchmark runner. It is designed to gate CI pipelines on configurable quality thresholds, measuring precision and recall against a query file you supply.
Faithfulness scoring is opt-in and enabled with the --with-faithfulness flag. Without it, the runner evaluates retrieval quality only.
How it works
The module is function-first: there are no classes to instantiate. The entry point is main(), which accepts an argument list and returns 0 on success. You invoke it directly — either from the command line or by passing a list of arguments programmatically.
from attune_rag.benchmark import main
exit_code = main(["--with-faithfulness"])
Configurable thresholds and the query file path are passed as arguments to main(); the runner exits non-zero if any threshold is not met, making it suitable as a CI gate.
Tags: benchmark, ci, precision, recall, quality