Benchmark runner quickstart
Run a retrieval benchmark — with optional faithfulness scoring — that gates CI on configurable thresholds.
from attune_rag.benchmark import main
exit_code = main()
print(exit_code) # 0
Prerequisites
- The project is cloned and installed locally.
- You have a query file ready if you want to test against custom queries.
Run your first benchmark
-
Import and call
main. With no arguments, the runner uses its defaults:from attune_rag.benchmark import main exit_code = main()A return value of
0means all metrics passed their thresholds. -
Pass a custom query file. Supply your own queries via
argv:exit_code = main(["--queries", "my_queries.jsonl"]) -
Enable faithfulness scoring. Add
--with-faithfulnessto score generated answers against their source passages:exit_code = main(["--queries", "my_queries.jsonl", "--with-faithfulness"])A non-zero return value means at least one metric fell below its configured threshold — check the logged output to see which one.
Expected output
Retrieval precision: 0.91 ✓
Retrieval recall: 0.87 ✓
Faithfulness: 0.93 ✓
Benchmark passed (exit 0)
Next: Wire main() into your CI pipeline and set threshold values in your benchmark configuration to enforce quality gates on every pull request.
Source files
src/attune_rag/benchmark.py
Tags: benchmark, ci, precision, recall, quality