Tip: Gate CI on faithfulness scores, not just precision and recall

Run main() with --with-faithfulness to include faithfulness scoring alongside retrieval metrics. Without it, a pipeline can pass precision/recall thresholds while still returning responses that contradict the retrieved documents.
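A minimal sketch of what such a gate could look like, assuming the benchmark run produces a flat mapping of metric names to scores. The function name `gate`, the metric keys, and the threshold values are all hypothetical; nothing here is taken from the project's actual code.

```python
from typing import List, Mapping

# Hypothetical minimum scores; tune them to your own benchmark.
THRESHOLDS = {"precision": 0.80, "recall": 0.75, "faithfulness": 0.90}


def gate(metrics: Mapping[str, float],
         thresholds: Mapping[str, float] = THRESHOLDS) -> List[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for name, minimum in thresholds.items():
        score = metrics.get(name)
        if score is None:
            # A missing score fails the gate, so a run without
            # faithfulness scoring cannot pass silently.
            failures.append(f"{name}: missing (was scoring enabled?)")
        elif score < minimum:
            failures.append(f"{name}: {score:.2f} < {minimum:.2f}")
    return failures


def exit_code(metrics: Mapping[str, float]) -> int:
    """Map the gate result to a process exit code for CI (0 = pass)."""
    problems = gate(metrics)
    for problem in problems:
        print("FAIL", problem)
    return 1 if problems else 0
```

With made-up scores such as `{"precision": 0.91, "recall": 0.84, "faithfulness": 0.62}`, the retrieval thresholds pass but the gate still reports a faithfulness failure, which is the case this tip is about. Treating a missing score as a failure is a deliberate choice in this sketch: it keeps the gate from passing when the flag was left off.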

Why it sticks: Precision and recall measure whether the right documents were retrieved; faithfulness measures whether the answer reflects them. The two catch different failure modes.

Tradeoff: Faithfulness scoring adds latency and may require a separate model call. Keep it enabled in CI but consider disabling it in fast local iteration loops where retrieval quality is your only concern.
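One way to get "on in CI, off locally" without remembering the flag is to derive its default from the environment. The sketch below assumes the runner parses its arguments with `argparse`, which the tip does not state; `build_parser` is a hypothetical helper. It relies on the `CI` environment variable, which common CI services such as GitHub Actions and GitLab CI set to `true`.

```python
import argparse
import os
from typing import Mapping


def build_parser(environ: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Build a parser whose faithfulness default follows the CI environment."""
    # Assumption: the CI service exports CI=true (or CI=1).
    in_ci = environ.get("CI", "").lower() in {"1", "true"}

    parser = argparse.ArgumentParser(description="Benchmark runner (illustrative)")
    parser.add_argument(
        "--with-faithfulness",
        action="store_true",
        default=in_ci,  # enabled automatically in CI, opt-in locally
        help="also score answer faithfulness (slower; may call a separate model)",
    )
    return parser
```

Locally, `build_parser().parse_args([])` leaves `with_faithfulness` false unless the flag is passed; under CI it defaults to true, so the slower check cannot be skipped by accident.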

Tags: benchmark, ci, precision, recall, quality