Eval
Faithfulness evaluation — FaithfulnessJudge uses Claude tool-use to score whether answer claims are supported by retrieved passages; benchmark prompts for golden-set testing
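A minimal sketch of how a tool-use faithfulness judge could turn Claude's output into a score. The actual FaithfulnessJudge code is not shown here, so the tool name `record_verdicts`, its schema, and the helpers `extract_tool_input` and `score_from_tool_input` are all hypothetical. The sketch assumes the judge forces a single tool through the Messages API (`tools=[RECORD_VERDICTS_TOOL]`, `tool_choice={"type": "tool", "name": "record_verdicts"}`), so it gets one structured verdict per answer claim instead of free text. It also assumes the score is the fraction of claims the retrieved passages support. The network call is left out so that the parsing and scoring logic runs on its own. It works on the dict form of the response content, for example `response.model_dump()["content"]`.

```python
from dataclasses import dataclass

# Hypothetical tool definition in the Anthropic tool format (name/description/input_schema).
# Forcing this tool makes the judge emit per-claim verdicts as structured JSON.
RECORD_VERDICTS_TOOL = {
    "name": "record_verdicts",
    "description": "For each factual claim in the answer, record whether the retrieved passages support it.",
    "input_schema": {
        "type": "object",
        "properties": {
            "verdicts": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "claim": {"type": "string"},
                        "supported": {"type": "boolean"},
                        "passage_ids": {"type": "array", "items": {"type": "integer"}},
                    },
                    "required": ["claim", "supported"],
                },
            }
        },
        "required": ["verdicts"],
    },
}


@dataclass
class FaithfulnessResult:
    score: float             # fraction of claims supported by the passages
    unsupported: list[str]   # claims the judge marked as unsupported


def extract_tool_input(content: list[dict], tool_name: str) -> dict:
    """Return the input of the first tool_use block with the given name."""
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block named {tool_name!r} in response")


def score_from_tool_input(tool_input: dict) -> FaithfulnessResult:
    verdicts = tool_input.get("verdicts", [])
    if not verdicts:
        # Assumption: an answer with no extractable claims counts as faithful.
        return FaithfulnessResult(score=1.0, unsupported=[])
    unsupported = [v["claim"] for v in verdicts if not v["supported"]]
    supported_count = len(verdicts) - len(unsupported)
    return FaithfulnessResult(score=supported_count / len(verdicts), unsupported=unsupported)
```

Listing the unsupported claims next to the ratio makes golden-set failures easier to debug, because a low score then points directly at the claims that lacked support.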