Eval

Faithfulness evaluation — FaithfulnessJudge uses Claude tool-use to score whether answer claims are supported by retrieved passages; benchmark prompts for golden-set testing

comparisonconcepterrorfaqnotequickstartreferencetasktiptroubleshootingwarning
comparisonconcepterrorfaqnotequickstartreferencetasktiptroubleshootingwarning