Bookbag
Model & Prompt Evaluation

AI Confidence Calibration Test

AI models report confidence scores, but are they trustworthy? Learn to spot high-confidence wrong answers, low-confidence correct answers, and dangerous calibration gaps.

Each scenario shows an AI output along with its reported confidence score. Your job is to evaluate whether the confidence score matches the actual quality of the output. Watch for overconfident errors, underconfident correct answers, and cases where the confidence score should change your workflow.
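To make the idea of a calibration gap concrete, here is a minimal sketch that compares reported confidence against actual correctness. It assumes you have a list of (confidence, correct) pairs from outputs you have already graded; the function name `calibration_report`, the bin count, and the sample data are all hypothetical and only for illustration. It groups outputs into confidence bins, compares average confidence to accuracy in each bin, and sums the weighted gaps into a single expected calibration error figure.

```python
from collections import defaultdict


def calibration_report(results, n_bins=5):
    """Summarize calibration for (confidence, is_correct) pairs.

    results: list of (confidence in [0, 1], bool) tuples.
    Returns (rows, ece), where each row is
    (bin_index, count, avg_confidence, accuracy, gap).
    """
    bins = defaultdict(list)
    for conf, correct in results:
        # Clamp the index so a confidence of exactly 1.0 lands in the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))

    total = len(results)
    ece = 0.0
    rows = []
    for idx in sorted(bins):
        items = bins[idx]
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        gap = avg_conf - accuracy  # positive: overconfident, negative: underconfident
        ece += (len(items) / total) * abs(gap)
        rows.append((idx, len(items), avg_conf, accuracy, gap))
    return rows, ece


# Hypothetical graded outputs: (reported confidence, was the answer correct?)
sample = [
    (0.95, True), (0.92, False), (0.90, True),
    (0.55, True), (0.40, True), (0.35, False),
]

rows, ece = calibration_report(sample)
for idx, count, avg_conf, accuracy, gap in rows:
    print(f"bin {idx}: n={count} confidence={avg_conf:.2f} accuracy={accuracy:.2f} gap={gap:+.2f}")
print(f"expected calibration error: {ece:.3f}")
```

A large positive gap in a high-confidence bin corresponds to the overconfident errors described above; a large negative gap in a low-confidence bin corresponds to underconfident correct answers.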

10 questions · Advanced · ~8 min