What It Means
If two reviewers would give the same message different verdicts, your AI QA & Evaluation Platform is unreliable. Calibration is what makes every verdict trustworthy regardless of who reviewed it.
Annotator calibration is how you ensure that Reviewer A and Reviewer B apply the same standards when evaluating the same AI-generated message. Without calibration, verdicts depend on the reviewer: one says safe_to_deploy, another says needs_fix, and your AI QA & Evaluation Platform becomes a coin flip. Calibration works through several mechanisms:

- Gold set testing: reviewers evaluate pre-labeled examples with known correct answers to verify they apply rubrics correctly.
- Rubric training sessions.
- Ongoing quality sampling: randomly re-reviewing production items to check consistency.
- Inter-annotator agreement metrics: measuring how often different reviewers agree on the same items.

Calibration isn't a one-time event. Standards evolve, new failure patterns emerge, and reviewer consistency naturally drifts. Ongoing calibration catches that drift before it undermines your platform.
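The last mechanism is easiest to see in code. Below is a minimal sketch of Cohen's kappa, the standard chance-corrected inter-annotator agreement metric, assuming each reviewer's verdicts arrive as a plain list of strings; the sample data and function name are illustrative, not part of any Bookbag API.

```python
from collections import Counter

def cohens_kappa(verdicts_a, verdicts_b):
    """Chance-corrected agreement between two reviewers' verdicts."""
    n = len(verdicts_a)
    # Raw agreement: fraction of items where the two verdicts match.
    observed = sum(a == b for a, b in zip(verdicts_a, verdicts_b)) / n
    # Agreement expected by chance, from each reviewer's own
    # marginal verdict frequencies.
    freq_a, freq_b = Counter(verdicts_a), Counter(verdicts_b)
    labels = set(verdicts_a) | set(verdicts_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical label
    return (observed - expected) / (1 - expected)

a = ["safe_to_deploy", "needs_fix", "safe_to_deploy", "needs_fix"]
b = ["safe_to_deploy", "needs_fix", "needs_fix", "needs_fix"]
print(cohens_kappa(a, b))  # 0.5: well above chance, but not aligned
```

Kappa matters because raw percent agreement is inflated when one verdict dominates; here the reviewers match on 75% of items but score only 0.5 once chance agreement is discounted.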
Why It Matters
Inconsistent review is worse than no review because it creates false confidence. You think your AI QA & Evaluation Platform is catching problems, but the verdicts depend on which reviewer happened to get the message. That's not quality control — it's a coin flip. Calibration ensures every verdict is trustworthy regardless of reviewer. It's also what makes your training data reliable: if corrections are inconsistent, the training data teaches your AI conflicting standards.
How Bookbag Helps
Gold set management
Curate and manage pre-labeled examples with known correct answers. New reviewers prove they can apply your rubric correctly before handling production items.
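The gating idea can be sketched in a few lines, assuming verdicts and gold labels are parallel lists; the 90% threshold and function name are illustrative assumptions, not Bookbag's actual API or policy.

```python
def gold_set_passes(reviewer_verdicts, gold_labels, threshold=0.9):
    """True when the reviewer's accuracy on pre-labeled gold items
    meets the bar required before reviewing production traffic."""
    correct = sum(v == g for v, g in zip(reviewer_verdicts, gold_labels))
    return correct / len(gold_labels) >= threshold

gold = ["safe_to_deploy", "needs_fix", "needs_fix", "safe_to_deploy"]
trial = ["safe_to_deploy", "needs_fix", "needs_fix", "needs_fix"]
print(gold_set_passes(trial, gold))  # False: 3/4 correct is below 0.9
```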
Automatic quality sampling
Random re-review of production items catches consistency drift early, so you see it in the data before it becomes a problem.
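The sampling step itself is simple. A minimal sketch, assuming production items are identified by IDs in a list; the 5% rate and function name are illustrative assumptions.

```python
import random

def sample_for_rereview(item_ids, rate=0.05, seed=None):
    """Select a random slice of production items for a second,
    independent review; disagreements with the first verdict
    reveal consistency drift."""
    rng = random.Random(seed)
    k = max(1, round(len(item_ids) * rate))
    return rng.sample(item_ids, k)

print(len(sample_for_rereview(list(range(1000)))))  # 50 items at 5%
```

Seeding the generator makes a sampling run reproducible for audit purposes.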
Agreement tracking dashboard
Inter-annotator agreement metrics show reviewer consistency across the team. When agreement drops, the data tells you it's time to recalibrate.
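The trigger logic behind such a dashboard can be sketched as follows, assuming agreement is tracked as a list of periodic scores between 0 and 1; the 0.8 floor and 3-period window are illustrative defaults, not Bookbag parameters.

```python
def needs_recalibration(agreement_history, floor=0.8, window=3):
    """True when mean inter-annotator agreement over the most
    recent window of measurements falls below the floor."""
    recent = agreement_history[-window:]
    return sum(recent) / len(recent) < floor

# Agreement sliding downward over five periods trips the trigger.
print(needs_recalibration([0.90, 0.85, 0.70, 0.72, 0.68]))  # True
```

Averaging over a window rather than alerting on a single low score keeps one noisy measurement from forcing an unnecessary recalibration.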