
Blocked vs Needs Fix vs Safe: The Three-Lane Verdict System

6 min read · Last updated: March 2026
Not all AI-generated messages are created equal. Some are perfect. Some need minor edits. Some should never ship. The three-lane verdict system routes each message to the right workflow based on risk.

The Three Lanes

safe_to_deploy

Message passes all rubric checks. No issues detected. Approved and logged with full context.

Typical rate: 90-95% of messages
What you get back:
  • Rubric scores (all passing)
  • Full audit trail (approver, role, timestamp, taxonomy version)
  • The approved message as a positive training example

needs_fix

Message has minor issues. A QA reviewer can edit and approve it, and the corrections become training data.

Typical rate: 4-8% of messages
What you get back:
  • Failure categories + primary failure reason + severity + business impact
  • Rubric scores showing where it fell short
  • Gold-standard rewrite + explanation of what changed
  • SFT pair (original → rewrite) and DPO pair for training export
  • Policy flags if applicable

blocked

High-risk message. A subject-matter expert (SME) provides final approval with rationale and evidence, and the decision is recorded in a full audit trail.

Typical rate: 1-3% of messages
What you get back:
  • Failure categories + severity: critical
  • SME rationale, evidence citations, confidence assessment, final decision
  • Full audit trail with escalation handling
  • Policy flags with specific violation types + reviewer notes
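The "What you get back" lists can be read as a per-lane checklist of required fields. Below is a minimal Python sketch of that idea; the field names (`rubric_scores`, `gold_standard_rewrite`, `sme_rationale`, and so on) are hypothetical labels chosen for illustration, not Bookbag's actual schema, and optional items such as needs_fix policy flags are left out.

```python
from enum import Enum


class Lane(str, Enum):
    """The three verdict lanes, using the names from this article."""
    SAFE_TO_DEPLOY = "safe_to_deploy"
    NEEDS_FIX = "needs_fix"
    BLOCKED = "blocked"


# Hypothetical field names mirroring each lane's "What you get back" list.
REQUIRED_FIELDS: dict[Lane, set[str]] = {
    Lane.SAFE_TO_DEPLOY: {"rubric_scores", "audit_trail", "approved_message"},
    Lane.NEEDS_FIX: {
        "failure_categories", "primary_failure_reason", "severity",
        "business_impact", "rubric_scores", "gold_standard_rewrite",
        "explanation", "sft_pair", "dpo_pair",
    },
    Lane.BLOCKED: {
        "failure_categories", "severity", "sme_rationale", "evidence",
        "confidence", "final_decision", "audit_trail", "policy_flags",
    },
}


def missing_fields(lane: Lane, payload: dict) -> set[str]:
    """Return the fields this lane requires that the payload does not contain."""
    return REQUIRED_FIELDS[lane] - set(payload)


if __name__ == "__main__":
    partial = {"rubric_scores": {}, "audit_trail": {}}
    print(missing_fields(Lane.SAFE_TO_DEPLOY, partial))  # {'approved_message'}
```

A check like this can run before a verdict is stored, so an incomplete package is caught at write time rather than during an audit.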

How Routing Works

Each message is evaluated against your defined rubrics. Based on the severity of any detected issues, the system routes the message to the appropriate lane.
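A minimal sketch of that routing rule follows. It assumes a hypothetical four-level severity scale (`LOW` to `CRITICAL`) and the convention suggested by the examples in this article: no issues means safe_to_deploy, any critical issue means blocked, and everything else goes to needs_fix. Your rubrics may draw these lines differently.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordered severity scale; higher means riskier."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Issue:
    category: str        # e.g. "over-promising" or "formatting"
    severity: Severity


def route(issues: list[Issue]) -> str:
    """Choose a lane from the issues found during rubric evaluation."""
    if not issues:
        return "safe_to_deploy"            # passed every rubric check
    worst = max(issue.severity for issue in issues)
    if worst == Severity.CRITICAL:
        return "blocked"                   # high risk: SME review required
    return "needs_fix"                     # fixable: QA edits and approves


if __name__ == "__main__":
    print(route([]))                                        # safe_to_deploy
    print(route([Issue("formatting", Severity.LOW)]))       # needs_fix
    print(route([Issue("missing_disclosure", Severity.CRITICAL),
                 Issue("tone", Severity.LOW)]))             # blocked
```

Routing on the worst issue means a single critical finding outweighs any number of minor ones, which keeps high-risk messages from slipping into the QA lane.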

Safe to Deploy

These messages pass all rubric checks. They're approved and cleared for delivery.

Every approved message is logged with full context: timestamp, rubric version, reviewer, and evaluation results. If regulators ask "Who approved this?", the complete audit trail has the answer.
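One way to picture such a log is an append-only file of JSON lines. The sketch below assumes that storage format and a hypothetical `message_id` key; the other keys follow the context listed above and the sample verdict later in this article. It is an illustration, not Bookbag's logging API.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_entry(message_id: str, verdict: str, approver: str, role: str,
                rubric_version: str, scores: dict[str, int],
                now: Optional[datetime] = None) -> str:
    """Serialize one approval as a JSON line for an append-only audit log."""
    approved_at = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps({
        "message_id": message_id,
        "verdict": verdict,
        "approver": approver,
        "role": role,
        # UTC ISO 8601 with a "Z" suffix, matching the sample verdict's format.
        "approved_at": approved_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rubric_version": rubric_version,
        "rubric_scores": scores,
    }, sort_keys=True)


def who_approved(log_lines: list[str], message_id: str) -> Optional[dict]:
    """Answer "Who approved this?" by finding the message's entry in the log."""
    for line in log_lines:
        entry = json.loads(line)
        if entry["message_id"] == message_id:
            return entry
    return None
```

A lookup by message ID then returns approver, role, timestamp, rubric version, and scores in one record.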

Needs Fix

These messages have minor issues that a QA reviewer can quickly fix. Examples:

  • Slightly off-brand tone
  • Low-effort personalization ("I saw your LinkedIn post")
  • Missing or weak call-to-action
  • Formatting issues

QA can edit the message, approve it, and send it. The original (rejected) and corrected (approved) versions are logged as a preference pair for training.
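A minimal sketch of how one correction could become training data. The preference record uses the prompt/chosen/rejected layout that DPO tooling such as Hugging Face TRL reads; the `prompt` argument (the context the message was generated from) and the SFT keys `input`/`output` are assumptions for illustration, not Bookbag's export format.

```python
import json


def training_records(prompt: str, original: str, rewrite: str) -> tuple[str, str]:
    """Turn one QA correction into an SFT record and a preference record (JSONL lines)."""
    if original == rewrite:
        # An approval with no edits carries no preference signal.
        raise ValueError("rewrite is identical to the original")
    # SFT pair as described in this article: original -> corrected rewrite.
    sft = {"input": original, "output": rewrite}
    # Preference pair: the approved rewrite is preferred over the rejected original.
    dpo = {"prompt": prompt, "chosen": rewrite, "rejected": original}
    return json.dumps(sft), json.dumps(dpo)
```

Each returned string is one line that can be appended to a separate SFT or DPO export file.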

Blocked

These are high-risk messages that require SME review. Examples:

  • Prohibited promissory language ("guaranteed returns")
  • Potential hallucinations (unverifiable claims)
  • Missing required disclosures
  • High-risk language for regulated industries

SMEs must provide rationale and evidence for their decision. This creates the audit trail compliance teams need.
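The requirement can be enforced at the point where a decision is recorded. The `SMEDecision` record below is hypothetical; its fields follow the blocked lane's "What you get back" list (rationale, evidence citations, confidence assessment, final decision), and it refuses to exist without a rationale and at least one piece of evidence.

```python
from dataclasses import dataclass, field


@dataclass
class SMEDecision:
    """Hypothetical record of an SME's decision on a blocked message."""
    reviewer: str
    decision: str                   # "approve" or "reject"
    rationale: str
    evidence: list[str] = field(default_factory=list)  # e.g. policy sections, source documents
    confidence: str = "medium"      # "low", "medium" or "high"

    def __post_init__(self) -> None:
        # Refuse to record a decision that would leave a gap in the audit trail.
        if self.decision not in {"approve", "reject"}:
            raise ValueError(f"unknown decision: {self.decision!r}")
        if not self.rationale.strip():
            raise ValueError("a rationale is required for blocked messages")
        if not self.evidence:
            raise ValueError("at least one evidence citation is required")
        if self.confidence not in {"low", "medium", "high"}:
            raise ValueError(f"unknown confidence level: {self.confidence!r}")
```

For example, `SMEDecision("sme_07", "reject", "Promises guaranteed returns", ["Policy §4.2"], "high")` is accepted, while the same call with an empty rationale raises `ValueError`.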

Why Three Lanes Instead of Two?

Many systems use a binary pass/fail model. Bookbag uses three lanes because most issues aren't black-and-white.

A message with weak personalization isn't a compliance risk; it just needs better copy. Routing it to the SME lane wastes expert time and creates a bottleneck.

The three-lane model allows you to scale supervision: approve clean messages quickly, fix the fixable issues efficiently, and reserve SME attention for genuine high-risk decisions.
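To make the workload argument concrete, the sketch below compares SME queue sizes. It assumes, as this section argues, that a pass/fail system sends every failing message to SME review, and it uses illustrative rates of 6% needs_fix and 2% blocked, picked from within the typical ranges above rather than measured data.

```python
def sme_queue_size(total: int, needs_fix_rate: float, blocked_rate: float,
                   three_lanes: bool) -> int:
    """Number of messages that reach SME review under each routing model."""
    if three_lanes:
        # Only blocked messages escalate; needs_fix items stay with QA.
        return round(total * blocked_rate)
    # Binary pass/fail: every failing message lands in the same review queue.
    return round(total * (needs_fix_rate + blocked_rate))


if __name__ == "__main__":
    print(sme_queue_size(10_000, 0.06, 0.02, three_lanes=False))  # 800
    print(sme_queue_size(10_000, 0.06, 0.02, three_lanes=True))   # 200
```

At these rates the three-lane model cuts the SME queue by three quarters; the actual saving depends on your own ratio of needs_fix to blocked messages.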

Every Verdict Is a Structured Data Package

A verdict isn't just a label — it's a structured package of data that tells you exactly what happened, why, and what to do about it. Here's what a typical needs_fix verdict looks like:

  production_verdict: needs_fix
  failure_categories: [over-promising, personalization_failure]
  primary_failure_reason: over-promising
  severity: high
  business_impact: revenue_loss

  RUBRIC SCORES
    correctness: 3/5
    tone: 4/5
    personalization: 2/5
    policy_compliance: 2/5
    confidence: high

  GOLD STANDARD RESPONSE
    "Hey Jordan — if you're open to it, I can share examples of how teams improve reply rates without making performance guarantees."

  EXPLANATION
    Removed guarantee language (§4.2 violation). Replaced with permission-based soft CTA.

  AUDIT TRAIL
    approver: reviewer_4821 (QA)
    approved_at: 2024-01-15T10:23:41Z
    taxonomy_v: v2.3.1

  TRAINING ARTIFACT
    type: SFT pair
    status: approved

  POLICY FLAGS
    [claims] "Guarantee language violates §4.2"
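To consume these packages in code, one option is to load them into a typed structure. The sketch below keeps the field names from the sample verdict but models only a subset of them. The passing mark of 3/5 is an assumption (in the sample, the two dimensions below it line up with the two failure categories), not a documented Bookbag threshold.

```python
from dataclasses import dataclass, field


def parse_score(text: str) -> tuple[int, int]:
    """Split a rubric score such as "3/5" into (score, max_score)."""
    score, max_score = text.split("/")
    return int(score), int(max_score)


@dataclass
class Verdict:
    """Subset of the sample verdict package, using the same field names."""
    production_verdict: str
    failure_categories: list[str]
    primary_failure_reason: str
    severity: str
    business_impact: str
    rubric_scores: dict[str, str] = field(default_factory=dict)

    def shortfalls(self, passing: int = 3) -> list[str]:
        """Rubric dimensions scoring below a (hypothetical) passing mark."""
        return [name for name, text in self.rubric_scores.items()
                if parse_score(text)[0] < passing]


if __name__ == "__main__":
    verdict = Verdict(
        production_verdict="needs_fix",
        failure_categories=["over-promising", "personalization_failure"],
        primary_failure_reason="over-promising",
        severity="high",
        business_impact="revenue_loss",
        rubric_scores={"correctness": "3/5", "tone": "4/5",
                       "personalization": "2/5", "policy_compliance": "2/5"},
    )
    print(verdict.shortfalls())  # ['personalization', 'policy_compliance']
```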

Key Takeaways

  1. Three lanes = safe_to_deploy (90%+), needs_fix (4-8%), blocked (1-3%)
  2. Safe messages are approved with full audit logs
  3. Needs fix routes to QA for quick edits
  4. Blocked requires SME approval with rationale
  5. This model scales supervision without bottlenecks
