A rubric is the structured scoring criteria attached to an evaluation template. It’s how intervyo.ai turns a free-form voice interview into a defensible scorecard. Rubrics are versioned, calibrated by default, and applied identically to every participant — same dimensions, same weights, same bar.

Anatomy of a rubric

{
  "dimensions": [
    {
      "id": "problem_decomposition",
      "label": "Problem decomposition",
      "weight": 0.30,
      "passThreshold": 6.5,
      "description": "Can the candidate break ambiguity into concrete sub-problems and pick a place to start?"
    },
    {
      "id": "code_quality",
      "label": "Code quality & correctness",
      "weight": 0.30,
      "passThreshold": 7.0
    },
    {
      "id": "tradeoff_thinking",
      "label": "Trade-off thinking",
      "weight": 0.25,
      "passThreshold": 6.0
    },
    {
      "id": "communication",
      "label": "Communication under pressure",
      "weight": 0.15,
      "passThreshold": 6.0
    }
  ],
  "overallPassThreshold": 7.0
}
dimensions[].weight (number)
Fractional weight in the overall score. Weights across all dimensions should sum to 1.0. The platform normalizes them if they don’t, but setting them explicitly is better.
dimensions[].passThreshold (number)
Minimum score on this dimension for a pass. A candidate who scores below the threshold on any single dimension fails the rubric, even if their overall score is above overallPassThreshold.
overallPassThreshold (number)
Minimum weighted-average score for a pass. Defaults to 7.0 if unset.
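
The text above says the platform normalizes weights that don't sum to 1.0, but not how. The following is a minimal Python sketch that assumes proportional rescaling (each weight divided by the total) and applies the documented 7.0 default. `normalize_rubric` and `DEFAULT_OVERALL_PASS_THRESHOLD` are hypothetical names for illustration, not part of the intervyo.ai API.

```python
DEFAULT_OVERALL_PASS_THRESHOLD = 7.0  # documented default when the field is unset


def normalize_rubric(rubric: dict) -> dict:
    """Return a copy of the rubric whose weights sum to 1.0.

    Assumption: weights are rescaled proportionally. The platform's actual
    normalization rule is not documented here.
    """
    dimensions = rubric["dimensions"]
    total = sum(d["weight"] for d in dimensions)
    if total <= 0:
        raise ValueError("dimension weights must sum to a positive number")
    return {
        "dimensions": [{**d, "weight": d["weight"] / total} for d in dimensions],
        "overallPassThreshold": rubric.get(
            "overallPassThreshold", DEFAULT_OVERALL_PASS_THRESHOLD
        ),
    }


# Weights entered as points (3 / 3 / 2.5 / 1.5) instead of fractions,
# and no overallPassThreshold set.
raw_rubric = {
    "dimensions": [
        {"id": "problem_decomposition", "weight": 3.0, "passThreshold": 6.5},
        {"id": "code_quality", "weight": 3.0, "passThreshold": 7.0},
        {"id": "tradeoff_thinking", "weight": 2.5, "passThreshold": 6.0},
        {"id": "communication", "weight": 1.5, "passThreshold": 6.0},
    ]
}

normalized = normalize_rubric(raw_rubric)
weights = {d["id"]: d["weight"] for d in normalized["dimensions"]}
print(weights)
print(normalized["overallPassThreshold"])
```

Under that assumption the four weights come out as 0.30, 0.30, 0.25 and 0.15, matching the example rubric. Writing the fractions out yourself keeps the intended weighting visible to anyone who reads the rubric.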

How scoring runs

The AI scores a session in four steps. The first two run once per dimension:
  1. Reads the transcript. It identifies the segments relevant to this dimension: code-quality evidence in a coding round, discovery-question evidence in a sales round.
  2. Grades against the dimension's description. It produces a 0–10 score with a short reasoning paragraph citing the specific transcript lines that drove the score.
  3. Joins into the breakdown. All per-dimension scores are bundled into evaluation_breakdown, with transcript citations and reasoning per dimension.
  4. Computes overall + pass. The weighted average of the dimension scores becomes the overall score. Comparing it and each dimension score against their thresholds yields passed: true|false.
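
The last step can be sketched in Python as follows. This is an illustration of the rules described on this page (weighted average, per-dimension thresholds, overall threshold), not the platform's implementation. The `evaluate` function and the `overall` and `failed_dimensions` result keys are hypothetical, and the scores are made up for the example.

```python
RUBRIC = {
    "dimensions": [
        {"id": "problem_decomposition", "weight": 0.30, "passThreshold": 6.5},
        {"id": "code_quality", "weight": 0.30, "passThreshold": 7.0},
        {"id": "tradeoff_thinking", "weight": 0.25, "passThreshold": 6.0},
        {"id": "communication", "weight": 0.15, "passThreshold": 6.0},
    ],
    "overallPassThreshold": 7.0,
}


def evaluate(rubric: dict, scores: dict) -> dict:
    """Combine per-dimension scores (0-10) into an overall score and a pass flag."""
    dims = rubric["dimensions"]
    total_weight = sum(d["weight"] for d in dims)
    # Weighted average; dividing by the total guards against weights that
    # do not sum to exactly 1.0.
    overall = sum(d["weight"] * scores[d["id"]] for d in dims) / total_weight
    # Any single dimension below its own threshold fails the rubric.
    failed = [d["id"] for d in dims if scores[d["id"]] < d["passThreshold"]]
    threshold = rubric.get("overallPassThreshold", 7.0)
    return {
        "overall": overall,
        "failed_dimensions": failed,
        "passed": not failed and overall >= threshold,
    }


# Overall is about 7.5, but communication (5.5) is below its 6.0 threshold.
weak_communicator = evaluate(RUBRIC, {
    "problem_decomposition": 8.0,
    "code_quality": 8.5,
    "tradeoff_thinking": 7.0,
    "communication": 5.5,
})

# Same scores with communication at 6.5: every gate is cleared.
balanced = evaluate(RUBRIC, {
    "problem_decomposition": 8.0,
    "code_quality": 8.5,
    "tradeoff_thinking": 7.0,
    "communication": 6.5,
})

print(weak_communicator["passed"], weak_communicator["failed_dimensions"])
print(balanced["passed"], balanced["failed_dimensions"])
```

The first candidate clears the overall threshold and still fails, because one dimension falls below its own bar. That is the per-dimension gate described under dimensions[].passThreshold.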

Why dimensions matter more than overall

A single overall score throws away the signal you need to actually decide. Two candidates can land on almost the same overall score and be wildly different hires:

Candidate   Communication   Technical   Trade-offs   Authenticity
Alex        9.0             6.5         7.5          9.0
Brin        6.0             9.0         7.5          9.0

Averaged with equal weights, Alex comes out at 8.0 and Brin at about 7.9. Yet Alex is a strong PM-leaning IC and Brin is a strong heads-down engineer. The dimension breakdown is the hiring signal, not the average.
Show your hiring panel the per-dimension breakdown, not the overall. Decisions converge faster when everyone’s looking at the same evidence per dimension, with the transcript citations one click away.

Templates ship with calibrated rubrics

You don’t have to design rubrics from scratch. intervyo.ai ships calibrated starters for the most-hired roles:

Software engineers

Problem decomposition · Code quality · Trade-off thinking · Communication

Sales reps

Discovery quality · Pitch adaptation · Objection handling · Closing

Customer support

Diagnosis · Written communication · Technical fluency · Escalation judgement

Product managers

Prioritization · Customer empathy · Cross-functional language · Metric literacy
See the full list of role templates. You can use them as-is, or fork them — every starter is a calibrated baseline, not a contract.

Calibration analytics

Rubrics are useful only if humans use them consistently. intervyo.ai surfaces:
  • Recruiter-to-recruiter variance per dimension
  • Hiring-manager parity across panels reviewing the same candidates
  • Drift alerts when a recruiter’s scoring trends away from the team baseline by more than a threshold
  • Calibration sessions — built-in quarterly workflow where multiple scorers grade the same candidate and the platform surfaces divergence
Read more under Recruiter Calibration on the marketing site.
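
To make the first and third bullets concrete, here is a rough Python sketch of scorer variance and a drift check for a single calibration session. It assumes every scorer grades the same candidate on the same dimensions, and that drift is measured as a scorer's mean offset from the team average. The function names, the `max_drift` parameter, and the sample grades are all hypothetical; the platform's actual metrics may be defined differently.

```python
from statistics import mean, pvariance


def per_dimension_variance(grades: dict) -> dict:
    """Population variance of scores across scorers, per dimension."""
    dims = list(next(iter(grades.values())))
    return {dim: pvariance([g[dim] for g in grades.values()]) for dim in dims}


def drifting_scorers(grades: dict, max_drift: float = 1.0) -> dict:
    """Scorers whose mean offset from the team baseline exceeds max_drift."""
    dims = list(next(iter(grades.values())))
    # Team baseline: mean score per dimension (includes the scorer being checked).
    baseline = {dim: mean(g[dim] for g in grades.values()) for dim in dims}
    flagged = {}
    for scorer, g in grades.items():
        drift = mean(g[dim] - baseline[dim] for dim in dims)
        if abs(drift) > max_drift:
            flagged[scorer] = drift
    return flagged


# Three scorers grading the same candidate in one calibration session.
session_grades = {
    "ana": {"communication": 7.0, "code_quality": 8.0},
    "ben": {"communication": 7.0, "code_quality": 8.0},
    "cho": {"communication": 4.0, "code_quality": 5.0},
}

variance = per_dimension_variance(session_grades)
flagged = drifting_scorers(session_grades, max_drift=1.0)
print(variance)
print(flagged)
```

In this sample only "cho" is flagged, scoring two points below the team baseline on average. A production drift alert would track the offset over many sessions rather than one.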

Versioning

Rubrics on a template are versioned. Edits create a new version; existing sessions keep the rubric they were scored against. That way:
  • Hiring panels reviewing 60-day-old scorecards see the rubric that produced those scores
  • You can tighten a rubric mid-quarter without breaking historical comparability
  • Compliance teams can audit which version of which rubric scored which candidate
Don’t tweak a rubric and run a back-comparison expecting like-for-like numbers. A 7.5 under v3 of the rubric is not directly comparable to a 7.5 under v2 — the dimensions or weights changed.
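
One way to picture the versioning behavior is an append-only list of rubric versions, with each scored session pinned to the version number it was graded against. The Python sketch below is a mental model only. `Template`, `RubricVersion`, `ScoredSession`, and `like_for_like` are hypothetical names, not intervyo.ai objects.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RubricVersion:
    number: int
    rubric: dict


@dataclass(frozen=True)
class ScoredSession:
    candidate: str
    rubric_version: int  # pinned at scoring time, never updated afterwards
    overall: float


@dataclass
class Template:
    versions: list = field(default_factory=list)

    def publish(self, rubric: dict) -> RubricVersion:
        """Store an edit as a new version; earlier versions stay untouched."""
        version = RubricVersion(len(self.versions) + 1, copy.deepcopy(rubric))
        self.versions.append(version)
        return version

    def get(self, number: int) -> RubricVersion:
        return self.versions[number - 1]


def like_for_like(a: ScoredSession, b: ScoredSession) -> bool:
    """Scores are directly comparable only under the same rubric version."""
    return a.rubric_version == b.rubric_version


template = Template()
v1 = template.publish({"overallPassThreshold": 7.0})
old = ScoredSession("candidate-a", v1.number, 7.5)

v2 = template.publish({"overallPassThreshold": 7.5})  # tightened mid-quarter
new = ScoredSession("candidate-b", v2.number, 7.5)

print(template.get(old.rubric_version).rubric)  # {'overallPassThreshold': 7.0}
print(like_for_like(old, new))                  # False
```

The old scorecard still resolves to the rubric that produced it, and the comparison check makes the v1-versus-v2 mismatch explicit instead of leaving it to the reader.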
Last modified on June 2, 2026