An interview is a real-time voice conversation between a calibrated AI interviewer and one participant. It runs in the browser or over a phone call, lasts 10–45 minutes depending on the template, and delivers a scored transcript within 2 minutes of the session ending.

What makes it an interview, not a chatbot

Adaptive follow-up

Probes vague answers, asks for real examples, and adjusts depth based on what the participant just said. No fixed question list.

Voice-first, sub-second

Voice response latency stays under 800 ms on web and under 1 s on phone, so the AI feels like a person on a call, not a chatbot streaming tokens.

Calibrated scoring

Every dimension you defined on the template gets graded against your rubric — same bar every session, every region.

Authenticity-aware

Tab-switch, paste, and voice-spoofing signals are captured continuously and surfaced for human judgment.

Channels

Channel | When to use | Latency
Web (browser) | Default. Candidate clicks a join link, AI joins immediately. Camera optional. | <800ms
Outbound phone (SIP) | High-volume call-center hiring, or when participants can't install a browser or grant it permissions. | <1s
Embedded widget | Inside your own product or job page. See the Embed guide. | <800ms
The session record is identical across channels — same scorecard format, same transcript shape, same webhook payload. Pick the channel that matches where your participants are.

Languages

intervyo.ai conducts interviews in English, Japanese, Hindi, Spanish, and several more, with rubric-aligned scoring per language. Set the language on the evaluation template (language field) or per-session by passing it to the create-session endpoint.
Scores are directly comparable across languages — the AI uses the same rubric dimensions and weighting regardless of which language the interview ran in.
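
As a rough illustration of the per-session override described above, the sketch below builds (but does not send) a create-session request with a language value. Everything except the `language` field name is an assumption made for the example: the URL, the `template_id` field, the bearer-token header, and the `"ja"` code format are hypothetical placeholders, so check the API reference for the real ones.

```python
import json
import urllib.request

# Hypothetical endpoint; the real create-session URL is not given on this page.
CREATE_SESSION_URL = "https://api.example.com/v1/sessions"


def build_create_session_request(template_id, language=None, api_key="YOUR_API_KEY"):
    """Build, without sending, a create-session request.

    When `language` is None the key is left out of the body, on the
    assumption that the template's own language setting then applies.
    """
    body = {"template_id": template_id}  # hypothetical field name
    if language is not None:
        body["language"] = language  # per-session language override
    return urllib.request.Request(
        CREATE_SESSION_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_session_request("tmpl_123", language="ja")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Leaving the key out rather than sending an empty value keeps the template default and the per-session override clearly separated on the client side.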

Lifecycle in one diagram

  ┌──────────────┐   created    ┌──────────────┐   joined     ┌──────────────┐
  │   scheduled  │ ───────────► │  in_progress │ ───────────► │   completed  │
  └──────────────┘              └──────────────┘              └──────────────┘
         │                              │
         │ candidate doesn't show       │ candidate hangs up early /
         ▼                              ▼ tech failure
  ┌──────────────┐              ┌──────────────┐
  │   cancelled  │              │    failed    │
  └──────────────┘              └──────────────┘
See Sessions for what’s emitted at each transition and what fields are populated.
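
For client code that tracks session status, the diagram can be modeled as a small transition table. This is a sketch only: the five state names come from the diagram, but whether the API reports them as these exact status strings is an assumption, and the helper names are made up for the example.

```python
from enum import Enum


class SessionState(str, Enum):
    SCHEDULED = "scheduled"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"


# Allowed transitions, read directly off the lifecycle diagram.
TRANSITIONS = {
    SessionState.SCHEDULED: {SessionState.IN_PROGRESS, SessionState.CANCELLED},
    SessionState.IN_PROGRESS: {SessionState.COMPLETED, SessionState.FAILED},
    SessionState.COMPLETED: set(),
    SessionState.CANCELLED: set(),
    SessionState.FAILED: set(),
}


def can_transition(current, target):
    """Return True if the diagram has an arrow from `current` to `target`."""
    return target in TRANSITIONS[current]


def is_terminal(state):
    """A state is terminal when the diagram shows no arrows leaving it."""
    return not TRANSITIONS[state]


print(can_transition(SessionState.SCHEDULED, SessionState.IN_PROGRESS))
print(is_terminal(SessionState.FAILED))
```

A table like this makes it easy to reject out-of-order status updates, for example a `completed` event arriving for a session that was never `in_progress`.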

What gets captured

Every interview produces a structured session record with these fields:
transcript (string[]): Time-stamped speaker turns. AI and participant labeled separately.
recording_url (string): Presigned URL to the audio recording. Short-lived (1 hour by default); request a fresh URL each time you need access.
score (number): Overall rubric score 0–10. Weighted average of per-dimension scores.
evaluation_breakdown (object[]): Per-dimension scores with reasoning paragraphs and transcript citations. See Rubrics for the shape.
authenticity_signals (object): Tab-switch count, paste events, voice-spoofing flags, screen-share detections. Surfaced verbatim; no auto-pass-fail interpretation.
ai_feedback (string): Two-paragraph plain-English summary: what the participant did well, what the next-round interviewer should probe. Generated against your rubric.
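
To make the "weighted average of per-dimension scores" concrete, here is a minimal worked example. The `dimension`, `score`, and `weight` keys are assumptions chosen for illustration (the actual `evaluation_breakdown` shape is documented under Rubrics), and dividing by the sum of the weights is one common way to compute a weighted average, not necessarily the platform's exact formula.

```python
def overall_score(breakdown):
    """Weighted average of per-dimension scores: sum(w * s) / sum(w)."""
    total_weight = sum(d["weight"] for d in breakdown)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(d["score"] * d["weight"] for d in breakdown) / total_weight


# Hypothetical breakdown: three rubric dimensions weighted 5 : 3 : 2.
breakdown = [
    {"dimension": "communication", "score": 8, "weight": 5},
    {"dimension": "problem_solving", "score": 6, "weight": 3},
    {"dimension": "domain_knowledge", "score": 9, "weight": 2},
]

# (8*5 + 6*3 + 9*2) / (5 + 3 + 2) = 76 / 10
print(overall_score(breakdown))  # 7.6
```

Because each per-dimension score sits on the 0–10 scale, a weighted average with positive weights stays on that scale too.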

Re-running an interview

Interviews are immutable once completed or failed. To run another one for the same participant — say after a technical glitch — create a new session. The participant’s prior sessions stay attached for history.
Multi-stage templates auto-create a fresh session for each stage. You don’t need to manually re-create sessions when a candidate progresses through stages — see Templates.
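
The immutability rule can be mirrored in client code by treating each session as a frozen record and modeling a re-run as a new record in the participant's history. This is a local sketch of the idea, not the platform's data model: `SessionRecord`, its fields, and the ID values are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    session_id: str
    participant_id: str
    status: str
    score: Optional[float] = None


def sessions_for(history, participant_id):
    """Return every session recorded for one participant, oldest first."""
    return [s for s in history if s.participant_id == participant_id]


history = [SessionRecord("sess_1", "p_42", "failed")]

# A re-run is a new record appended to the history, not an edit of sess_1.
history.append(SessionRecord("sess_2", "p_42", "scheduled"))

try:
    history[0].status = "scheduled"  # attempt to "reopen" the failed session
except FrozenInstanceError:
    print("sess_1 cannot be modified")

print([s.session_id for s in sessions_for(history, "p_42")])
```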
Last modified on June 3, 2026