ADR-002: Simli as avatar provider

Status: Accepted
Date: 2026-06-10

Context

The AI interviewer needs a visible presence in the interview room. The platform runs a tiered avatar strategy: the avatar a candidate sees depends on the customer's subscription, while the interview machinery (questions, evaluation, transcript, feedback) is identical across tiers.

| Tier | Avatar |
| --- | --- |
| Free | Browser-rendered 3D avatar (Three.js + Ready Player Me model) |
| Pro / Career | Simli: photo-real face, simli-client SDK over WebRTC |
| Fallback (all tiers) | Animated CSS bot: safe default while loading, and the instant kill switch |
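
The tier table above can be read as a small routing function. The following is a minimal sketch, not the platform's actual router: the type names, the `killSwitch` and `ready` flags, the avatar identifiers, and `resolveAvatar` itself are all hypothetical.

```typescript
// Hypothetical names throughout; the source only describes the tiers in prose.
type Tier = "free" | "pro" | "career";
type AvatarKind = "three-js-3d" | "simli" | "css-bot";

interface AvatarState {
  tier: Tier;
  killSwitch: boolean; // assumed operator-controlled flag
  ready: boolean; // false while the tier's avatar is still loading
}

function resolveAvatar(state: AvatarState): AvatarKind {
  // The CSS bot is the safe default: shown while loading and whenever
  // the kill switch is on, regardless of tier.
  if (state.killSwitch || !state.ready) {
    return "css-bot";
  }
  // Free gets the browser-rendered 3D avatar; both paid tiers get Simli.
  return state.tier === "free" ? "three-js-3d" : "simli";
}
```

Keeping the fallback check first means the kill switch wins over every tier rule, which matches the "all tiers" scope of the fallback row.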

Constraints that shaped the provider decision:

  • Vendor API keys must never reach the browser. Avatar sessions are minted server-side: the video-streaming service exchanges the server-held provider key for a short-lived session token, and only that token goes to the client.
  • The avatar layer fails in cascades if not isolated. An observed production failure chain — avatar provider timeout → avatar-model CDN fetch failure → WebGL context loss — drove a hard rule that one layer's failure must not cascade. The avatar provider sits behind a tight timeout and degrades to the fallback avatar rather than breaking the interview.
  • The avatar likeness is licensed media. What gets baked into stored recordings is a compliance decision, not an aesthetic one (see "Recording interplay" below).
  • A second provider exists in the codebase. Spatius (a "Motion Server" provider with an edge "Direct Mode" SDK session) is supported behind the same server-side token-mint pattern and a provider flag, but Simli is the live provider.
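
The server-side mint described in the first constraint can be sketched as below. This is an illustration under stated assumptions: the source does not describe the provider's exchange API, so `ExchangeFn`, `ProviderSession`, `ClientSessionResponse`, and `mintAvatarSession` are hypothetical, and the exchange call is injected rather than modelled on any real SDK.

```typescript
// Hypothetical shapes; the real provider exchange API is not described in the source.
interface ProviderSession {
  sessionToken: string;
  expiresInSeconds: number;
}

type ExchangeFn = (providerKey: string) => Promise<ProviderSession>;

interface ClientSessionResponse {
  sessionToken: string;
  expiresAt: number; // epoch milliseconds
}

async function mintAvatarSession(
  providerKey: string,
  exchange: ExchangeFn,
  now: () => number = Date.now,
): Promise<ClientSessionResponse> {
  if (!providerKey) {
    throw new Error("provider key is not configured on the server");
  }
  const session = await exchange(providerKey);
  // Build the response field by field so the provider key cannot be
  // echoed back to the browser by accident.
  return {
    sessionToken: session.sessionToken,
    expiresAt: now() + session.expiresInSeconds * 1000,
  };
}
```

Constructing the response explicitly, rather than spreading the provider's payload, is one way to keep the "only the token goes to the client" rule checkable in review.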

Decision

Retain Simli as the photo-real avatar provider for paid tiers, with sessions minted server-side (including a Simli session pool/queue for capacity), behind a swappable routing seam.
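
The source mentions a session pool/queue for capacity but does not describe it. One way such a queue could work is a counting semaphore, sketched here; `SessionPool`, its capacity, and its first-come-first-served ordering are assumptions made for illustration.

```typescript
// Hypothetical capacity limiter; the actual pool/queue design is not documented.
class SessionPool {
  private readonly capacity: number;
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Number of callers currently queued for a slot.
  get waiting(): number {
    return this.waiters.length;
  }

  // Resolves immediately if a slot is free, otherwise when one is released.
  acquire(): Promise<void> {
    if (this.active < this.capacity) {
      this.active += 1;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  // Hands the slot to the oldest waiter, or frees it if nobody is queued.
  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // slot transfers directly, so the active count is unchanged
    } else {
      this.active -= 1;
    }
  }
}
```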

Alternatives considered

  • Spatius: supported as a mintable provider (edge Direct Mode, SDK-driven session) behind a provider flag, but not promoted to the live provider. An earlier rationale held that Spatius renders on-device and is therefore incompatible with server-side recording. The current sources do not document that claim; they describe Spatius only as an "edge direct-mode" session-token mint. The platform's server-side recording design also deliberately excludes the avatar from recordings (see "Recording interplay" below), so whether the avatar can be captured by egress recording was not the deciding criterion in the sources reviewed.
  • Hedra — the planned upgrade for the Career tier; not adoptable yet because its session API was not available at decision time. The router seam exists specifically so this swap touches one component.
  • Browser-rendered 3D for all tiers — rejected for paid tiers: it is not photo-real, which is the paid-tier differentiator. It remains the Free-tier implementation.
  • No avatar (voice only / animated bot) — retained as the fallback and kill switch, not the product experience for paid tiers.

Recording interplay

Simli's avatar video reaches the browser via Simli's own hosted media infrastructure, separate from the platform's LiveKit room. The server-side recording design (recording v2) records the platform's own room (candidate video, candidate audio, and the interviewer's voice), so the avatar is excluded from stored recordings by construction. This exclusion is a deliberate, locked decision: a licensed likeness baked into a stored file is irreversible, adds a consent/likeness surface, and has zero evaluation value. (The earlier client-side recorder did bake the avatar into the file, which is one of the reasons it is being retired.) Playback instead represents the interviewer with a nameplate and transcript overlay, which is reversible.
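
According to the text above, the exclusion follows from the avatar never entering the platform's room. An explicit allowlist is one hypothetical way to make that invariant checkable in code. The sketch below assumes each track carries a `source` label; `RoomTrack`, `TrackSource`, and `selectRecordedTracks` are illustrative names, not part of LiveKit or the platform.

```typescript
// Hypothetical track labels; the source lists what is recorded but not how tracks are tagged.
type TrackSource =
  | "candidate-video"
  | "candidate-audio"
  | "interviewer-voice"
  | "avatar-video";

interface RoomTrack {
  id: string;
  source: TrackSource;
}

// Allowlist rather than denylist: a new, unknown source is excluded by default.
const RECORDED_SOURCES: ReadonlySet<TrackSource> = new Set<TrackSource>([
  "candidate-video",
  "candidate-audio",
  "interviewer-voice",
]);

function selectRecordedTracks(tracks: RoomTrack[]): RoomTrack[] {
  return tracks.filter((track) => RECORDED_SOURCES.has(track.source));
}
```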

Consequences

Easier:

  • Paid tiers get a photo-real interviewer with lip-synced speech, with the provider key held server-side.
  • The avatar is a leaf dependency: if Simli is slow or down, the session times out quickly and degrades to the fallback avatar — the interview itself continues.
  • Stored recordings carry no licensed likeness, which keeps the compliance surface small.
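
The degrade-on-timeout behaviour in the second bullet can be sketched as a wrapper that never rejects. This is a minimal illustration: `withFallback` is a hypothetical helper, and the source gives no concrete timeout budget, so the millisecond value is left to the caller.

```typescript
// Hypothetical helper: resolves with the fallback on timeout or on failure,
// so a slow or broken provider cannot break the caller.
function withFallback<T>(work: Promise<T>, timeoutMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        // Provider errors degrade exactly like timeouts.
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}
```

A caller might wrap the avatar session start in `withFallback(...)` with the CSS bot as the fallback value. Because the returned promise cannot reject, the interview flow needs no extra error handling for the avatar layer.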

Harder / new obligations:

  • Vendor dependency for the paid-tier experience. Simli availability and latency directly shape the premium product; the pool/queue and timeout budget are ongoing operational surfaces.
  • The session-mint route must exist and stay server-side — a thin, stateless credential proxy that every avatar session depends on.
  • Defensive isolation is mandatory, forever. The cascade failure mode (provider timeout → CDN failure → WebGL loss) means the avatar layer must always be wrapped in timeouts and fallbacks; this is a standing engineering constraint, not a one-time fix.
  • Playback owes the user context. Because the avatar is absent from recordings, the player must supply the interviewer's identity via overlay (nameplate + transcript) — a permanent product obligation.

Reversibility

  • Low lock-in by design. The avatar router resolves the tier and picks the component; swapping providers (e.g., to Hedra for the Career tier) is documented as touching only the router. A provider flag already exists for routing between Simli and Spatius.
  • The token-mint integration is a thin, stateless credential proxy — replacing it for a new provider is a small, contained change, not an architectural one.
  • No stored-data lock-in. Recordings contain no Simli-rendered pixels, so switching providers never strands or taints the recording archive — the strongest reversibility property of this decision.
  • The exit cost is product-side, not technical: validating a new provider's visual quality, lip-sync fidelity, latency, and session capacity against the paid-tier bar.
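
The provider flag mentioned in the first bullet can be sketched as a lookup behind a shared interface. The names here are assumptions: `AvatarProvider`, `ProviderFlag`, and `selectProvider` are hypothetical, and defaulting to Simli for a missing or unknown flag is an illustrative choice that reflects Simli being the live provider.

```typescript
// Hypothetical seam; the source states a provider flag exists but not its shape.
type ProviderFlag = "simli" | "spatius";

interface AvatarProvider {
  readonly id: ProviderFlag;
  // Returns a short-lived session token minted server-side.
  mintSession(): Promise<string>;
}

function selectProvider(
  flag: string | undefined,
  registry: Record<ProviderFlag, AvatarProvider>,
): AvatarProvider {
  // Anything other than an explicit "spatius" resolves to the live provider.
  return flag === "spatius" ? registry.spatius : registry.simli;
}
```

Under this shape, adding a provider means adding one registry entry and one flag value, while callers continue to depend only on `AvatarProvider`.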