video-streaming-server

Purpose

video-streaming-server (VSS) is a single Node.js (ESM) process that owns the real-time media and session side of an AI-led interview: LiveKit room/token management, session orchestration and liveness, the recording pipeline to S3, neural text-to-speech, streaming speech-to-text, and avatar session minting — with every third-party secret kept server-side. The AI interview intelligence (question planning, evaluation, feedback) lives in the Python bot-backend service; VSS still carries a legacy in-process brain that is being retired (see Legacy AI brain).

Architecture

The frontends (interviews-ui and the live session room in smart-interview-ui) never talk to OpenAI, Deepgram, Simli, or Spatius directly. They call VSS over three transports:

  • Express 4 HTTP API — LiveKit routes (tokens, recording control, webhooks), voice/avatar routes (TTS, avatar session minting), session-lifecycle routes (heartbeats, resume parsing), and the remaining legacy AI-brain routes while they migrate out.
  • A raw WebSocket proxy — streams the candidate's 16 kHz PCM audio to Deepgram for speech-to-text; the API key stays on the server.
  • Socket.io — legacy WebRTC signaling and the in-app (Puppeteer-based) recording flow.
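The STT proxy carries raw 16 kHz PCM. The sketch below shows what one client frame could look like, assuming mono, 16-bit signed little-endian samples (the format Deepgram calls `linear16`). The bit depth, channel count, and the helper names `bytesPerChunk` and `floatToLinear16` are assumptions, not taken from sttStreamRoutes.js.

```javascript
// Sketch only: assumes mono, 16-bit signed little-endian PCM ("linear16")
// at 16 kHz. Bit depth and channel count are assumptions.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2; // 16-bit samples

// Bytes needed for a chunk of the given duration (ms) at 16 kHz mono.
function bytesPerChunk(ms) {
  return ((SAMPLE_RATE_HZ * ms) / 1000) * BYTES_PER_SAMPLE;
}

// Convert Web Audio-style float samples in [-1, 1] to linear16 bytes.
// Assumes the input has already been resampled to 16 kHz.
function floatToLinear16(samples) {
  const view = new DataView(new ArrayBuffer(samples.length * BYTES_PER_SAMPLE));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range input
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767;
    // setInt16 truncates the fractional part.
    view.setInt16(i * BYTES_PER_SAMPLE, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}
```

At these assumed settings one second of audio is 16,000 × 2 = 32,000 bytes, so a 20 ms frame is 640 bytes, a useful sanity check when sizing WebSocket messages or reading proxy logs.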

Actual audio/video media flows browser ↔ LiveKit Cloud (WebRTC); VSS mints the access tokens and controls recording egress. Recordings are produced by LiveKit room-composite egress as MP4 into S3, with a server-side finalize step that copies them into their final location. FFmpeg is part of the deployment environment for the recording pipeline.
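The finalize step is, at its core, an S3 copy. As a hedged illustration, this builds the input for an S3 CopyObject request (`Bucket`, `Key`, and `CopySource`, the fields used by `CopyObjectCommand` in `@aws-sdk/client-s3`). The `recordings/<sessionId>/<file>` destination layout and the `buildFinalizeCopy` name are hypothetical; the real layout lives in livekit/recordingFinalize.js.

```javascript
// Hypothetical sketch of the finalize copy's input. The destination key
// layout is an assumption; see livekit/recordingFinalize.js for the real one.

// S3 CopyObject expects CopySource as "<bucket>/<key>", URL-encoded.
// Encode each path segment but keep the "/" separators intact.
function encodeKey(key) {
  return key.split("/").map(encodeURIComponent).join("/");
}

function buildFinalizeCopy({ bucket, egressKey, sessionId }) {
  const fileName = egressKey.split("/").pop(); // e.g. the egress MP4 file name
  return {
    Bucket: bucket,
    Key: `recordings/${sessionId}/${fileName}`, // hypothetical final location
    CopySource: `${bucket}/${encodeKey(egressKey)}`,
  };
}
```

With the AWS SDK v3 the object would be passed as `new CopyObjectCommand(params)` to `S3Client.send`. Encoding each key segment matters because S3 requires `CopySource` to be URL-encoded, and egress file names can contain characters such as spaces.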

Legacy AI brain (being retired)

VSS predates bot-backend and still carries an in-process OpenAI "interview brain": question generation, answer evaluation, and feedback-report routes mounted alongside the media API. That brain is being migrated out under the TI-340 wave plan. New AI-interview capability (question planning, evaluation, blended technical interviews) lands in bot-backend, which is the AI interview brain in the current workflow. In this repo the rule is bug fixes and migration-shim work only; before touching brain code, check which TI-340 wave owns the surface, since it may already be scheduled for deletion.

Key components

Area | What it owns
livekit/token.js | LiveKit JWT issuance. Security-critical: coordinate before changing grant logic.
livekit/recording.js, recordingFinalize.js, recordingTimeline.js, egressUtils.js, webHooks.js | The recording pipeline: egress control, finalize-to-S3, Redis timeline, LiveKit webhook handling.
ai-interviews/services/openaiService.js | The one centralized OpenAI client (TTS + remaining legacy-brain calls). All OpenAI calls go through it; per-surface model overrides resolve here.
ai-interviews/routes/{question,evaluation,feedback}Routes.js | The legacy AI brain: question generation, answer scoring, feedback reports. Being retired under TI-340; net-new logic for these surfaces lands in bot-backend.
ai-interviews/routes/ttsRoutes.js | Neural OpenAI TTS → MP3 (8 s timeout).
ai-interviews/routes/sttStreamRoutes.js | Deepgram Nova-3 STT WebSocket proxy (key stays server-side; 10-minute hard cap per stream).
ai-interviews/routes/simliRoutes.js, spatiusRoutes.js | Avatar session-token minting; Simli pool/queue management.
ai-interviews/routes/heartbeatRoutes.js + jobs/zombieSessionCleanup.js | Session liveness heartbeats and abandoned-session cleanup/refund.
ai-interviews/config/env.js | Env loading and precedence; imported first in index.js.
ai-interviews/middleware/timeout.js | Tight timeouts on external calls (Simli mint 5 s, TTS 8 s), set as deliberate latency budgets.
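The budgets in the table are enforced by racing each external call against a timer. A minimal sketch of that pattern follows; it assumes nothing about the real timeout.js beyond the numbers quoted above, and `withTimeout`, `TimeoutError`, and `BUDGETS_MS` are hypothetical names.

```javascript
// Budgets quoted in this README (illustrative copy; the real values live in
// ai-interviews/middleware/timeout.js and sttStreamRoutes.js).
const BUDGETS_MS = {
  simliMint: 5_000,
  tts: 8_000,
  sttStreamMax: 10 * 60 * 1000, // hard cap per STT stream
};

class TimeoutError extends Error {
  constructor(label, ms) {
    super(`${label} exceeded ${ms} ms budget`);
    this.name = "TimeoutError";
  }
}

// Race `promise` against a timer. The timer is always cleared so it cannot
// keep the process alive after the work settles.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage might look like `await withTimeout(mintSimliSession(), BUDGETS_MS.simliMint, "simli mint")`, where `mintSimliSession` is a hypothetical helper. Note that `Promise.race` only stops waiting: the upstream request keeps running unless it is also aborted, for example by passing `AbortSignal.timeout(ms)` as the `signal` to `fetch`.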

Route groups: AI interview routes are mounted under /api/ai-mock-interviews/*, LiveKit routes under /livekit/*, plus /health and a legacy Socket.io/recorder surface. If the LiveKit module fails to load, /livekit/* returns 503 but the rest of the server keeps serving.
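One way to get the "/livekit/* returns 503, everything else keeps serving" behavior is to load the module through a dynamic `import()` inside a try/catch and substitute a stub handler on failure. This is a sketch under assumptions: the Express-style `(req, res)` stub and the `loadOptionalRouter` name are hypothetical, not taken from index.js.

```javascript
// Sketch: load an optional module; if loading fails, return a stub handler
// that answers 503 so the rest of the server keeps serving.
// The Express-style req/res shape is an assumption.
async function loadOptionalRouter(name, loader) {
  try {
    const mod = await loader();
    return mod.default ?? mod; // prefer the module's default export
  } catch (err) {
    console.error(`[${name}] failed to load, serving 503:`, err.message);
    return (req, res) => {
      res.status(503).json({ error: `${name} unavailable` });
    };
  }
}
```

In an ESM entry point this could be mounted with top-level await, e.g. `app.use("/livekit", await loadOptionalRouter("livekit", () => import("./livekit/index.js")))`, where `./livekit/index.js` is a hypothetical path.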

Local development

Prerequisites

  • Node.js 20 LTS or newer — ESM project ("type": "module"), no engines pin.
  • PostgreSQL reachable via the DB_* (or RDS_*) env vars — required for session/feedback persistence.
  • Redis — only needed for the LiveKit recording pipeline; the AI interview loop boots without it.
  • API keys per the table below. For a minimal "does it start" boot, only OPENAI_API_KEY is strictly needed — everything else degrades gracefully.
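The "degrades gracefully" rule can be pictured as a feature map computed from the environment at boot. The sketch below derives one from the variables in the Environment variables table; the gating rules (for example, that STT needs both `DEEPGRAM_API_KEY` and `STT_BACKEND=deepgram`) and the `detectFeatures` name are assumptions.

```javascript
// Sketch: which optional integrations can run, given the environment.
// The exact gating rules here are assumptions based on the env table.
function detectFeatures(env) {
  const has = (...names) => names.every((n) => Boolean(env[n]));
  return {
    ai: has("OPENAI_API_KEY"), // questions, evaluation, feedback, TTS
    livekit: has("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_HOST"),
    recording: has("REDIS_HOST"), // unset disables the recording pipeline
    s3Finalize: has("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET_NAME"),
    stt: has("DEEPGRAM_API_KEY") && env.STT_BACKEND === "deepgram", // else 503
    simli: has("SIMLI_API_KEY", "SIMLI_FACE_ID"), // else 503, lite orb
    spatius: has("SPATIUS_API_KEY"), // else 503, orb fallback
  };
}
```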

Commands

cp .env.example .env # fill in your own values — NEVER commit .env / .env.local
npm install
npm run dev # nodemon (auto-restart)
npm start # prod-style: node index.js
npm test # vitest run — run once, do not leave watch mode running

There is no build step (npm run build is a no-op). A perf suite exists via npm run test:perf. The server listens on PORT (set it in your .env).

Env file precedence (lowest → highest): .env → .env.{development|production} → .env.local (gitignored dev override), loaded by ai-interviews/config/env.js before any other import.
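The precedence rule amounts to merging the files in order so that later files win. Here is a hedged sketch over already-parsed key/value objects; `envFileOrder` and `mergeEnvLayers` are hypothetical, and mapping every non-production NODE_ENV to `.env.development` and always applying `.env.local` are assumptions about env.js.

```javascript
// Hypothetical sketch: file order for a given NODE_ENV, lowest precedence first.
// Treating any non-production value as "development" is an assumption.
function envFileOrder(nodeEnv = "development") {
  const mode = nodeEnv === "production" ? "production" : "development";
  return [".env", `.env.${mode}`, ".env.local"];
}

// Merge parsed files so that later (higher-precedence) files override earlier ones.
// Files missing from `parsedByFile` are skipped.
function mergeEnvLayers(parsedByFile, nodeEnv) {
  const merged = {};
  for (const file of envFileOrder(nodeEnv)) {
    Object.assign(merged, parsedByFile[file] ?? {});
  }
  return merged;
}
```

If env.js is built on the `dotenv` package, keep in mind that `dotenv.config()` does not overwrite variables that are already set unless `override: true` is passed, so the load order and that flag together decide which file wins.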

Environment variables

Names only — use your own values; never commit real secrets.

Variable | Purpose
OPENAI_API_KEY | Required for any AI surface; questions, evaluation, feedback, and TTS fail without it.
DB_* / RDS_* | PostgreSQL connection for session/feedback persistence.
PORT | HTTP listen port, e.g. <PORT>.
NODE_ENV | development / production.
FRONTEND_URL | Allowed CORS origin, e.g. https://develop.theinterviews.ai.
LIVEKIT_API_KEY / LIVEKIT_API_SECRET / LIVEKIT_HOST | LiveKit credentials + host (e.g. <LIVEKIT_WS_URL>); needed to mint room tokens for live interviews.
REDIS_HOST | Redis for recording state/timeline; unset disables that pipeline.
AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME | Recording storage, e.g. <S3_BUCKET>; unset breaks egress finalize.
DEEPGRAM_API_KEY (+ STT_BACKEND=deepgram) | Streaming STT; unset → the STT stream endpoint returns 503.
SIMLI_API_KEY + SIMLI_FACE_ID | Simli avatar session minting; unset → 503, frontend falls back to the lite orb.
SPATIUS_API_KEY (+ SPATIUS_REGION, SPATIUS_APP_ID) | Spatius avatar session minting; unset → 503, frontend falls back to the orb.
INTERNAL_SERVICE_JWT_SECRET | Auth for the zombie-cleanup quota-refund call to user-management.
OPENAI_QUESTION_MODEL, OPENAI_FEEDBACK_MODEL, OPENAI_TTS_MODEL, … | Per-surface model overrides resolved in openaiService.js; legacy OPENAI_MODEL is a fallback.

The full annotated list lives in the repo's .env.example (placeholders only).
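The model-override row describes a resolution chain. Below is a sketch of that chain, assuming the override names follow the `OPENAI_<SURFACE>_MODEL` pattern shown in the table; `resolveModel` and the caller-supplied default are hypothetical, and whether every surface (TTS included) honours the legacy `OPENAI_MODEL` fallback is not stated here.

```javascript
// Sketch of per-surface model resolution: a surface-specific override wins,
// then the legacy OPENAI_MODEL fallback, then a caller-supplied default.
// The OPENAI_<SURFACE>_MODEL naming follows the env table; the default is hypothetical.
function resolveModel(surface, env, defaultModel) {
  const specific = env[`OPENAI_${surface.toUpperCase()}_MODEL`];
  return specific || env.OPENAI_MODEL || defaultModel;
}
```

The chain also shows why the Gotchas warn about the legacy fallback: any surface without its own override silently inherits `OPENAI_MODEL`, so pointing it at a small model downgrades every such surface at once.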

Gotchas

  • One OpenAI client, ever. Don't instantiate OpenAI clients in route files — everything goes through ai-interviews/services/openaiService.js. And don't point the legacy OPENAI_MODEL fallback at a small/cheap model in production: it collapses question diversity across sessions for the same resume.
  • Env loads first. import './ai-interviews/config/env.js' is the first line of index.js. Anything reading process.env at module-init time depends on that ordering.
  • Timeouts are latency budgets, not arbitrary. Simli mint 5 s, TTS 8 s (ai-interviews/middleware/timeout.js). Tune deliberately.
  • livekit/token.js is security-critical. It mints the JWTs that grant room access — coordinate before touching grant logic.
  • Secrets stay server-side. The STT WebSocket proxy and avatar-session routes exist precisely so DEEPGRAM_API_KEY, SIMLI_API_KEY, and SPATIUS_API_KEY never reach the browser or logs.
  • Graceful degradation, not crashes. Missing optional integrations return 503 and the frontend falls back (lite orb, no STT). If LiveKit fails to load, only /livekit/* goes dark.
  • Idempotent liveness. Heartbeat-stop and zombie cleanup treat unknown sessions as no-ops; the client keeps beating across WebRTC reconnects, so a transient drop under 60 s never reaps a live session.
  • Real-time is prod-touching. A live interview can't be rolled back mid-session. Isolate avatar/streaming layers defensively — an observed failure cascade went Simli timeout → CDN fetch failure → WebGL context loss. One layer's failure must not take down the others.
  • Watch the migration waves. Before changing AI "brain" code, check which TI-340 wave owns that surface — it may already be scheduled for deletion in favor of bot-backend.
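The idempotent-liveness rule can be sketched as a heartbeat map plus a reaper. This in-memory version assumes a 60 s staleness window (from the Gotchas above) and uses hypothetical class and method names; the real service persists sessions, runs jobs/zombieSessionCleanup.js, and also issues a quota refund, which is not modeled here.

```javascript
// Sketch of idempotent liveness tracking. The Map-based store and the
// method names are hypothetical; only the 60 s window comes from this README.
const STALE_AFTER_MS = 60_000;

class LivenessTracker {
  constructor() {
    this.lastBeat = new Map(); // sessionId -> time of last heartbeat (ms)
  }

  // Record a heartbeat; clients keep calling this across WebRTC reconnects.
  beat(sessionId, now = Date.now()) {
    this.lastBeat.set(sessionId, now);
  }

  // Unknown sessions are a no-op (returns false), so repeated or late
  // stops are harmless.
  stop(sessionId) {
    return this.lastBeat.delete(sessionId);
  }

  // Forget and return sessions silent for longer than the window; a gap
  // under 60 s never reaps a session.
  reapStale(now = Date.now()) {
    const reaped = [];
    for (const [id, t] of this.lastBeat) {
      if (now - t > STALE_AFTER_MS) {
        this.lastBeat.delete(id);
        reaped.push(id);
      }
    }
    return reaped;
  }
}
```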