video-streaming-server
Purpose
video-streaming-server (VSS) is a single Node.js (ESM) process that owns the real-time media and session side of an AI-led interview: LiveKit room/token management, session orchestration and liveness, the recording pipeline to S3, neural text-to-speech, streaming speech-to-text, and avatar session minting — with every third-party secret kept server-side. The AI interview intelligence (question planning, evaluation, feedback) lives in the Python bot-backend service; VSS still carries a legacy in-process brain that is being retired (see Legacy AI brain).
Architecture
The frontends (interviews-ui and the live-session room in smart-interview-ui) never talk to OpenAI, Deepgram, Simli, or Spatius directly. They call VSS over three transports:
- Express 4 HTTP API — LiveKit routes (tokens, recording control, webhooks), voice/avatar routes (TTS, avatar session minting), session-lifecycle routes (heartbeats, resume parsing), and the remaining legacy AI-brain routes while they migrate out.
- A raw WebSocket proxy — streams the candidate's 16 kHz PCM audio to Deepgram for speech-to-text; the API key stays on the server.
- Socket.io — legacy WebRTC signaling and the in-app (Puppeteer-based) recording flow.
Actual audio/video media flows browser ↔ LiveKit Cloud (WebRTC); VSS mints the access tokens and controls recording egress. Recordings are produced by LiveKit room-composite egress as MP4 into S3, with a server-side finalize step that copies them into their final location. FFmpeg is part of the deployment environment for the recording pipeline.
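To size the STT proxy, assume the candidate audio streamed to Deepgram is 16-bit mono PCM at 16 kHz. The sample width and channel count are assumptions; the text states only the 16 kHz rate. The sketch below computes the resulting byte rate and enforces the 10-minute per-stream cap listed for sttStreamRoutes.js. StreamBudget is a hypothetical helper for illustration, not the route's actual code.

```javascript
// Assumed audio format: 16 kHz, 16-bit (2-byte) samples, mono.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;
// Hard cap per STT stream, as documented for sttStreamRoutes.js.
const MAX_STREAM_MS = 10 * 60 * 1000;

// Bytes of PCM audio per second under the assumed format.
function pcmBytesPerSecond() {
  return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS;
}

// Milliseconds of audio carried by a PCM chunk of `byteLength` bytes.
function chunkDurationMs(byteLength) {
  return (byteLength * 1000) / pcmBytesPerSecond();
}

// Hypothetical per-stream guard: tracks stream age and bytes forwarded,
// and reports when the proxy should close the upstream connection.
class StreamBudget {
  constructor(startedAtMs, maxMs = MAX_STREAM_MS) {
    this.startedAtMs = startedAtMs;
    this.maxMs = maxMs;
    this.bytesForwarded = 0;
  }

  // Record a forwarded chunk; returns false once the cap has been reached.
  accept(byteLength, nowMs) {
    if (this.expired(nowMs)) return false;
    this.bytesForwarded += byteLength;
    return true;
  }

  // True once the stream has been open for at least maxMs.
  expired(nowMs) {
    return nowMs - this.startedAtMs >= this.maxMs;
  }
}
```

Under these assumptions the proxy forwards 32,000 bytes per second. A stream that hits the 10-minute cap has therefore carried about 19.2 MB, and a 3,200-byte chunk holds 100 ms of audio.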
Legacy AI brain (being retired)
VSS predates bot-backend and still carries an in-process OpenAI "interview brain": question generation, answer evaluation, and feedback-report routes mounted alongside the media API. That brain is being migrated out under the TI-340 wave plan. New AI-interview capability (question planning, evaluation, blended technical interviews) lands in bot-backend, which is the AI interview brain in the current workflow. In this repo, limit brain changes to bug fixes and migration-shim work. Before touching brain code, check which TI-340 wave owns the surface, because it may already be scheduled for deletion.
Key components
| Area | What it owns |
|---|---|
livekit/token.js | LiveKit JWT issuance — security-critical; coordinate before changing grant logic. |
livekit/recording.js, recordingFinalize.js, recordingTimeline.js, egressUtils.js, webHooks.js | The recording pipeline: egress control, finalize-to-S3, Redis timeline, LiveKit webhook handling. |
ai-interviews/services/openaiService.js | The one centralized OpenAI client (TTS + remaining legacy-brain calls). All OpenAI calls go through it; per-surface model overrides resolve here. |
ai-interviews/routes/{question,evaluation,feedback}Routes.js | The legacy AI brain — question generation, answer scoring, feedback reports. Being retired under TI-340; net-new logic for these surfaces lands in bot-backend. |
ai-interviews/routes/ttsRoutes.js | Neural OpenAI TTS → MP3 (8 s timeout). |
ai-interviews/routes/sttStreamRoutes.js | Deepgram Nova-3 STT WebSocket proxy (key stays server-side; 10-minute hard cap per stream). |
ai-interviews/routes/simliRoutes.js, spatiusRoutes.js | Avatar session-token minting; Simli pool/queue management. |
ai-interviews/routes/heartbeatRoutes.js + jobs/zombieSessionCleanup.js | Session liveness heartbeats and abandoned-session cleanup/refund. |
ai-interviews/config/env.js | Env loading and precedence — imported first in index.js. |
ai-interviews/middleware/timeout.js | Tight timeouts on external calls (Simli mint 5 s, TTS 8 s) — deliberate latency budgets. |
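
A generic wrapper can illustrate the latency budgets in ai-interviews/middleware/timeout.js. This is a minimal sketch, not the middleware's actual implementation. withTimeout and TIMEOUT_BUDGETS_MS are hypothetical names, and the sketch assumes the wrapped call accepts an AbortSignal, so the underlying request is cancelled rather than left running.

```javascript
// Budgets from the component table: Simli mint 5 s, TTS 8 s.
const TIMEOUT_BUDGETS_MS = { simliMint: 5000, tts: 8000 };

// Run `task(signal)` and reject if it does not settle within `ms`.
// Aborting the controller lets fetch-style calls stop work once the budget is spent.
function withTimeout(task, ms, label) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([task(controller.signal), timeout]).finally(() =>
    clearTimeout(timer)
  );
}
```

A TTS call would then look like withTimeout((signal) => fetch(url, { signal }), TIMEOUT_BUDGETS_MS.tts, 'TTS'). Passing the signal through matters because a raced promise alone does not cancel the slow upstream request.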
Route groups: AI interview routes are mounted under /api/ai-mock-interviews/*, LiveKit routes under /livekit/*, plus /health and a legacy Socket.io/recorder surface. If the LiveKit module fails to load, /livekit/* returns 503 but the rest of the server keeps serving.
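The "LiveKit fails to load, /livekit/* returns 503" behaviour matches the common optional-module pattern: attempt the load, and on failure mount a stub that answers 503. Below is a minimal sketch using Express-style (req, res) handlers. loadOptionalRouter is a hypothetical name. The loader is injected so the pattern can be exercised without Express or the real livekit module.

```javascript
// Try to load an optional router. On failure, return a stub handler that
// answers 503 so only this route group goes dark.
async function loadOptionalRouter(name, loader, log = console) {
  try {
    const router = await loader();
    return { available: true, handler: router };
  } catch (err) {
    log.error(`[${name}] failed to load, serving 503: ${err.message}`);
    const unavailable = (req, res) => {
      res.status(503).json({ error: `${name} unavailable` });
    };
    return { available: false, handler: unavailable };
  }
}
```

You would mount the result with app.use('/livekit', result.handler). A failed load then leaves every other route group untouched. The loader itself (for example, a dynamic import() of the livekit module's entry file) is an assumption about how index.js wires it.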
Local development
Prerequisites
- Node.js 20 LTS or newer. ESM project ("type": "module"), no engines pin.
- PostgreSQL reachable via the DB_* (or RDS_*) env vars; required for session/feedback persistence.
- Redis: only needed for the LiveKit recording pipeline; the AI interview loop boots without it.
- API keys per the table below. For a minimal "does it start" boot, only OPENAI_API_KEY is strictly needed; everything else degrades gracefully.
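A boot-time preflight can make the "only OPENAI_API_KEY is strictly needed" rule explicit. This sketch is hypothetical; preflight is not a function known to exist in the repo. The variable groupings mirror the environment table below, but the real code's exact requirements may differ. The Node 20 floor is checked at runtime because the package has no engines pin.

```javascript
// Required: questions, evaluation, feedback, and TTS all fail without it.
const REQUIRED_VARS = ['OPENAI_API_KEY'];

// Optional integration -> predicate over env. Groupings follow the README's
// environment table; a missing integration is disabled rather than fatal.
const OPTIONAL_INTEGRATIONS = {
  livekit: (env) => allSet(env, ['LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET', 'LIVEKIT_HOST']),
  recording: (env) =>
    allSet(env, ['REDIS_HOST', 'AWS_REGION', 'AWS_ACCESS_KEY_ID',
      'AWS_SECRET_ACCESS_KEY', 'S3_BUCKET_NAME']),
  stt: (env) => allSet(env, ['DEEPGRAM_API_KEY']) && env.STT_BACKEND === 'deepgram',
  simli: (env) => allSet(env, ['SIMLI_API_KEY', 'SIMLI_FACE_ID']),
  spatius: (env) => allSet(env, ['SPATIUS_API_KEY']),
  // DB_* / RDS_* names vary, so only check that some such variable is set.
  postgres: (env) => Object.keys(env).some((k) => /^(DB|RDS)_/.test(k) && Boolean(env[k])),
};

function allSet(env, names) {
  return names.every((name) => Boolean(env[name]));
}

// Returns fatal problems (boot should stop) and integrations that will be disabled.
function preflight(env, nodeVersion = process.versions.node) {
  const fatal = REQUIRED_VARS.filter((name) => !env[name]);
  const major = Number(nodeVersion.split('.')[0]);
  if (major < 20) fatal.push(`Node.js >= 20 (found ${nodeVersion})`);
  const disabled = Object.entries(OPTIONAL_INTEGRATIONS)
    .filter(([, isEnabled]) => !isEnabled(env))
    .map(([name]) => name);
  return { ok: fatal.length === 0, fatal, disabled };
}
```

Logging the disabled list once at startup keeps a missing DEEPGRAM_API_KEY visible in the logs. Without it, the first sign would be a 503 in the middle of an interview.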
Commands
```sh
cp .env.example .env   # fill in your own values; NEVER commit .env / .env.local
npm install
npm run dev            # nodemon (auto-restart)
npm start              # prod-style: node index.js
npm test               # vitest run: runs once; do not leave watch mode running
```
There is no build step (npm run build is a no-op). A perf suite exists via npm run test:perf. The server listens on PORT (set it in your .env).
Env file precedence (lowest → highest): .env → .env.{development|production} → .env.local (gitignored dev override), loaded by ai-interviews/config/env.js before any other import.
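The precedence rule amounts to a left-to-right merge in which later files win. Below is a minimal sketch over already-parsed files; parsing is out of scope, and envFileOrder and effectiveEnv are hypothetical names. Check three assumptions against env.js. First, any NODE_ENV other than production is treated as development. Second, .env.local is always included, since the precedence line does not say whether production skips it. Third, variables already set in the real process environment beat every file, which mirrors dotenv's default of not overriding existing values.

```javascript
// Files from lowest to highest precedence.
function envFileOrder(nodeEnv) {
  const mode = nodeEnv === 'production' ? 'production' : 'development';
  return ['.env', `.env.${mode}`, '.env.local'];
}

// Merge parsed layers (lowest precedence first) so later layers override
// earlier ones, then let the real process environment win (assumption).
function effectiveEnv(layers, processEnv = {}) {
  return { ...Object.assign({}, ...layers), ...processEnv };
}
```

Because this merge must finish before any other module reads process.env, env.js has to be the first import in index.js.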
Environment variables
Names only — use your own values; never commit real secrets.
| Variable | Purpose |
|---|---|
OPENAI_API_KEY | Required for any AI surface — questions, evaluation, feedback, TTS fail without it. |
DB_* / RDS_* | PostgreSQL connection for session/feedback persistence. |
PORT | HTTP listen port, e.g. <PORT>. |
NODE_ENV | development / production. |
FRONTEND_URL | Allowed CORS origin, e.g. https://develop.theinterviews.ai. |
LIVEKIT_API_KEY / LIVEKIT_API_SECRET / LIVEKIT_HOST | LiveKit credentials + host (e.g. <LIVEKIT_WS_URL>) — needed to mint room tokens for live interviews. |
REDIS_HOST | Redis for recording state/timeline; unset disables that pipeline. |
AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME | Recording storage, e.g. <S3_BUCKET>; unset breaks egress finalize. |
DEEPGRAM_API_KEY (+ STT_BACKEND=deepgram) | Streaming STT; unset → the STT stream endpoint returns 503. |
SIMLI_API_KEY + SIMLI_FACE_ID | Simli avatar session minting; unset → 503, frontend falls back to the lite orb. |
SPATIUS_API_KEY (+ SPATIUS_REGION, SPATIUS_APP_ID) | Spatius avatar session minting; unset → 503, frontend falls back to the orb. |
INTERNAL_SERVICE_JWT_SECRET | Auth for the zombie-cleanup quota-refund call to user-management. |
OPENAI_QUESTION_MODEL, OPENAI_FEEDBACK_MODEL, OPENAI_TTS_MODEL, … | Per-surface model overrides resolved in openaiService.js; legacy OPENAI_MODEL is a fallback. |
The full annotated list lives in the repo's .env.example (placeholders only).
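Per-surface model resolution in openaiService.js plausibly follows a "specific override, then legacy fallback" chain. Below is a minimal sketch under that assumption. The surface-to-variable mapping uses the names from the table above. resolveModel and the placeholder DEFAULTS are hypothetical, not the service's real defaults. Excluding TTS from the chat-model fallback is also an assumption.

```javascript
// Surface -> per-surface override variable (names from the env table).
const SURFACE_VARS = {
  question: 'OPENAI_QUESTION_MODEL',
  feedback: 'OPENAI_FEEDBACK_MODEL',
  tts: 'OPENAI_TTS_MODEL',
};

// Hypothetical placeholder defaults; the real service defines its own.
const DEFAULTS = {
  question: 'default-question-model',
  feedback: 'default-feedback-model',
  tts: 'default-tts-model',
};

// Per-surface override first, then the legacy OPENAI_MODEL fallback
// (text surfaces only, by assumption), then the surface default.
function resolveModel(surface, env = process.env) {
  const varName = SURFACE_VARS[surface];
  if (!varName) throw new Error(`unknown OpenAI surface: ${surface}`);
  if (env[varName]) return env[varName];
  if (surface !== 'tts' && env.OPENAI_MODEL) return env.OPENAI_MODEL;
  return DEFAULTS[surface];
}
```

This chain is why the OPENAI_MODEL warning in the gotchas matters: any text surface without its own override silently inherits the legacy fallback.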
Gotchas
- One OpenAI client, ever. Don't instantiate OpenAI clients in route files; everything goes through ai-interviews/services/openaiService.js. And don't point the legacy OPENAI_MODEL fallback at a small/cheap model in production: it collapses question diversity across sessions for the same resume.
- Env loads first. import './ai-interviews/config/env.js' is the first line of index.js. Anything reading process.env at module-init time depends on that ordering.
- Timeouts are latency budgets, not arbitrary. Simli mint 5 s, TTS 8 s (ai-interviews/middleware/timeout.js). Tune deliberately.
- livekit/token.js is security-critical. It mints the JWTs that grant room access, so coordinate before touching grant logic.
- Secrets stay server-side. The STT WebSocket proxy and avatar-session routes exist precisely so DEEPGRAM_API_KEY, SIMLI_API_KEY, and SPATIUS_API_KEY never reach the browser or logs.
- Graceful degradation, not crashes. Missing optional integrations return 503 and the frontend falls back (lite orb, no STT). If LiveKit fails to load, only /livekit/* goes dark.
- Idempotent liveness. Heartbeat-stop and zombie cleanup treat unknown sessions as no-ops; the client keeps beating across WebRTC reconnects, so a transient drop under 60 s never reaps a live session.
- Real-time is prod-touching. A live interview can't be rolled back mid-session. Isolate avatar/streaming layers defensively — an observed failure cascade went Simli timeout → CDN fetch failure → WebGL context loss. One layer's failure must not take down the others.
- Watch the migration waves. Before changing AI "brain" code, check which TI-340 wave owns that surface; it may already be scheduled for deletion in favor of bot-backend.
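An in-memory tracker can model the idempotent-liveness rule. The 60 s grace window comes from the text. SessionLiveness, its method names, and the Map-based state are hypothetical. The real heartbeat routes and zombieSessionCleanup.js presumably persist state and also trigger the quota refund, and this sketch does neither.

```javascript
// Sessions are reaped only after this long without a heartbeat.
const GRACE_MS = 60 * 1000;

class SessionLiveness {
  constructor(graceMs = GRACE_MS) {
    this.graceMs = graceMs;
    this.lastBeat = new Map(); // sessionId -> last heartbeat time (ms)
  }

  // A heartbeat (re)registers the session; beats after a reconnect just refresh it.
  beat(sessionId, nowMs) {
    this.lastBeat.set(sessionId, nowMs);
  }

  // Stopping an unknown or already-stopped session is a no-op (returns false).
  stop(sessionId) {
    return this.lastBeat.delete(sessionId);
  }

  // Return and forget sessions that have been silent longer than the grace window.
  reap(nowMs) {
    const zombies = [];
    for (const [id, last] of this.lastBeat) {
      if (nowMs - last > this.graceMs) {
        zombies.push(id);
        this.lastBeat.delete(id);
      }
    }
    return zombies;
  }
}
```

Because stop and reap both tolerate sessions they have never seen, a late heartbeat-stop racing the cleanup job cannot cause an error or a double refund.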