E2E Suite

The theinterviews-e2e repo is a tiered Playwright (TypeScript) harness that tests the deployed environments — it ships no app code, only specs, page objects, and CI. It is the automated regression layer for every user-facing flow on the platform, including the live AI voice interview itself.

Spec groups (tiers)

Playwright projects are the tiers: each maps to a folder under tests/ and is selectable with --project=<name> (for example, --project=api).

Tier	Folder	What it proves	Cost
smoke	`tests/smoke/`	Production is up and core pages respond. Read-only, GET-only, no auth. Runs on chromium by default; firefox/webkit are separate projects (`test:smoke:all`).	Free
api	`tests/api/`	REST contracts of two backends: the core platform API (auth, plans, billing) and the video-session backend (avatar availability, capacity, session heartbeats). Request-based — no browser, deterministic. Also asserts the test account's plan invariant (if it goes red, fix the account, not the assertion).	Free
ui	`tests/ui/`	Authenticated page checks (dashboard, profile card). Depends on the `setup` project and reuses its logged-in storage state.	Free
ai-interview	`tests/ai/`	The real candidate journey: setup → interview room → coding/voice answers → completion, against the live AI pipeline (real LLM/STT/avatar spend). 480s timeout per test. Covers the coding and technical interviews plus four more voice modes (Behavioural, Managerial, Salary Negotiation, HR).	Real AI minutes
reliability	`tests/reliability/`	Resilience flows: avatar-saturation fallback, reconnect, mic-permission recovery, heartbeat cadence. Mirrors the AI tier's media setup but mocks the network responses, so it is deterministic and free. Opt-in — tagged `@reliability` `@pending-deploy`, not wired into CI until the backing feature ships.	Free
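The tier-to-project mapping in the table can be sketched as a Playwright-style project list. This is an illustration under stated assumptions, not the repo's actual config: `ProjectSketch` is a local stand-in type so the block runs without Playwright installed, the `testMatch` glob for `setup` is assumed, and a real `playwright.config.ts` would normally pass these projects to `defineConfig` from `@playwright/test`.

```typescript
// Local stand-in for the subset of Playwright project options used here.
interface ProjectSketch {
  name: string;
  testDir?: string;        // folder the tier maps to
  testMatch?: string;      // assumed glob, used by the setup project only
  dependencies?: string[]; // projects that must finish first
  timeout?: number;        // per-test timeout in milliseconds
}

const projects: ProjectSketch[] = [
  { name: "setup", testMatch: "**/auth.setup.ts" },
  { name: "smoke", testDir: "tests/smoke" },
  { name: "api", testDir: "tests/api" },
  { name: "ui", testDir: "tests/ui", dependencies: ["setup"] },
  // The AI tier gets the long 480s per-test timeout.
  { name: "ai-interview", testDir: "tests/ai", dependencies: ["setup"], timeout: 480 * 1000 },
  { name: "reliability", testDir: "tests/reliability", dependencies: ["setup"] },
];

// Names of the tiers that reuse the logged-in storage state from `setup`.
function tiersNeedingLogin(all: ProjectSketch[]): string[] {
  return all
    .filter((p) => (p.dependencies ?? []).includes("setup"))
    .map((p) => p.name);
}
```

Declaring the login as a project dependency means `--project=ui` runs `setup` first without the caller having to remember it.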

Supporting pieces:

setup (tests/auth.setup.ts) — logs in once and persists storage state for the ui, ai-interview, and reliability tiers.
Fake microphone (lib/fakeMedia.ts) — the AI tier keeps Chromium's fake camera (passes the consent gate) but overrides getUserMedia audio with a WebAudio stream that injects pre-recorded WAV answers on cue after the AI asks. STT, endpointing, and turn-taking run for real — it's a fake mic, not a fake interview.
Page objects (pages/) — SetupPage, RoomPage. The app under test has no data-testid attributes; selectors are role/text based. A brittle selector is a finding to file against the app, not a reason to add test IDs in the suite.

Coverage matrix

What each spec file actually verifies, derived from the test titles in the repo. Counts are current as of this page's last update — the repo is the source of truth.

smoke — 25 checks

Spec	What it verifies	Notes
`homepage.smoke.spec.ts`	`https://www.theinterviews.ai` responds with a non-error status and renders a body.	Prod, read-only.
`routes.smoke.spec.ts`	20 public routes (marketing pages, all signup variants, auth/login, password reset, contact, legal pages, help docs) serve a `<400` status and render; 3 protected areas redirect unauthenticated visitors to login.	Parameterized — one test per route. Prod, read-only.
`waf-large-body.smoke.spec.ts`	The AI question-streaming endpoint accepts a request body larger than 8 KB from a real browser context.	Regression guard for a past WAF misconfiguration.
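The pass criteria that `routes.smoke.spec.ts` applies per route can be sketched as a small pure function. The route list, the `/auth/login` path, and every name below are hypothetical; in the spec itself each entry would become one generated Playwright test.

```typescript
// Hypothetical sample; the real spec covers 20 public routes and 3 protected areas.
interface RouteCase {
  path: string;
  kind: "public" | "protected";
}

const routes: RouteCase[] = [
  { path: "/", kind: "public" },
  { path: "/contact", kind: "public" },
  { path: "/admin", kind: "protected" },
];

// What the browser ended up with after following redirects.
interface Observed {
  status: number;
  finalPath: string;
}

// Public routes must serve a status below 400; protected routes must land on login.
function routePasses(route: RouteCase, seen: Observed, loginPath = "/auth/login"): boolean {
  if (route.kind === "public") return seen.status < 400;
  return seen.finalPath.startsWith(loginPath);
}

// One title per route, mirroring "one test per route".
function testTitle(route: RouteCase): string {
  return route.kind === "public"
    ? `${route.path} responds with <400`
    : `${route.path} redirects unauthenticated visitors to login`;
}
```

Keeping the criteria in one function means a new route is a one-line addition to the list rather than a new test body.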

api — 11 tests

Spec	What it verifies	Notes
`auth.api.spec.ts`	Login returns a token; the token resolves back to the account via the "me" endpoint.	2 tests.
`plans.api.spec.ts`	The plan catalog returns every plan carrying its entitlement fields.	1 test.
`subscription.api.spec.ts`	The shared test account sits on its expected plan — a deliberate invariant.	Skips on envs with no test-account id configured.
`stripe.api.spec.ts`	The Stripe publishable-key endpoint requires auth and returns a key when authed.	2 tests.
`checkout.api.spec.ts`	Checkout-session creation requires auth; creates a test-mode Stripe checkout session for a paid plan.	API-level only — no browser ever completes a purchase. Skips without a test-account id.
`reliability.api.spec.ts`	Video-session backend contracts: the avatar availability/fallback decision shape, capacity-queue stats invariants, and session heartbeat ack + teardown.	Targets the video-session backend, not the core API; env-gated skip.

ui — 2 tests

Spec	What it verifies	Notes
`dashboard.ui.spec.ts`	The logged-in candidate dashboard renders (welcome heading visible).	Reuses the `setup` login.
`profile-card.ui.spec.ts`	The Profile Card page renders real card content for an entitled user — no paywall, no error state.	Waits for the client-side card fetch to settle, so it asserts content, not a loading flash or page chrome.

ai-interview — 6 live interviews

Spec	What it verifies	Notes
`coding-interview.spec.ts`	Full live coding interview: setup → solve in the in-browser editor → run → submit → completion.	Long-running; real AI spend. Coding answers are typed.
`technical-interview.spec.ts`	Full live technical voice interview: setup → spoken WAV answers → completion.	Long-running; real AI spend; needs the fake mic.
`voice-modes-interview.spec.ts`	The four remaining voice modes the setup screen offers — Behavioural, Managerial, Salary Negotiation, HR — each complete a live voice flow end to end.	4 generated tests; the biggest spend in the suite. Runs only in gated/nightly tiers, never on PR.

reliability — 5 tests (opt-in)

Spec	What it verifies	Notes
`reliability.spec.ts`	Avatar saturation and kill-switch both fall back to the orb in "Lite mode" (2 variants — never a white screen); a transient disconnect shows Reconnecting… then resumes on the same question; a denied microphone shows recovery UI with one-click retry; the client posts session heartbeats on the expected cadence.	Deterministic via network mocking — no AI spend. Tagged `@pending-deploy`; some scenarios `fixme`-skipped until the backing feature ships. Not in CI; run with `npx playwright test --project=reliability`.
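The heartbeat-cadence check can be sketched as a pure function over the timestamps at which the mocked heartbeat endpoint was hit. The function name, the interval, and the tolerance are assumptions for illustration; the source does not state the actual cadence.

```typescript
// Returns true when every gap between consecutive heartbeats is within
// `toleranceMs` of the expected interval.
function heartbeatsOnCadence(
  timestampsMs: number[],
  expectedIntervalMs: number,
  toleranceMs: number,
): boolean {
  // At least two heartbeats are needed to measure a gap.
  if (timestampsMs.length < 2) return false;
  for (let i = 1; i < timestampsMs.length; i++) {
    const gap = timestampsMs[i] - timestampsMs[i - 1];
    if (Math.abs(gap - expectedIntervalMs) > toleranceMs) return false;
  }
  return true;
}

// Example with an assumed 5s cadence and 500ms tolerance.
const recorded = [0, 5010, 9980, 15020];
const onCadence = heartbeatsOnCadence(recorded, 5000, 500);
```

Asserting on gaps rather than absolute times keeps the check independent of when the session started.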

Gaps & known limits

Flows with no automated coverage, measured against the flows listed in the product overview. Until specs exist, these are the QA team's standing manual-test priorities:

The entire recruiter side. JD-funnel posting creation, inviting/screening candidates, and reviewing results (transcripts, AI summaries, scored feedback, recordings) have no specs. Every automated journey is candidate-side.
Human-to-human video interviews. The recruiter↔candidate live session room is untested.
Billing beyond session creation. The API tier proves a Stripe checkout session can be created, but no browser completes a purchase, and nothing verifies entitlements actually change after payment, plan upgrade/downgrade, or cancellation.
Recording & consent. Recording-policy enforcement (off / optional / mandatory), the consent modals, and recording playback have no specs.
Feedback & results. The post-interview feedback views and the feedback PDF export are untested.
Profile Card generation. The ui spec proves the page renders content for an entitled user, but no automated run drives an interview to full completion and then asserts the resulting card scores — verifying card generation still requires a manually completed interview.
Account creation. Signup and reset-password pages are smoke-checked as GETs only; no spec submits a registration or completes a password reset.
Resume optimization and the other adjacent tools — none.
Admin and professional-interviewer surfaces — none (smoke only proves /admin redirects unauthenticated users).

Running locally

npm ci
npx playwright install        # browsers; chromium is enough for the AI tier locally
cp .env.example .env          # then fill in the test-account credentials (ask a teammate)

Then per tier:

npm run test:smoke            # prod smoke, chromium only, headless
npm run test:smoke:all        # prod smoke, chromium + firefox + webkit
npm run test:api              # REST contract checks (no browser)
npm run test:ui               # authed page checks, reuses logged-in state
npm run test:ai               # full live AI suite — all 6 interviews incl. the 4 voice modes
npm run test:ai:coding        # coding interview only
npm run test:ai:technical     # technical voice interview only
npm run report                # open the last HTML report

The reliability tier has no npm script while it is opt-in — run it explicitly:

npx playwright test --project=reliability

Quality gates (these must pass in CI too):

npm run lint
npm run format:check
npm run typecheck

Selecting the target environment

Resolution order, from config/env.ts:

PLAYWRIGHT_BASE_URL — explicit override, always wins.
PLAYWRIGHT_ENV — one of local | dev | prod (dev → https://develop.theinterviews.ai, prod → the production site, local → a local dev server).
Default: dev.

The named environments bake in base URLs for the app on all three envs and for the backends on local/dev only. To run the api tier (or the video-backend contract specs) against prod, you must set the backend base URLs explicitly via PLAYWRIGHT_API_URL (and PLAYWRIGHT_VIDEO_SERVER_URL for the video-session contracts) — there are deliberately no prod defaults.
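The resolution order above can be sketched as follows. This is not the contents of `config/env.ts`: the local URL, both backend URLs, the function names, and the choice to throw on an unknown environment name are assumptions. The production URL is the one the smoke tier targets.

```typescript
type EnvName = "local" | "dev" | "prod";
type EnvVars = Record<string, string | undefined>;

const APP_URLS: Record<EnvName, string> = {
  local: "http://localhost:3000", // assumed local dev server
  dev: "https://develop.theinterviews.ai",
  prod: "https://www.theinterviews.ai",
};

// Placeholder backend URLs; there is deliberately no prod entry.
const API_URLS: Partial<Record<EnvName, string>> = {
  local: "http://localhost:8080",          // placeholder
  dev: "https://api.dev.example.invalid",  // placeholder
};

function resolveEnvName(env: EnvVars): EnvName {
  const raw = env.PLAYWRIGHT_ENV || "dev"; // default: dev
  if (raw === "local" || raw === "dev" || raw === "prod") return raw;
  throw new Error(`Unknown PLAYWRIGHT_ENV: ${raw}`);
}

// The explicit override always wins over the named environment.
function resolveBaseUrl(env: EnvVars): string {
  return env.PLAYWRIGHT_BASE_URL || APP_URLS[resolveEnvName(env)];
}

// Backend URL: explicit value first, then the baked-in local/dev default.
function resolveApiUrl(env: EnvVars): string {
  if (env.PLAYWRIGHT_API_URL) return env.PLAYWRIGHT_API_URL;
  const baked = API_URLS[resolveEnvName(env)];
  if (!baked) {
    throw new Error("Set PLAYWRIGHT_API_URL to run backend specs against prod");
  }
  return baked;
}
```

Failing loudly when no backend URL is configured for prod avoids silently pointing contract tests at the wrong backend.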

Example — run the technical voice spec against dev with the question round capped:

PLAYWRIGHT_ENV=dev PW_MAX_QUESTIONS=3 npm run test:ai:technical

PW_MAX_QUESTIONS caps the voice round (default 3) to bound AI cost. Keep it low; never run full-length interviews in CI. PW_HEADED overrides whether the browser runs headed or headless.
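Reading the cap can be sketched as below. Only the variable name and the default of 3 come from this page; the function name and the decision to reject non-positive or non-numeric values are assumptions.

```typescript
// Parse PW_MAX_QUESTIONS, falling back to the documented default of 3.
function maxQuestions(env: Record<string, string | undefined>): number {
  const raw = env.PW_MAX_QUESTIONS;
  if (raw === undefined || raw.trim() === "") return 3;
  const parsed = Number.parseInt(raw, 10);
  // Assumed policy: fail fast instead of silently running an unbounded round.
  if (!Number.isInteger(parsed) || parsed < 1) {
    throw new Error(`PW_MAX_QUESTIONS must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```

Rejecting a malformed value matters here because a silent fallback to "no cap" would translate directly into AI spend.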

How E2E gates releases

Every user-facing story ships with a Playwright spec in this suite, runnable against the dev build — green automation is the default proof of done, ahead of manual QA.
CI is tiered to keep PRs fast and free: pull requests run quality gates plus smoke and api; merges to the main branch add the cross-browser smoke and ui tiers; the full suite including live-AI runs against dev nightly and on demand.
Production promotion happens once per epic, after the epic is fully built and tested on dev. The production live-AI run is manual-trigger only — it spends real money and writes real data, so it is never scheduled and never run "just to check something."
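The CI tiering described above can be summarized as a mapping from trigger to jobs. The trigger and job names are hypothetical labels, not the identifiers used in the actual workflow files.

```typescript
type Trigger = "pull_request" | "main_merge" | "nightly" | "on_demand";
type Job = "quality-gates" | "smoke" | "api" | "smoke:all" | "ui" | "ai-interview";

// Each stage adds to the previous one, so PRs stay fast and free.
function jobsFor(trigger: Trigger): Job[] {
  const pr: Job[] = ["quality-gates", "smoke", "api"];
  if (trigger === "pull_request") return pr;

  const main: Job[] = [...pr, "smoke:all", "ui"];
  if (trigger === "main_merge") return main;

  // Nightly and on-demand runs against dev add the live-AI tier.
  return [...main, "ai-interview"];
}

// The only triggers that spend real AI minutes.
function spendsAiMinutes(trigger: Trigger): boolean {
  return jobsFor(trigger).includes("ai-interview");
}
```

The reliability tier is absent on purpose: it is opt-in and not wired into CI.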

Known limits

The AI tier is long-running and costs real money. Each test has a 480s timeout, an interview takes several minutes, and test:ai now runs six interviews. Never run it casually, in watch mode, or in Playwright's --ui mode. The default local development loop is smoke/api/ui, all of which are free.
Nondeterminism is structural in voice tests. The AI's questions vary run to run; the specs assert through a recognized helper (runVoiceInterview()) rather than fixed timelines, and conditional logic is allowed in this tier by design.
Some flows still need a human. The critical, irreversible paths — live interview entry, billing, auth — keep a human verification step per the QA process, and everything in Gaps & known limits above is manual-only today.
Shared test account. The api tier intentionally asserts the test account's plan, so resetting or otherwise changing that account turns the tier red by design. Treat the account as shared infrastructure on the shared dev database.
WAV answer fixtures regenerate on macOS only; elsewhere the committed fixtures are used as-is.
