user-management

Purpose

user-management is the source-of-truth backend for theinterviews.ai: it owns authentication and session issuance, customer records, subscription plans, Stripe billing, entitlement enforcement, the AI-routing filter that decides which AI worker serves each evaluation/feedback surface, and the JD-funnel (job-posting → candidate-signup) flow. Every other service — the Next.js app (interviews-ui), the session room (smart-interview-ui), the LiveKit/legacy brain (video-streaming-server), and the Python AI worker (bot-backend) — depends on it for identity, entitlements, and routing decisions.

Architecture

It is a stateless Spring Boot REST API. End-user clients authenticate with a Bearer JWT; internal services (bot-backend, video-streaming-server) call back on /api/internal/* with a shared-secret HS256 service JWT. State lives in PostgreSQL, accessed through JPA with ddl-auto=validate — the app only validates the schema and never mutates it. Redis is optional pub/sub used for config-cache invalidation; the app boots without it. Stripe handles Checkout, Connect payouts, and webhooks; S3 stores resumes, resources, and recordings.

Request and auth flow

  1. Login / JWT issuance: the client logs in (password, magic link, Google OAuth, or OTP) and receives an access token (1 h) plus a refresh token (7 d). Claims carry the customer id, role, and a tokenVersion.
  2. Authenticated request: three load-bearing servlet filters run in order, TraceIdFilter → JwtAuthenticationFilter (verifies the JWT, re-checks tokenVersion/status/lock on every request) → AIRoutingFilter (applied to eval/feedback/question-gen/resume-parse paths). Controllers stay thin; services enforce plan limits via SubscriptionMetricsService and return 402/403 when a plan limit is hit.
  3. Internal service call: bot-backend posts eval results and video-streaming-server posts session-lifecycle events to /api/internal/*; InternalServiceAuthFilter verifies the HS256 service JWT (issuer/audience/signature) and grants ROLE_INTERNAL_SERVICE.
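
The signature part of the service-JWT check in step 3 can be sketched with the JDK alone. This is a minimal illustration, not the project's InternalServiceAuthFilter: the class and method names are hypothetical, and it covers only the HS256 signature, leaving out the header, issuer, audience, and expiry checks that a real verifier also needs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: issue and signature-check a compact HS256 token. */
final class ServiceJwtSketch {

    /** Base64url (unpadded) HMAC-SHA256 over the "header.payload" signing input. */
    static String sign(String signingInput, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] sig = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Builds a compact token from an already-serialized JSON payload. */
    static String issue(String payloadJson, String secret) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + sign(signingInput, secret);
    }

    /** True only if the token has three parts and its signature matches the secret. */
    static boolean signatureValid(String token, String secret) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        String expected = sign(parts[0] + "." + parts[1], secret);
        // Constant-time comparison avoids leaking how many characters matched.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.US_ASCII),
                parts[2].getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String secret = "local-only-placeholder-secret-32-chars!";
        String token = issue("{\"sub\":\"example-service\"}", secret);
        System.out.println(signatureValid(token, secret));
        System.out.println(signatureValid(token, secret + "x"));
    }
}
```

Because the secret feeds straight into the HMAC, a single differing byte between two services produces a different signature, which is why the shared secret has to match exactly on both sides.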

Token revocation works through tokenVersion: bumping it on the customer row invalidates every outstanding JWT, because the auth filter re-checks the version on each request.
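
The revocation mechanism can be illustrated with a small in-memory model. This is only a sketch under assumptions: the map stands in for the tokenVersion column on the customer row, and the class and method names are hypothetical rather than taken from the codebase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: an in-memory stand-in for the per-customer tokenVersion. */
final class TokenVersionSketch {
    private final Map<Long, Integer> currentVersion = new ConcurrentHashMap<>();

    /** Returns the version to embed in a newly issued token (starts at 0). */
    int versionForNewToken(long customerId) {
        return currentVersion.computeIfAbsent(customerId, id -> 0);
    }

    /** Bumping the stored version makes every previously issued token stale. */
    void revokeAll(long customerId) {
        currentVersion.merge(customerId, 1, Integer::sum);
    }

    /** The per-request check: the token's claim must equal the stored version. */
    boolean accepts(long customerId, int tokenVersionClaim) {
        Integer stored = currentVersion.get(customerId);
        return stored != null && stored == tokenVersionClaim;
    }

    public static void main(String[] args) {
        TokenVersionSketch store = new TokenVersionSketch();
        int claim = store.versionForNewToken(42L);
        System.out.println(store.accepts(42L, claim)); // true: versions match
        store.revokeAll(42L);
        System.out.println(store.accepts(42L, claim)); // false: the old token is stale
    }
}
```

The trade-off is one lookup per request in exchange for immediate revocation, without keeping a denylist of individual tokens.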

AIRoutingFilter is the migration switch (TI-340): it reads ai.route.* keys from the platform_config table and stamps each eval/feedback request with a routing target — LEGACY (video-streaming-server) vs INTERNAL (bot-backend).
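
As a rough illustration of the routing decision, the sketch below resolves a target from a map that stands in for platform_config. The key suffixes (for example ai.route.evaluation), the stored values, and the fallback to LEGACY for missing or unrecognized values are all assumptions made for the example; the source only states that keys live under ai.route.* and that the two targets are LEGACY and INTERNAL.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: resolve a routing target per surface from config keys. */
final class AiRoutingSketch {
    enum Target { LEGACY, INTERNAL }

    /**
     * Looks up "ai.route.<surface>" in the given config.
     * Missing or unrecognized values fall back to LEGACY (an assumption of this sketch).
     */
    static Target resolve(Map<String, String> platformConfig, String surface) {
        String raw = platformConfig.get("ai.route." + surface);
        if (raw == null) {
            return Target.LEGACY;
        }
        try {
            return Target.valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Target.LEGACY;
        }
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("ai.route.evaluation", "internal");
        System.out.println(resolve(config, "evaluation")); // INTERNAL
        System.out.println(resolve(config, "feedback"));   // LEGACY (key absent)
    }
}
```

Keeping the decision in configuration rather than code is what lets each surface be migrated, and rolled back, independently.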

Key components

The root package is com.ti.usermanagement, layered controller (thin) → service (logic) → repository (data) → DTO (never leaks entity shape).

| Domain | Representative classes | Owns |
| --- | --- | --- |
| Auth & identity | AuthController, JwtService, JwtAuthenticationFilter, MagicLinkController, OtpController | JWT issuance/refresh, tokenVersion revocation, Google OAuth, OTP, magic links |
| Customer & profile | CustomerProfileController, ResumeController | Customer records, resume parsing (PDFBox/Tika), profile stats |
| Plans & subscriptions | PlanService, SubscriptionService, SubscriptionMetricsService | subscription_plan is the entitlement source of truth; per-plan usage enforcement |
| Billing | StripeController, OrderController | Stripe Checkout, Connect payouts, HMAC-verified webhooks; append-only ledger |
| Profile Card | ProfileCardController, ProfileCardShareController | Public shareable candidate scorecard, server-side rating breakdown |
| AI routing & eval | AIRoutingFilter, AIRoutingService, EvaluationController | Routing target per eval/feedback surface from platform_config |
| Internal service API | InternalServiceAuthFilter, InternalEvalResultController, InternalSessionLifecycleController | /api/internal/*: service-JWT-only endpoints for the AI workers |
| JD-funnel | JobPostingController, PublicJobPostingController | Recruiter job postings; public short-code landing pages driving candidate signup |
| Admin | AdminController, PlatformSettingsController, ConfigController | /api/admin/** (ROLE_ADMIN); manages platform_config including encrypted integration keys |
| Interview & feedback | InterviewSessionController, FeedbackController | Interview lifecycle, types/levels, feedback capture |
| Marketing & public | BlogController, ContactInquiryController, SalesContactController | Public surfaces; contact endpoints rate-limited (Bucket4j) and CAPTCHA-gated (Turnstile) |
| Cross-cutting | SecurityConfig, TraceIdFilter, SymmetricEncryptionService | Filter ordering, stateless sessions, CORS, platform_config secret decryption |

Local development

Prerequisites

  • JDK 21 (the Gradle toolchain pins Java 21)
  • PostgreSQL running locally
  • Redis — optional; the app boots without it (only cross-process config-cache invalidation is lost)
  • Docker — required for the test suite (Testcontainers)

Setup

  1. Create the local database named in application-local.yaml (placeholder: <LOCAL_DB_NAME>). Because ddl-auto=validate, the schema must exist before first boot — load a schema dump or apply the migrations first.
  2. Configure secrets: copy .env.example to .env (gitignored) and set INTERNAL_SERVICE_JWT_SECRET to a value of at least 32 characters. Most other local defaults are pre-filled in application-local.yaml.
  3. Apply migrations manually — Flyway is on the classpath but deliberately disabled:
psql "<DB_URL>" -f src/main/resources/db/migration/V{YYYYMMDDHHMMSS}__short_name.sql

Migrations are applied by hand, in timestamp order, against every environment. They are append-only forward: never edit a shipped file, write a new one instead.
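
Since ordering is done by hand, a small helper can make the apply order explicit. The sketch below assumes file names follow the V{YYYYMMDDHHMMSS}__short_name.sql pattern shown above with a fixed 14-digit timestamp; the class name and the sample file names in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: sort migration file names into apply order. */
final class MigrationOrderSketch {
    // "V" + 14-digit timestamp + "__" + short name + ".sql"
    private static final Pattern NAME = Pattern.compile("V(\\d{14})__(.+)\\.sql");

    /** Extracts the timestamp, rejecting names that do not match the convention. */
    static String timestampOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a migration file name: " + fileName);
        }
        return m.group(1);
    }

    /** Fixed-width timestamps sort chronologically when compared as strings. */
    static List<String> inApplyOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparing(MigrationOrderSketch::timestampOf));
        return sorted;
    }

    public static void main(String[] args) {
        // Sample names are made up for the example.
        List<String> files = List.of(
                "V20250301120000__example_b.sql",
                "V20240115093000__example_a.sql");
        inApplyOrder(files).forEach(System.out::println);
    }
}
```

Rejecting malformed names up front is deliberate: a file that silently sorts into the wrong position is harder to notice than one that fails loudly.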

Commands

./gradlew bootRun # run the API locally; run once to verify, then stop
./gradlew build # compile + test + layered bootJar
./gradlew test # full suite (Testcontainers — needs Docker)
./gradlew test --tests "ClassName.method" # single test
./gradlew compileJava # compile/type check (no spotless/checkstyle configured)

Once running locally, Swagger UI is served at /swagger-ui.html, the OpenAPI JSON at /v3/api-docs, and health at /actuator/health.

Profiles and environments

Config is layered: application.yaml (shared base) plus application-{local,dev,prod}.yaml overrides, selected by SPRING_PROFILES_ACTIVE (defaults to local). Dev and prod run on AWS Elastic Beanstalk with config injected as environment properties; deploys are automated by CI on push (dev branch → dev environment, trunk → prod). The dev site lives at https://develop.theinterviews.ai; prod is https://www.theinterviews.ai.

Environment variables / configuration

Values are always placeholders — never commit real values.

| Variable | Purpose |
| --- | --- |
| INTERNAL_SERVICE_JWT_SECRET | HS256 secret for /api/internal/*; must be ≥32 chars and byte-identical across user-management, bot-backend, and video-streaming-server |
| DB_URL / DB_USERNAME / DB_PASSWORD | Bootstrap datasource, e.g. <DB_URL> |
| ENCRYPTION_SECRET_KEY | Master key that decrypts integration secrets stored in platform_config; bootstrap-only, never moved to the DB |
| GOOGLE_OAUTH_CLEINT_ID / GOOGLE_OAUTH_CLEINT_SECRET / GOOGLE_OAUTH_REDIRECT_URL | Google login (the CLEINT typo is intentional; see Gotchas) |
| EMAIL_HOST / EMAIL_PORT / EMAIL_USERNAME / EMAIL_PASSWORD | SMTP for magic links, OTP, notifications |
| JWT_EXPIRATION / JWT_REFRESH_EXPIRATION / JWT_ISSUER / JWT_AUDIENCE | Token lifetimes and claims (the JWT signing secret itself lives in platform_config) |
| TURNSTILE_ENABLED / TURNSTILE_SECRET_KEY | Cloudflare Turnstile CAPTCHA on contact/sales forms (disable locally) |
| REDIS_HOST / REDIS_PORT / REDIS_DB | Optional Redis pub/sub |
| BASE_URL / BASE_VIDEO_URL | Frontend and video-room base URLs (profile-specific) |
| AI_ROUTING_VSS_BASE_URL / AI_ROUTING_BOT_BACKEND_AI_BASE_URL | Targets the AIRoutingFilter resolves to |
| DDL_AUTO | Hibernate DDL mode; leave as validate except for deliberate local experiments |

Stripe, LiveKit, AWS, and OAuth integration secrets are not env vars: they are stored encrypted in the platform_config table and decrypted at runtime with ENCRYPTION_SECRET_KEY. Manage them via the in-app Admin Settings UI, not config files.
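
To make the encrypt-at-rest, decrypt-at-runtime idea concrete, here is a minimal sketch using AES-GCM from the JDK. Everything about it is an assumption for illustration: the source does not say which cipher, key derivation, or storage encoding SymmetricEncryptionService uses, and the class and method names below are hypothetical. Hashing the master secret into a key is a simplification; a proper key-derivation function would be preferable in practice.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: AES-GCM round trip for a config secret. */
final class ConfigSecretSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    /** Derives a 256-bit AES key by hashing the master secret (simplification). */
    private static SecretKeySpec key(String masterSecret) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(masterSecret.getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(digest, "AES");
    }

    /** Returns Base64(iv || ciphertext+tag), suitable for a text column. */
    static String encrypt(String plaintext, String masterSecret) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // fresh IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key(masterSecret), new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the key is wrong or the value was tampered with. */
    static String decrypt(String stored, String masterSecret) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key(masterSecret),
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        String stored = encrypt("<INTEGRATION_SECRET>", "<ENCRYPTION_SECRET_KEY>");
        System.out.println(decrypt(stored, "<ENCRYPTION_SECRET_KEY>"));
    }
}
```

The sketch also shows why the master key has to stay outside the database: without it, nothing stored in the table can be read, including the key itself if it were stored there.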

Gotchas

:::warning The dev database is shared: treat queries as production blast-radius
The dev database is a shared, real environment, and its data is production-shaped. There is no casual DDL/DML: wrap destructive SQL in BEGIN; … COMMIT;, prepare a rollback before applying anything, and never UPDATE historical billing or subscription rows, because the ledger is append-only forward.
:::

  • ddl-auto=validate fails boot on any missing column referenced by an @Entity. Adding an entity field without first applying its migration to the target DB causes a boot failure. A past prod outage was exactly this: a column present in dev but missing in prod. Verify schema parity across environments before shipping the JAR.
  • Flyway is OFF by contract. Never set spring.flyway.enabled=true or auto-apply migrations; all SQL runs manually via psql in timestamp order, in every environment.
  • INTERNAL_SERVICE_JWT_SECRET must be byte-identical across user-management, bot-backend, and video-streaming-server, or every /api/internal/* call returns 401. The app logs a clear warning if the secret is too short.
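  • WebFlux is on the classpath only for WebClient (async eval proxy). Do not return Mono/Flux from a @RestController: the app is meant to stay on the blocking servlet (Spring MVC) stack, and Spring MVC accepts reactive return types by quietly moving the request onto its async dispatch path instead of failing, so the mistake is easy to miss.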
  • Integration keys live encrypted in platform_config, not in env vars or YAML. Don't go hunting for a Stripe key in .env; don't move the master ENCRYPTION_SECRET_KEY into the DB either (circular dependency).
  • The GOOGLE_OAUTH_CLEINT_* typo is load-bearing — it matches the deployed environment variable names. Renaming requires updating the deployment config in lockstep; never rename unilaterally.
  • subscription_plan is wide (~80 columns, several dead/premature). Don't add columns casually — apply the migration first and add the field to SubscriptionPlanDto.fromEntity, the single mapper that keeps /api/plans and the nested subscription.plan in sync. Hardcoded plan-name checks (if (plan.equals("pro"))) are an anti-pattern; entitlement checks always go through the subscription services.
  • Stripe webhooks are permitAll() in Spring Security — their real auth is HMAC signature verification in StripeSignatureVerifier. The unsigned-webhook bypass is double-gated to the local profile and prints a loud startup banner.
  • Tests need Docker (Testcontainers: PostgreSQL + LocalStack), and the test task forks at cores/2 — expect slowness on small machines.
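
The webhook gotcha above relies on HMAC signature verification, which can be sketched as follows. The scheme shown is the one Stripe documents for the Stripe-Signature header: a `t=<timestamp>` element plus one or more `v1=<hex>` elements, where the signature is HMAC-SHA256 over `<timestamp>.<raw body>` keyed with the endpoint secret. The class and method names are hypothetical, and the project's StripeSignatureVerifier may well delegate to the Stripe SDK instead of doing this by hand.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: verify a "t=...,v1=..." webhook signature header. */
final class WebhookSignatureSketch {

    /** Lowercase hex HMAC-SHA256 of the message under the given secret. */
    static String hmacHex(String secret, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(message.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Accepts only a fresh timestamp and a matching v1 signature over "t.body". */
    static boolean verify(String header, String rawBody, String secret,
                          long nowEpochSeconds, long toleranceSeconds) {
        String timestamp = null;
        List<String> candidates = new ArrayList<>();
        for (String part : header.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length != 2) continue;
            if (kv[0].equals("t")) timestamp = kv[1];
            else if (kv[0].equals("v1")) candidates.add(kv[1]);
        }
        if (timestamp == null || candidates.isEmpty()) return false;
        long signedAt;
        try {
            signedAt = Long.parseLong(timestamp);
        } catch (NumberFormatException e) {
            return false;
        }
        // Reject old (or far-future) timestamps to limit replay of captured requests.
        if (Math.abs(nowEpochSeconds - signedAt) > toleranceSeconds) return false;
        byte[] expected = hmacHex(secret, timestamp + "." + rawBody)
                .getBytes(StandardCharsets.US_ASCII);
        for (String candidate : candidates) {
            if (MessageDigest.isEqual(expected, candidate.getBytes(StandardCharsets.US_ASCII))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String secret = "<WEBHOOK_SECRET>";
        String body = "{\"type\":\"example.event\"}";
        String header = "t=1700000000,v1=" + hmacHex(secret, "1700000000." + body);
        System.out.println(verify(header, body, secret, 1700000010L, 300L));
    }
}
```

The signature must be computed over the raw request body exactly as received; re-serializing parsed JSON before verifying is a common way to break an otherwise correct check.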