Epic 38 — Performance, Security & Accessibility Hardening
Decision (user, 2026-06-12): Run three platform-wide review tracks before beta — Performance (fast requests = better UX), Security (backend + data + user-facing safety), and Accessibility (full WCAG 2.1 AA, including visual contrast). Each track is audit-first: an audit ticket produces a written findings list with severities, then remediation tickets fix the findings. Audits cover both repos (castyou-backend, castyou-frontend: apps/app, apps/landing, packages/design-system).
Why: The platform has grown across 30+ epics with feature-level testing but no cross-cutting pass. Known smells already on record: design-system Input labels are not programmatically associated with their inputs ([[E2E Test Gotchas]] works around it by selecting on placeholder — an accessibility bug, not a test quirk); GraphQL resolvers were added incrementally by many epics so authorization and N+1 patterns are inconsistent; the app ships media-heavy pages (portfolio, feed, discover) where payload size and query waterfalls directly hurt UX. A talent marketplace also holds sensitive PII (minors may appear in casting contexts, pet owners, physical characteristics, location, salary expectations) — data exposure is a user-safety issue, not just a technical one.
Tracks & severity model: Findings are logged in docs/audits/ (one markdown file per audit) with severity P0 (exploit / data leak / blocker — fix immediately), P1 (real user impact — fix in this epic), P2 (improvement — backlog). Remediation tickets in this epic cover P0/P1; P2s become follow-up tickets only if cheap.
Scope notes: Epic 37 already covers content moderation (user safety from content); this epic covers safety from data exposure and account compromise. Landing visual polish stays gated behind Epic 15 ([[Landing Polish Gate]]) — but accessibility and security fixes to the landing are in scope here (a11y/security are not "polish"). All UI fixes land in @castyou/design-system first ([[Design System Rule]]); any list touched gets pagination verified ([[Pagination Rule]]).
PERF-AUDIT-001 — Backend & frontend performance audit
- [x] Done (2026-06-15, `docs/audits/2026-06-performance.md`, 17 findings: 2 P0/11 P1/4 P2; `src/plugins/queryTiming.ts` behind `PERF_TRACE`)
Files:
- Create: `docs/audits/2026-06-performance.md`: findings with severity + measurement
- Create: `castyou-backend/src/plugins/queryTiming.ts`: temporary GraphQL plugin logging per-resolver duration + Prisma/Mongo query counts per operation (kept behind `PERF_TRACE=true`, useful permanently)
Audit checklist:
- N+1 queries: instrument the top operations (feed, jobs list, discover, search, portfolio, applications dashboard, admin tables) and log query-count-per-request; anything that scales with result count is a finding. Prisma relation access inside list field resolvers is the prime suspect.
- Missing indexes: `EXPLAIN ANALYZE` the hot Prisma queries (feed ordering, follow lookups, search ILIKE paths, moderation/report queues, notification unread counts); cross-check `schema.prisma` `@@index` coverage against actual `where`/`orderBy` clauses. Same for Mongo (messages by conversation, notifications by user+read).
- Over-fetching: GraphQL fields resolved eagerly but rarely requested (e.g. heavy JSON columns, counts); responses returning full entities where the UI needs a projection.
- Payload & media: portfolio/feed image sizes served from R2 (are thumbnails used everywhere a thumbnail fits?), video poster usage, missing CDN cache headers.
- Frontend: Lighthouse runs (mobile profile) on landing home, app feed, jobs, portfolio, profile; bundle analysis (`vite build --mode analyze` / rollup-plugin-visualizer) for oversized chunks (Konva, dnd-kit, ffmpeg-adjacent libs leaking into the main bundle); React Query waterfalls (sequential dependent queries that could parallelize); re-render hotspots on the feed.
- Redis utilization: hot reads that recompute per request but could cache (suggestion engine, search facets; feature flags are already cached, so verify their TTLs).
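The N+1 item in the checklist comes down to comparing query counts across requests that return different numbers of results. A minimal sketch, assuming the timing plugin can emit one sample per traced request; `QuerySample` and `findScalingOperations` are hypothetical names for illustration, not existing code:

```typescript
// Hypothetical shape of one measurement emitted per traced request.
interface QuerySample {
  operation: string;   // e.g. "feed"
  resultCount: number; // rows returned to the client
  queryCount: number;  // Prisma + Mongo queries issued while resolving
}

// Flags operations whose query count grows when the result count grows.
// Needs at least two samples with different result counts per operation.
function findScalingOperations(samples: QuerySample[]): string[] {
  const byOperation: Record<string, QuerySample[]> = {};
  for (const sample of samples) {
    (byOperation[sample.operation] ??= []).push(sample);
  }

  const flagged: string[] = [];
  for (const operation of Object.keys(byOperation)) {
    const sorted = byOperation[operation]
      .slice()
      .sort((a, b) => a.resultCount - b.resultCount);
    const smallest = sorted[0];
    const largest = sorted[sorted.length - 1];
    const moreResults = largest.resultCount > smallest.resultCount;
    const moreQueries = largest.queryCount > smallest.queryCount;
    if (moreResults && moreQueries) flagged.push(operation);
  }
  return flagged;
}
```

Running the same operation at two page sizes (say 10 and 50) is enough to separate a constant-cost resolver from one that issues a query per row.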
Acceptance criteria:
- Findings doc exists with P0/P1/P2 severities, each with a measurement (query count, ms, KB) — not vibes
- Query-timing plugin merged behind `PERF_TRACE` env flag
- P0/P1 findings mapped to BE-PERF-001 / FE-PERF-001 scopes
BE-PERF-001 — Backend remediation: N+1s, indexes, caching
- [x] Done (2026-06-15 — all P0/P1 closed: feed + job + petJob N+1 via per-request DataLoaders, 8 Postgres indexes + 2 migrations, Notification Mongo compound indexes, followSuggestions Redis cache. Deferred to follow-up (audit doc): pg_trgm GIN search index, scorer select-projection)
Files:
- Edit: resolvers flagged by PERF-AUDIT-001 (batch with DataLoader or Prisma `include`/`in` queries)
- Edit: `castyou-backend/prisma/schema.prisma`: missing `@@index` entries
- Run: `pnpm db:migrate` (Postgres schema change; `prisma generate` alone is not enough)
- Edit: Mongoose schemas: missing compound indexes
- Create: `castyou-backend/src/lib/dataloaders.ts`: per-request loader registry on GraphQL context (users by id, profiles by userId, follow-state by pair, media counts)
Notes:
- DataLoader instances must be per-request (constructed in context factory), never module-level — cross-user cache bleed is a security bug, not just staleness.
- Redis caching only for reads that tolerate staleness (suggestions, public profile aggregates); explicit invalidation or short TTL ≤60s; never cache viewer-specific authorization results.
- Target from audit baseline: hot list operations issue a bounded number of queries that does not grow with result count (one batched query per relation plus a constant); p95 of instrumented operations improved and recorded in the findings doc as before/after.
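The per-request rule in the notes above can be sketched as follows. This is an illustration under assumptions, not the contents of `dataloaders.ts`: `BatchLoader` is a hand-rolled stand-in for the `dataloader` package, and `User`, `FetchUsers`, `createLoaders`, and `createContext` are hypothetical names.

```typescript
// Minimal batching loader: collects keys requested in the same tick and
// resolves them with one batch call. Stand-in for the `dataloader` package.
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class BatchLoader<K, V> {
  private cache = new Map<K, Promise<V>>();
  private queue: Array<{ key: K; resolve: (v: V) => void; reject: (e: unknown) => void }> = [];

  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached) return cached;
    const promise = new Promise<V>((resolve, reject) => {
      // First key of a batch schedules the flush for the end of the current tick.
      if (this.queue.length === 0) Promise.resolve().then(() => this.flush());
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

interface User { id: string; name: string }

// Hypothetical data access; in the backend this would be a single `in` query.
type FetchUsers = (ids: string[]) => Promise<User[]>;

function createLoaders(fetchUsers: FetchUsers) {
  return {
    userById: new BatchLoader<string, User | undefined>(async (ids) => {
      const rows = await fetchUsers(ids);
      const byId = new Map(rows.map((u) => [u.id, u] as const));
      return ids.map((id) => byId.get(id)); // keep results aligned with keys
    }),
  };
}

// Context factory: called once per request, so no cache outlives or crosses a request.
function createContext(viewerId: string | null, fetchUsers: FetchUsers) {
  return { viewerId, loaders: createLoaders(fetchUsers) };
}
```

Because the loaders are built inside `createContext`, two concurrent requests never share a cache entry, which is the property the note treats as a security requirement.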
Acceptance criteria:
- Every P0/P1 backend finding closed with before/after numbers appended to the audit doc
- No resolver issues queries proportional to list size on the instrumented operations
- Migration applies cleanly; `EXPLAIN` confirms new indexes are used
FE-PERF-001 — Frontend remediation: code-splitting, media, query tuning
- [x] Done (2026-06-15 — ReactQueryDevtools dev-only/lazy out of prod bundle, Vite manualChunks vendor split with Konva/dnd-kit kept lazy. Deferred: hover-prefetch, responsive R2 thumbnails)
Files:
- Edit: `castyou-frontend/apps/app/src/App.tsx` (or router entry): `React.lazy` route-level splitting for heavy/rare routes (reel editor, flier editor/Konva, admin pages)
- Edit: image-rendering components in `@castyou/design-system`: `loading="lazy"`, explicit `width`/`height` (CLS), thumbnail-first sources
- Edit: React Query hooks flagged by the audit: parallelize independent queries, tune `staleTime` on stable data (profiles, flags), prefetch on hover for job/profile cards
- Edit: `castyou-frontend/apps/landing`: next/image usage check, font loading (`display: swap`)
Notes:
- Konva (flier editor) and @dnd-kit (reel editor) must not be in the entry chunk — verify with the bundle analyzer before/after.
- Skeletons from the design system for all above-the-fold loading states ([[Design System Rule]]) — perceived performance counts.
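One way to keep the editor libraries out of the entry chunk is the function form of Rollup's `output.manualChunks`. The sketch below is an assumption about how the vendor split could look, not the actual Vite config; `LAZY_ONLY` and the `"vendor"` chunk name are illustrative.

```typescript
// Packages that should only load with their routes (taken from the note above).
const LAZY_ONLY = ["konva", "@dnd-kit"];

// Matches the function form of Rollup's `output.manualChunks`:
// return a chunk name, or undefined to let Rollup place the module itself.
function manualChunks(id: string): string | undefined {
  // Application code: leave to route-level splitting.
  if (!id.includes("node_modules")) return undefined;
  // Editor libraries: do not pin them to the shared vendor chunk, so they
  // follow the lazily imported routes that use them.
  if (LAZY_ONLY.some((pkg) => id.includes(`/node_modules/${pkg}`))) return undefined;
  return "vendor";
}
```

In a Vite project this function would be passed as `build.rollupOptions.output.manualChunks`. The bundle analyzer remains the real check, since a static import of an editor component from a shared module can still pull these packages into the entry chunk.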
Acceptance criteria:
- Entry chunk shrinks measurably (record before/after KB in the audit doc); editor libs only load on their routes
- Lighthouse performance score on feed/jobs/landing improves vs. audit baseline (record scores)
- No layout shift from images on feed/portfolio (CLS ≈ 0 on those pages)
SEC-AUDIT-001 — Security audit: authorization, data exposure, abuse resistance
- [x] Done (2026-06-15, `docs/audits/2026-06-security.md`, 11 findings: 0 P0/4 P1/7 P2; `authzMatrix.test.ts` 357 pass + 4 todo, completeness guard verified)
Files:
- Create: `docs/audits/2026-06-security.md`: findings with severity + reproduction steps
- Create: `castyou-backend/src/__tests__/security/authzMatrix.test.ts`: table-driven test covering every Query/Mutation in the schema × every role (anon, TALENT, PRODUCER, PET_OWNER, AGENCY, ADMIN), asserting the expected allow/deny; schema-introspection-driven so a new resolver without a matrix entry fails the test
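The completeness guard described for the matrix test could look roughly like this. It is a sketch under assumptions: the operation-name format (`Query.me`) and the `AuthzMatrix` shape are hypothetical, and the list of schema operations is passed in rather than introspected here.

```typescript
type Role = "anon" | "TALENT" | "PRODUCER" | "PET_OWNER" | "AGENCY" | "ADMIN";
type Expectation = "allow" | "deny";

// One row per operation; rows are Partial so an incomplete row can be detected.
type AuthzMatrix = { [operation: string]: Partial<Record<Role, Expectation>> | undefined };

const ROLES: Role[] = ["anon", "TALENT", "PRODUCER", "PET_OWNER", "AGENCY", "ADMIN"];

// Returns every schema operation that has no row, or a row missing any role.
function findUncoveredOperations(schemaOperations: string[], matrix: AuthzMatrix): string[] {
  return schemaOperations.filter((operation) => {
    const row = matrix[operation];
    if (row === undefined) return true;
    return ROLES.some((role) => row[role] === undefined);
  });
}
```

In the test itself the operation list would come from the schema, so a resolver added by a later epic appears in `schemaOperations` automatically and fails the suite until a row is written for it.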
Audit checklist:
- Authorization (IDOR): every mutation taking an id — can user A act on user B's job/application/media/draft/message/folder? Field-level: do private fields (email, salary expectations, private media, moderation fields, strikes) leak through any public type path (PublicUser, search results, feed payloads, admin-only fields on shared types)?
- Auth lifecycle: JWT expiry/rotation, suspension/ban invalidation actually blocks every transport (HTTP + WebSocket subscriptions if any), password reset token single-use + expiry, email-change confirmation, impersonation (Epic 24) cannot escalate or persist.
- Input handling: GraphQL depth/complexity limits (a nested follow-graph query must not be a DoS vector), upload validation (presigned R2 URLs scoped by content-type/size/key prefix; can a user overwrite another user's object key?), `socialProfileUrl()` open-redirect guard still holds for any new link surface, raw string interpolation anywhere near Prisma `$queryRaw`/Mongo queries.
- Abuse resistance: rate limiting on login (brute force), register/waitlist (spam), messaging + follow + report (harassment/flood), expensive endpoints (search, AI calls; CasTars charging is not a rate limit); account-lockout/back-off behavior.
- Transport & headers: CORS allowlist (not `*` with credentials), cookie flags if any, security headers on app + landing nginx/Next configs (CSP at least report-only, X-Content-Type-Options, frame-ancestors), GraphQL introspection + error verbosity in production.
- Secrets & dependencies: no secrets in repo/client bundles (R2/OpenAI keys server-side only; grep the built app bundle), `pnpm audit`/`npm audit` on both repos, Docker images run as non-root.
- User-facing safety: session visibility ("log out other sessions"), password strength policy, no user enumeration via login/reset/register error differences, external links from user content open with `rel="noopener noreferrer"` and are plain `https` only.
Acceptance criteria:
- Findings doc with severity + repro steps; P0s flagged to the user immediately, not just filed
- Authz matrix test merged and green (every existing resolver covered; unlisted resolvers fail CI)
- P0/P1 findings mapped to BE-SEC-001 / FE-SEC-001 scopes
BE-SEC-001 — Backend remediation: authz fixes, rate limiting, hardening
- [x] Done (2026-06-15 — all P1 + backend P2 closed: messaging PII redaction, R2 key scoping, GraphQL depth(12)/cost(5000) limits, subscription ban gate, uniform auth errors, password policy, agency-invite consent, pet IDOR guard, logoutEverywhere; authzMatrix todos now real. Deferred (audit doc): SEC-F06 token storage [arch decision], SEC-F08 CSP/headers + SEC-F09 container/CI [-> TEST-PSA-001])
Files:
- Edit: resolvers flagged by SEC-AUDIT-001 — ownership/role guards, field-level redaction
- Create: `castyou-backend/src/middleware/rateLimit.ts`: Redis sliding-window limiter (per-user + per-IP), applied to login, register, password reset, sendMessage, follow, reportContent, search, AI endpoints
- Edit: GraphQL server config: depth + complexity limits, production error masking, introspection off in prod (admin tooling exception if needed)
- Edit: upload presign service: content-type allowlist, max size, user-scoped key prefixes
- Edit: nginx/Next configs: security headers (CSP report-only first, X-Content-Type-Options, frame-ancestors, referrer-policy)
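The presign hardening above (content-type allowlist, max size, user-scoped key prefixes) can be sketched as a pure validation step that runs before any URL is signed. Everything here is illustrative: the allowlist, the size cap, the `users/<id>/` prefix layout, and the function names are assumptions, not the service's actual values.

```typescript
// Illustrative values only; the real allowlist and cap come from the audit.
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp", "video/mp4"];
const MAX_BYTES = 50 * 1024 * 1024;

interface PresignRequest {
  userId: string;      // taken from the authenticated session, never from client input
  filename: string;    // client-supplied, untrusted
  contentType: string;
  sizeBytes: number;
}

// Validates the request and returns an object key under the caller's own prefix.
function buildObjectKey(req: PresignRequest): string {
  if (ALLOWED_TYPES.indexOf(req.contentType) === -1) throw new Error("content type not allowed");
  if (!(req.sizeBytes > 0 && req.sizeBytes <= MAX_BYTES)) throw new Error("size out of range");
  // Replace separators and unusual characters, collapse dot runs, drop leading dots,
  // so the filename cannot climb out of the user's prefix.
  const safeName = req.filename
    .replace(/[^A-Za-z0-9._-]/g, "_")
    .replace(/\.{2,}/g, ".")
    .replace(/^\.+/, "");
  if (safeName.length === 0) throw new Error("empty filename");
  return `users/${req.userId}/${safeName}`;
}

// Guard for any later operation (overwrite, delete) that takes a key from the client.
function isOwnedKey(userId: string, key: string): boolean {
  return key.startsWith(`users/${userId}/`) && !key.includes("..");
}
```

The trailing slash in the prefix check matters: without it, user `u1` would pass the ownership check for keys belonging to `u10`.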
Notes:
- Rate limiter is fail-open on Redis outage (same stance as notifications/moderation enqueue — availability over strictness) but logs loudly.
- Uniform auth errors: login/reset/register return identical messages + timing for unknown-user vs wrong-password (no enumeration).
- Every fix gets a regression entry in the authz matrix test or a dedicated security test — a security fix without a test will regress.
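The fail-open stance in the notes above can be illustrated with a small sliding-window limiter. This is a sketch, not the contents of `rateLimit.ts`: it uses an in-memory store in place of Redis and a synchronous interface for brevity, and `HitStore`, `MemoryHitStore`, and `allowRequest` are hypothetical names.

```typescript
// Minimal store contract. A Redis-backed implementation would be async.
interface HitStore {
  // Records a hit at `now` and returns how many hits for `key` fall inside the window.
  hit(key: string, now: number, windowMs: number): number;
}

// In-memory sliding log, standing in for Redis in this sketch.
class MemoryHitStore implements HitStore {
  private hits: Record<string, number[]> = {};

  hit(key: string, now: number, windowMs: number): number {
    const recent = (this.hits[key] ?? []).filter((t) => t > now - windowMs);
    recent.push(now);
    this.hits[key] = recent;
    return recent.length;
  }
}

interface Limit { max: number; windowMs: number }

// Returns true when the request may proceed.
function allowRequest(
  store: HitStore,
  key: string, // e.g. "login:ip:203.0.113.7" or "login:user:42"
  limit: Limit,
  now: number = Date.now(),
  log: (message: string) => void = console.error,
): boolean {
  try {
    return store.hit(key, now, limit.windowMs) <= limit.max;
  } catch (err) {
    // Fail open: availability over strictness, but make the outage loud.
    log(`rate limiter store failed for ${key}: ${String(err)}`);
    return true;
  }
}
```

Checking one key per user and one per IP for the same action gives the per-user + per-IP behavior. Note that this variant also counts rejected attempts, so a client that keeps hammering stays blocked until it backs off for a full window.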
Acceptance criteria:
- All P0/P1 backend findings closed, each with a test
- Rate limits verified (burst test) and fail-open verified
- Suspension/ban kills live access on all transports
FE-SEC-001 — Frontend remediation: client-side hardening + account safety UI
- [x] Done (2026-06-15 — shared DS ExternalLink rolled out across app+landing, queryClient.clear() on logout, PasswordStrengthMeter on register + change-password, backend error-string sanitization, log-out-everywhere button wired to logoutEverywhere mutation [confirm modal + clearAuth/redirect, i18n en/pt/es])
Files:
- Edit: user-content link rendering (posts, profiles, portfolio links, messages): enforce `https?:` scheme allowlist + `rel="noopener noreferrer" target="_blank"` in one shared design-system `ExternalLink` component
- Edit: token storage/refresh handling per audit findings; logout clears all cached query data (`queryClient.clear()`)
- Edit: `apps/app` Profile → Settings: "Active sessions / log out everywhere" + password-strength meter on change/register (DS components)
- Edit: error boundaries / toast paths: never render raw backend error strings to users
Notes:
- One `ExternalLink` component in the design system replaces ad-hoc anchors everywhere ([[Design System Rule]]); the audit grep should find zero raw `target="_blank"` anchors in apps afterward.
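The scheme allowlist inside `ExternalLink` reduces to one pure function that the component can call before rendering an anchor. A minimal sketch, assuming the `https?:` allowlist named in the file list; `safeExternalHref` and `EXTERNAL_LINK_ATTRS` are hypothetical names:

```typescript
const ALLOWED_PROTOCOLS = ["http:", "https:"];

// Returns a normalized href when the input is an absolute http(s) URL, else null.
function safeExternalHref(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw.trim());
  } catch {
    return null; // relative or malformed input is not treated as an external link
  }
  // The URL parser lowercases the scheme, so "JAVASCRIPT:" cannot slip through.
  return ALLOWED_PROTOCOLS.indexOf(url.protocol) !== -1 ? url.href : null;
}

// Attributes every rendered external anchor would carry.
const EXTERNAL_LINK_ATTRS = { target: "_blank", rel: "noopener noreferrer" } as const;
```

When the function returns `null`, the component can render the text without an anchor, which keeps `javascript:` and `data:` URLs from user content inert.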
Acceptance criteria:
- All P0/P1 frontend findings closed with tests
- Built app bundle contains no secrets (grep verified in CI or build script)
- Logout-everywhere works (other session's next request is rejected)
A11Y-AUDIT-001 — WCAG 2.1 AA accessibility audit (app + landing + design system)
- [x] Done (2026-06-15, `docs/audits/2026-06-accessibility.md`, 20 findings: 6 P0/9 P1/5 P2, 17 failing contrast pairs; `design-system/src/__tests__/a11y.test.tsx` vitest+jest-axe smoke, green via it.todo)
Files:
- Create: `docs/audits/2026-06-accessibility.md`: findings per screen × criterion, with severity
- Create: `castyou-frontend/packages/design-system/src/__tests__/a11y.test.tsx`: vitest-axe smoke over every exported DS component in default + error + disabled states
Audit checklist:
- Contrast (explicit user requirement): programmatic sweep of all design-system color tokens — text/background pairs ≥ 4.5:1 (normal) / 3:1 (large text + UI components/focus indicators), in both light and dark themes; check disabled-but-readable text, placeholder text, badge/status colors (notification badge, moderation states, application status chips), charts in admin.
- Forms: every DS `Input`/`Select`/`Textarea` label programmatically associated (`htmlFor`/`id` or wrapping), the known gap from [[E2E Test Gotchas]]; error messages linked via `aria-describedby`; required state announced.
- Keyboard: full traversal of critical flows (register → profile edit → browse jobs → apply; producer post job; messaging) with keyboard only; focus trap + restore in every Modal/Drawer/dropdown (NotificationBell, NavDrawer, report modal); visible focus indicator on all interactive elements; no positive tabindex.
- Screen reader: landmarks (nav/main/header), page titles per route, heading hierarchy, alt text on user media (portfolio items have titles, so use them), icon-only buttons have accessible names, live regions for async results (toasts, "application submitted", unread counts), DataTable semantics + pagination controls announced.
- Structure & motion: touch targets ≥ 44px on the mobile bottom bar/drawer, `prefers-reduced-motion` respected by DS transitions, zoom to 200% without loss, no information conveyed by color alone (status chips get icons/text).
- Tooling: axe automated pass on every route of app + landing (Playwright + @axe-core/playwright in castyou-automated-tests), manual VoiceOver pass on the critical flows.
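The programmatic contrast sweep in the checklist rests on the WCAG 2.1 relative-luminance and contrast-ratio formulas. A minimal sketch of that calculation; the `TokenPair` shape and `failingPairs` helper are hypothetical, and the token values in any real sweep would come from the design system rather than being hard-coded:

```typescript
// Converts an 8-bit sRGB channel to linear light (WCAG 2.1 definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#1a2b3c".
function relativeLuminance(hex: string): number {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`expected a 6-digit hex color, got ${hex}`);
  const n = parseInt(match[1], 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Hypothetical token pair: `min` is 4.5 for normal text, 3 for large text and UI components.
interface TokenPair { name: string; fg: string; bg: string; min: number }

// Lists every pair below its threshold, with the measured ratio for the findings doc.
function failingPairs(pairs: TokenPair[]): Array<{ name: string; ratio: number }> {
  return pairs
    .map((pair) => ({ name: pair.name, ratio: contrastRatio(pair.fg, pair.bg), min: pair.min }))
    .filter((result) => result.ratio < result.min)
    .map((result) => ({ name: result.name, ratio: Math.round(result.ratio * 100) / 100 }));
}
```

Running `failingPairs` once per theme gives the "every failing pair listed with measured ratio" output that the acceptance criteria ask for.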
Acceptance criteria:
- Findings doc per screen with WCAG criterion references and severities
- DS axe smoke test merged (failing cases documented as findings, then fixed in DS-A11Y-001)
- Token contrast sweep results recorded (every failing pair listed with measured ratio)
DS-A11Y-001 — Design system remediation: labels, contrast tokens, focus, ARIA
- [x] Done (2026-06-15; all P0/P1 closed: Input/Textarea/Select/MultiSelect label assoc + error ARIA, Modal/NotificationBell/NavDrawer focus traps, contrast tokens fixed both themes (placeholder/Chip/Badge/focus-ring/Button variants), status not-color-only, prefers-reduced-motion, content-tertiary token. DS axe smoke suite green. [[E2E Test Gotchas]] memory updated)
Files:
- Edit: `castyou-frontend/packages/design-system/src/components/Input` (+ Select, Textarea, Checkbox, Radio, Switch): generate `id` (`useId`), associate label via `htmlFor`, wire `aria-describedby` for error/help text, `aria-invalid` on error
- Edit: design-system color tokens: adjust failing pairs from the audit sweep in both themes (keep brand hues, shift lightness)
- Edit: `Modal`, `Drawer`/`NavDrawer`, `NotificationBell` dropdown, menus: focus trap, `Esc` close, focus restore to trigger, `aria-modal`/`role` semantics
- Edit: `Button` (icon-only variant requires `aria-label`; make it a TS-required prop), `Badge`/status chips (icon or text alongside color), `DataTable` (caption/scope, `aria-sort`), `Pagination` (current-page announcement), `Skeleton` (`aria-busy` on containers)
- Edit: global focus-visible ring token meeting 3:1 against all surfaces; transitions honor `prefers-reduced-motion`
Notes:
- The label association fix changes how tests select inputs: update affected specs to `getByLabelText` and update the [[E2E Test Gotchas]] memory/Playwright helpers (the placeholder workaround becomes obsolete; that's the point).
- Token changes ripple everywhere: visual-check the admin area and both themes; this is a contrast fix, not a redesign.
Acceptance criteria:
- DS axe smoke suite green across all components/states
- All token pairs from the audit pass AA in both themes
- Keyboard-only operation of every interactive DS component verified
FE-A11Y-001 — App & landing remediation: structure, flows, media
- [x] Done (2026-06-15 — landmarks, skip-to-content link, 44px tab targets + aria-current, live regions for unread/async (i18n en/pt/es), alt from media titles, landing semantic main. Deferred: axe Playwright suite [-> TEST-PSA-001], 200% zoom manual pass)
Files:
- Edit: `apps/app` shell + routes: landmarks, per-route `document.title`, heading hierarchy, skip-to-content link
- Edit: feed/portfolio/jobs/discover screens: alt text from media titles, accessible names on icon actions (follow, report, like, share), live-region announcements for async mutations
- Edit: `apps/landing`: same structural pass (semantic sections, image alts, form labels on waitlist/contact)
- Edit: castyou-automated-tests: add the axe Playwright pass from the audit as a permanent suite
Notes:
- Landing a11y/security fixes are in scope despite the Epic 15 gate ([[Landing Polish Gate]]) — this is compliance, not polish. No visual redesign of landing here.
Acceptance criteria:
- Axe Playwright suite green on every app + landing route
- Critical flows (register → apply; post job; message) completable with keyboard only and coherent under VoiceOver
- All P0/P1 a11y findings closed
TEST-PSA-001 — CI guards so the wins stick
- [x] Done (RESHAPED 2026-06-28). The original PR-gated, cross-repo design didn't fit reality: we push directly to `dev` (CI only fired on PRs/push:main, so it had not run since 2026-05-26), the repos are private on the free plan (branch protection / required checks return `403 Upgrade to GitHub Pro`, so "required check" is impossible and moot with no PRs), and the two highest-cost gates needed a cross-repo PAT that was never set. Decision (user, 2026-06-28): lean CI = fast single-repo gates only; move real enforcement to a local pre-push hook.
  - CI kept (both repos, `.github/workflows/ci.yml`): `typecheck`, `test` (BE unit incl. authz-matrix; FE app + DS incl. axe smoke), `audit` (`pnpm audit --audit-level high`, blocking). Trigger now `push: [main, dev]` + `pull_request: [main, dev]` so direct pushes to `dev` actually run.
  - Removed from CI (heavy / service-dependent / cross-repo): BE migration-drift (needs a Postgres service), FE bundle-size budget (needs a full prod build), FE graphql-contract (cross-repo schema fetch + PAT), and the entire full-stack axe E2E workflow in `castyou-automated-tests` (Postgres/Mongo/Redis + dual-repo bring-up + seed; deleted `.github/workflows/ci.yml`). The artifacts they ran still exist and are runnable locally on demand: `pnpm --filter @castyou/app size:check`, `pnpm --filter @castyou/app gql:check` (set `BACKEND_SCHEMA_PATH`), and `pnpm test:axe` (axe spec retained at `tests/axe-accessibility.spec.ts`).
  - Enforcement = local pre-push hook (`.githooks/pre-push` in BE + FE, wired via `core.hooksPath`, auto-set on install by a `prepare` script). Runs `typecheck && test && audit` with `CI=true` (so vitest runs once); blocks the push on failure; bypass with `git push --no-verify`. This is the only thing that can actually block on the free plan.
  - `CI_REPO_TOKEN` is no longer needed (the only two jobs that used it were removed).
Notes:
- Tradeoff accepted: migration-drift, bundle-size, gql-contract and full-stack a11y are no longer auto-enforced on push. They remain one local command each; re-promote any of them (e.g. to a pre-beta `workflow_dispatch` or a beefier plan's required checks) if/when the repo plan or workflow changes.
- If the team later adopts a PR-into-`dev` flow or upgrades to GitHub Pro/Team, required checks become possible; revisit then.
Acceptance criteria (revised):
- Every push to `dev` runs typecheck + test + audit in CI (reporting), and the pre-push hook blocks a failing push locally.
- A new resolver without an authz-matrix entry, or a high+ dependency advisory, fails both the hook and CI.