Satark-AI
Multi-Model Deepfake Detection & Speaker Verification
A production-grade deepfake detection platform built as a Turborepo microservices monorepo. Detects synthetic audio via Wav2Vec2 + MFCC spectral forensics, deepfake images via NVIDIA NIM (Llama 3.2-90B Vision) through a Cloudflare Worker proxy, and verifies speaker identity using ECAPA-TDNN voice biometrics with 192-dim embeddings and cosine similarity matching. Ships as a PWA with real-time microphone monitoring, analytics dashboard, and PDF report exports.
The Problem
Generative AI has made it trivially easy to clone voices, fabricate images, and spread synthetic media. Existing detection tools tend to be siloed: most handle either audio or images but not both, and many require expensive infrastructure or run offline only.
The Solution
Architected a scalable Turborepo monorepo with three independent microservices: a React 18 PWA frontend, a Hono (Node.js) API gateway backed by PostgreSQL via Drizzle ORM with Clerk JWT auth, and a Python 3.11 FastAPI AI engine for audio deepfake detection (Wav2Vec2) and speaker biometrics (ECAPA-TDNN). Image deepfake detection runs on a fully serverless Cloudflare Worker proxy that forwards requests to NVIDIA NIM (Llama 3.2-90B Vision), completely independent of the Python engine. A GitHub Actions cron job pings both Render services every 14 minutes to prevent free-tier cold starts.
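Because the AI engine runs on free-tier Render instances, the Key Challenges section mentions lazy model loading and thread executor offloading. A minimal sketch of that pattern follows, using only the standard library. `LazyModel`, `load_fake_detector`, and `analyze` are hypothetical names, and the loader is a stand-in rather than the project's Wav2Vec2 loading code; in the real engine `analyze` would be called from a FastAPI route.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

Predictor = Callable[[bytes], float]


class LazyModel:
    """Defers an expensive model load until the first request needs it."""

    def __init__(self, loader: Callable[[], Predictor]) -> None:
        self._loader = loader
        self._model: Optional[Predictor] = None
        self._lock = threading.Lock()
        self.load_count = 0  # exposed only so the sketch can be checked

    def get(self) -> Predictor:
        # Double-checked locking: only one thread ever runs the loader.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
                    self.load_count += 1
        return self._model


def load_fake_detector() -> Predictor:
    # Hypothetical stand-in for loading model weights into memory.
    def predict(audio: bytes) -> float:
        return min(1.0, len(audio) / 1000)

    return predict


detector = LazyModel(load_fake_detector)
# Assumption: a single worker keeps at most one inference in memory at a time.
executor = ThreadPoolExecutor(max_workers=1)


def _run_inference(audio: bytes) -> float:
    # Blocking work: the first call also pays the model-loading cost.
    return detector.get()(audio)


async def analyze(audio: bytes) -> float:
    loop = asyncio.get_running_loop()
    # Offload to a worker thread so the event loop can keep serving requests.
    return await loop.run_in_executor(executor, _run_inference, audio)
```

Nothing is loaded at import time, so the process starts with a small memory footprint, and the blocking load and inference both happen off the event loop.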
Key Challenges
1. Designing a serverless image detection pipeline via Cloudflare Workers with strict CORS whitelisting, 5 MB double-layer size enforcement, and a 30-second AbortController timeout against NVIDIA NIM, all without touching the Python engine.
2. Implementing lazy model loading and thread executor offloading in FastAPI to prevent OOM crashes and keep the async event loop non-blocking on free-tier Render instances.
3. Building per-user scoped speaker isolation (users can only verify against their own enrolled voice prints) with 192-dim ECAPA-TDNN embeddings stored in PostgreSQL and cosine similarity computed server-side (threshold: 0.75).
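The per-user matching in the third challenge can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the in-memory `store` dictionary stands in for the PostgreSQL table, `verify_speaker` is a hypothetical name, and treating a score equal to the threshold as a match is an assumption. Real embeddings would be 192-dimensional; the function works for any matching length.

```python
import math
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0.75  # similarity threshold stated in the text

# Hypothetical stand-in for the database: user_id -> {label: embedding}
VoicePrintStore = Dict[str, Dict[str, List[float]]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def verify_speaker(
    store: VoicePrintStore,
    user_id: str,
    probe: List[float],
    threshold: float = THRESHOLD,
) -> Tuple[Optional[str], float, bool]:
    """Compare a probe embedding against the caller's own voice prints only.

    Returns (best_label, best_score, is_match). Prints enrolled by other
    users are never read, which is what scopes verification per user.
    """
    best_label: Optional[str] = None
    best_score = -1.0
    for label, enrolled in store.get(user_id, {}).items():
        score = cosine_similarity(probe, enrolled)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None:
        return None, 0.0, False  # the user has nothing enrolled
    return best_label, best_score, best_score >= threshold
```

Scoping the lookup by `user_id` before any similarity is computed means a probe that closely matches another user's voice print still fails verification for the caller.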
Tech Stack
Frontend
React 18, Vite, TypeScript, Tailwind CSS, Framer Motion, Recharts, Wavesurfer.js, Clerk Auth, PWA (Workbox)
API Gateway
Hono (Node.js), PostgreSQL, Drizzle ORM, Clerk JWT Auth, pg connection pool
AI Engine
Python 3.11, FastAPI, PyTorch, Wav2Vec2, SpeechBrain (ECAPA-TDNN), Librosa
Vision Pipeline
NVIDIA NIM (Llama 3.2-90B Vision Instruct), Cloudflare Workers (serverless proxy)
Infrastructure
Turborepo, Docker Compose, Vercel, Render, GitHub Actions (keep-alive cron)