Gautam Kumar.

Satark-AI

Multi-Model Deepfake Detection & Speaker Verification

A production-grade deepfake detection platform built as a Turborepo microservices monorepo. It detects synthetic audio via Wav2Vec2 and MFCC spectral forensics, flags deepfake images via NVIDIA NIM (Llama 3.2-90B Vision) through a Cloudflare Worker proxy, and verifies speaker identity using ECAPA-TDNN voice biometrics with 192-dim embeddings and cosine similarity matching. It ships as a PWA with real-time microphone monitoring, an analytics dashboard, and PDF report exports.

The Problem

Generative AI has made it trivially easy to clone voices, fabricate images, and spread synthetic media. Existing detection tools tend to be siloed: most handle either audio or images, rarely both, and many require expensive infrastructure or remain offline-only.

The Solution

Architected a scalable Turborepo monorepo with three independent microservices: a React 18 PWA frontend, a Hono (Node.js) API gateway backed by PostgreSQL via Drizzle ORM with Clerk JWT auth, and a Python 3.11 FastAPI AI engine for audio deepfake detection (Wav2Vec2) and speaker biometrics (ECAPA-TDNN). Image deepfake detection runs on a fully serverless Cloudflare Worker proxy that forwards requests to NVIDIA NIM (Llama 3.2-90B Vision), completely independent of the Python engine. A GitHub Actions cron job pings both Render services every 14 minutes to prevent free-tier cold starts.

Key Challenges

  1. Designing a serverless image detection pipeline via Cloudflare Workers with strict CORS whitelisting, 5MB double-layer size enforcement, and a 30s AbortController timeout against NVIDIA NIM, all without touching the Python engine.
  2. Implementing lazy model loading and thread executor offloading in FastAPI to prevent OOM crashes and keep the async event loop non-blocking on free-tier Render instances.
  3. Building per-user scoped speaker isolation (users can only verify against their own enrolled voice prints) with 192-dim ECAPA-TDNN embeddings stored in PostgreSQL and cosine similarity computed server-side (threshold: 0.75).
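
The speaker-verification step in the last item can be sketched as follows. This is an illustrative example only: the function names and the in-memory `voice_prints` dictionary (standing in for the PostgreSQL table) are hypothetical. It also assumes that a score equal to the 0.75 threshold counts as a match, which the description above does not specify.

```python
import math

EMBEDDING_DIM = 192      # ECAPA-TDNN embedding size stated above
MATCH_THRESHOLD = 0.75   # cosine similarity threshold stated above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def verify_speaker(
    user_id: str,
    probe: list[float],
    voice_prints: dict[str, list[list[float]]],
) -> tuple[bool, float]:
    """Compare a probe embedding only against the caller's own enrollments.

    `voice_prints` maps a user id to that user's enrolled embeddings;
    prints belonging to other users are never looked at.
    """
    enrolled = voice_prints.get(user_id, [])
    best = max((cosine_similarity(probe, e) for e in enrolled), default=0.0)
    return best >= MATCH_THRESHOLD, best
```

Because the lookup is keyed by the authenticated user's id, a probe that perfectly matches another user's voice print still fails verification for the caller.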

Tech Stack

Frontend

React 18, Vite, TypeScript, Tailwind CSS, Framer Motion, Recharts, Wavesurfer.js, Clerk Auth, PWA (Workbox)

API Gateway

Hono (Node.js), PostgreSQL, Drizzle ORM, Clerk JWT Auth, pg connection pool

AI Engine

Python 3.11, FastAPI, PyTorch, Wav2Vec2, SpeechBrain (ECAPA-TDNN), Librosa

Vision Pipeline

NVIDIA NIM (Llama 3.2-90B Vision Instruct), Cloudflare Workers (serverless proxy)

Infrastructure

Turborepo, Docker Compose, Vercel, Render, GitHub Actions (keep-alive cron)