Context
- Supernomics runs an AI workforce product, and a 45-second cold start was a tax on every user and a drag on every enterprise deal. The AI/ML pipeline also needed a security posture solid enough to survive a buyer's diligence. Both had to be fixed without slowing the roadmap.
Approach
- Profiled the pipeline to isolate cold-start cost from steady-state latency.
- Re-platformed from Cloud Run to GKE for warm-pool control and predictable scaling.
- Hardened the AI/ML pipeline with secrets management, network segmentation, and observability via Langfuse.
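The profiling step above can be sketched roughly as follows. This is a minimal illustration, not the tooling used on the engagement: the function names, the `warmup_requests` parameter, and the sample latencies are all hypothetical. It assumes per-request latencies are recorded in order from the first request a fresh instance serves.

```python
from statistics import median


def split_cold_start(latencies_s, warmup_requests=1):
    """Separate cold-start cost from the steady-state baseline.

    latencies_s: latencies in seconds, ordered from the first request
    served by a fresh instance.
    warmup_requests: how many leading requests count as cold
    (an assumption; tune per service).
    """
    if len(latencies_s) <= warmup_requests:
        raise ValueError("need at least one steady-state sample")
    cold = latencies_s[:warmup_requests]
    steady = latencies_s[warmup_requests:]
    steady_median = median(steady)
    # Cold-start cost is the extra time over the steady-state baseline.
    return {
        "steady_median_s": steady_median,
        "cold_start_cost_s": max(cold) - steady_median,
    }


def percent_reduction(before_s, after_s):
    """Percentage drop from before_s to after_s."""
    return 100 * (before_s - after_s) / before_s


# Made-up sample: one cold request followed by warm ones.
sample = [45.0, 0.8, 0.9, 0.7, 0.8]
result = split_cold_start(sample)
reduction = percent_reduction(45, 9)
```

Keeping the two numbers apart matters because they call for different fixes: warm pools address cold-start cost, while steady-state latency is a matter of the serving path itself.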
What we built
- GKE-based serving with warm pools and autoscaling tuned to traffic shape.
- Secured ML pipeline with end-to-end tracing and evaluation.
- Observability and alerting wired to SLOs from day one.
Results
- Cold-start latency fell from 45s to 9s, an 80% reduction users feel on every cold request.
- A security posture buyers could diligence, aligned to SOC 2.
- Predictable scaling under real traffic, operated by the same team that carried the pager.
Stack & standards
Google Cloud, GKE, Cloud Run, Langfuse, gRPC, Prometheus
SOC 2, Secure SDLC, SRE
“Senior engineers who owned the problem end to end, with the latency numbers to prove it.”
Verified review · Clutch 5.0
Related work