Production-grade blueprint for deploying and operating scalable ML inference workloads on Kubernetes and OpenShift, leveraging NVIDIA Triton Inference Server.
Building reliable AI systems that move from experiments to production.
I design, build, and deploy production-grade ML applications and inference systems, and architect scalable MLOps platforms with a focus on security and operational excellence.
Foundation in building structured systems, maintainable codebases, APIs, and delivery workflows.
Experience in experimentation, modelling, feature design, and turning business problems into measurable ML outcomes.
Current focus on deploying, scaling, monitoring, and governing ML systems in production environments.
Selected work
CI/CD design for training, validation, model registry, deployment, and monitoring.
LLM application with retrieval, filtering, observability, and secure logging controls.
Technical articles
System Design for Beginners
The system design fundamentals I wish someone explained to me when I was starting out — no fluff, just the mental models that actually matter.
Prometheus on OpenShift for Production ML Monitoring
The monitoring setup I couldn't find a good guide for — Prometheus + Thanos on OpenShift for ML inference workloads. Metrics collection, reliability tracking, and observability from scratch.
MLOps Architecture
How I approach production ML inference on Kubernetes — model storage decisions, OCI-baked deployments, zero-downtime rollouts, and the Helm chart patterns that make life easier for app teams.
Model Drift — A Complete Guide
Everything I've learned about model drift after watching production models quietly degrade — detection methods, the stats behind them, and knowing when to retrain vs. when to wait.
AI Agents — What They Are and How to Use Them
What I learned after a year of building with AI agents — how they actually differ from chatbots, the patterns that work, and where I use them in my own workflow.
How Big Companies Do Drift Monitoring
I dug through engineering blogs from Uber, Netflix, LinkedIn, Airbnb, Meta, and Google to figure out how they actually handle drift. Six companies, six architectures — all sourced.
LLM Fundamentals — Deep Dive
My working notes on how LLMs actually work — from attention mechanics and tokenization to RLHF, LoRA, quantisation, and scaling laws. Written while trying to explain these things precisely.
Unlock my latest resume
Enter your email and complete a quick anti-bot check. This keeps the resume endpoint protected from automated abuse.
How I build production AI systems
My focus is not only on building models, but also on designing reliable, scalable, and observable AI systems that hold up in real-world production environments.