Nitesh Singhal

Engineering Leader - AI/ML Infrastructure & Generative AI

Nitesh Singhal is an Engineering Leader with over 12 years of experience architecting large-scale data and AI/ML infrastructure at Google, Uber, and Microsoft.

An expert in driving technical strategy for mission-critical systems, Nitesh specializes in building complex, high-throughput platforms, from real-time ML serving pipelines to multimodal semantic search engines. He bridges the gap between deep technical implementation and organizational strategy, ensuring systems are designed for both performance and scalability.

Passionate about engineering excellence, Nitesh has led organization-wide initiatives to standardize coding practices and strengthen engineering culture across global teams. He is also an active contributor to the tech ecosystem as a startup advisor, technology awards judge, and conference speaker.

Speaking

DATA festival Online 2025

Reliability-first Generative AI: Turning Models into Business-Ready Systems

October 21, 2025

Generative models unlock powerful automations, but their tendency to produce convincing yet incorrect outputs blocks adoption in mission-critical settings. This talk presents a practical engineering approach for making generative AI safe, auditable, and reliable in enterprise workflows. Attendees will get a step-by-step playbook for designing systems that reduce erroneous outputs, align model behavior to business needs, and deliver measurable value.

View Event Details →
Summit of Things 2025

Hallucinations to High-Stakes Reliability: Building Trustworthy Generative AI Systems

October 21-23, 2025

A premier virtual event focusing on the latest innovations in IoT, AI, connectivity, and cybersecurity. This session explores practical approaches to building generative AI systems that are safe, reliable, and ready for production deployment in high-stakes environments.

View Event Details →

Articles

Observability for LLMs: Traces, Evals, and Quality SLOs

You cannot fix what you cannot see, and logging tokens is not observability. Learn how to build real observability for LLM systems using traces, evals, and quality SLOs. This guide covers tracing every step, adding quality evaluations, defining SLOs, and closing the loop with automated routing and scaling.

📊 OpenTelemetry traces ✅ Quality evals 🎯 SLO frameworks
Read Article →

How to Cut LLM Inference Costs by 40-60%

Learn four proven strategies to dramatically reduce large language model inference costs without sacrificing quality. From request batching to model quantization, discover how to save $60K+ monthly on GPU infrastructure.

💰 40-60% cost reduction ⚡ 2-4x throughput improvement 🧠 4 optimization strategies
Read Article →