Nitesh Singhal

Engineering Leader - AI/ML Infrastructure & Generative AI

Nitesh Singhal is an Engineering Leader with over 12 years of experience building mission-critical systems at companies including Google, Uber, and Microsoft.

He directs the technical strategy for core AI/ML infrastructure platforms, specializing in real-time, large-scale distributed systems built to handle massive data throughput at sub-second latency. His work powers intelligent, security-critical services and enables data-driven product decisions.

Beyond his technical accomplishments, Nitesh is passionate about scaling engineering culture. He founded and led an engineering excellence program that drove the adoption of best practices and measurably improved code quality across his organization. He is also a speaker, technology awards judge, and startup advisor.

Speaking

DATA festival Online 2025

Reliability-first Generative AI: Turning Models into Business-Ready Systems

October 21, 2025

Generative models unlock powerful automation, but their tendency to produce convincing yet incorrect outputs blocks adoption in mission-critical settings. This talk presents a practical engineering approach for making generative AI safe, auditable, and reliable in enterprise workflows. Attendees will get a step-by-step playbook for designing systems that reduce erroneous outputs, align model behavior to business needs, and deliver measurable value. (A minimal guardrail sketch follows the event link below.)

View Event Details →
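
A talk abstract is not a playbook, but as a rough, hypothetical illustration of one guardrail in the "reduce erroneous outputs" direction, the sketch below validates a model's structured output against an expected schema before it reaches a downstream workflow, retries a bounded number of times, and then escalates rather than passing an unverified answer through. The `call_model` function and the invoice schema are invented placeholders, not material from the talk.

```python
import json

# Hypothetical schema for a structured extraction task (placeholder, not from the talk).
REQUIRED_FIELDS = {"invoice_id": str, "amount": (int, float), "currency": str}

def call_model(prompt: str) -> str:
    """Placeholder for the actual LLM call; assumed to return a JSON string."""
    raise NotImplementedError

def validate(raw: str) -> dict | None:
    """Accept the output only if it is valid JSON with the expected fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data

def extract_invoice(prompt: str, max_attempts: int = 3) -> dict:
    """Retry a bounded number of times, then escalate instead of guessing."""
    for _ in range(max_attempts):
        result = validate(call_model(prompt))
        if result is not None:
            return result
    raise ValueError("Model output failed validation; route to human review")
```

The point is the shape of the loop: validate, retry within a bound, then hand off to a person rather than silently passing through an unverified answer.
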
Summit of Things 2025

Hallucinations to High-Stakes Reliability: Building Trustworthy Generative AI Systems

October 21-23, 2025

Summit of Things is a premier virtual event focusing on the latest innovations in IoT, AI, connectivity, and cybersecurity. This session explores practical approaches to building generative AI systems that are safe, reliable, and ready for production deployment in high-stakes environments.

View Event Details →

Articles

Observability for LLMs: Traces, Evals, and Quality SLOs

You cannot fix what you cannot see, and logging tokens is not observability. Learn how to build real observability for LLM systems using traces, evals, and quality SLOs. This guide covers tracing every step, adding quality evaluations, defining SLOs, and closing the loop with automated routing and scaling. (A minimal tracing sketch follows the article link below.)

📊 OpenTelemetry traces ✅ Quality evals 🎯 SLO frameworks
Read Article →
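
The full article is linked above; as a minimal sketch of the tracing piece, assuming the standard `opentelemetry-api`/`opentelemetry-sdk` Python packages and a hypothetical `call_model` function, one parent span per request with a child span per step might look like this:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Console exporter keeps the sketch self-contained; a real deployment would export via OTLP.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-pipeline")

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for the actual LLM call."""
    return "stub response"

def answer(question: str) -> str:
    # One parent span per request, one child span per step (retrieval, generation, evals, ...).
    with tracer.start_as_current_span("llm.request") as root:
        root.set_attribute("llm.question_chars", len(question))
        with tracer.start_as_current_span("llm.generation") as gen:
            response = call_model(question)
            gen.set_attribute("llm.response_chars", len(response))
        return response

if __name__ == "__main__":
    answer("What changed in the Q3 billing pipeline?")
```

One natural extension is to record eval scores as attributes on these same spans so SLOs can be computed over them, though the article's own approach may differ.
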

How to Cut LLM Inference Costs by 40-60%

Learn four proven strategies to dramatically reduce large language model inference costs without sacrificing quality. From request batching to model quantization, discover how to save $60K+ monthly on GPU infrastructure. (A brief request-batching sketch follows the article link below.)

💰 40-60% cost reduction ⚡ 2-4x throughput improvement 🧠 4 optimization strategies
Read Article →
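
The article itself is linked above; as a rough, dependency-free sketch of the request-batching idea (one of the strategies named in the teaser), the loop below groups waiting prompts into a single batched call instead of hitting the model once per request. `run_batched_inference`, the batch size, and the wait window are illustrative assumptions, not figures from the article.

```python
import queue
import threading

MAX_BATCH_SIZE = 8       # illustrative assumption
MAX_WAIT_SECONDS = 0.02  # small window to accumulate a batch; also assumed

def run_batched_inference(prompts: list[str]) -> list[str]:
    """Placeholder for a single batched forward pass on the GPU."""
    return [f"response to: {p}" for p in prompts]

# Each queued item pairs a prompt with a one-slot reply queue for its result.
request_queue: queue.Queue = queue.Queue()

def batching_loop() -> None:
    """Group incoming requests so the model sees one batch instead of many single calls."""
    while True:
        prompt, reply_to = request_queue.get()            # block until at least one request
        batch = [(prompt, reply_to)]
        try:
            while len(batch) < MAX_BATCH_SIZE:
                batch.append(request_queue.get(timeout=MAX_WAIT_SECONDS))
        except queue.Empty:
            pass                                          # window closed; run what we have
        outputs = run_batched_inference([p for p, _ in batch])
        for (_, reply), output in zip(batch, outputs):
            reply.put(output)

def submit(prompt: str) -> str:
    """Enqueue a prompt and wait for its batched result."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    request_queue.put((prompt, reply))
    return reply.get()

threading.Thread(target=batching_loop, daemon=True).start()
```

The trade-off is a few milliseconds of added queueing latency in exchange for better utilization per forward pass, which is where the throughput gains come from.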