Nitesh Singhal is an Engineering Leader with over 12 years of experience building mission-critical systems at companies including Google, Uber, and Microsoft.
He directs the technical strategy for core AI/ML infrastructure platforms, specializing in real-time, large-scale distributed systems that handle massive data throughput with sub-second latency. His work powers intelligent, security-critical services and enables data-driven decision-making across products.
Beyond his technical accomplishments, Nitesh is passionate about scaling engineering culture. He founded and led an engineering excellence program that drove the adoption of best practices and measurably improved code quality across his organization. He is also a speaker, technology awards judge, and startup advisor.
Speaking
Reliability-first Generative AI: Turning Models into Business-Ready Systems
Generative models unlock powerful automation, but their tendency to produce convincing yet incorrect outputs blocks adoption in mission-critical settings. This talk presents a practical engineering approach for making generative AI safe, auditable, and reliable in enterprise workflows. Attendees will get a step-by-step playbook for designing systems that reduce erroneous outputs, align model behavior to business needs, and deliver measurable value.
Hallucinations to High-Stakes Reliability: Building Trustworthy Generative AI Systems
Part of a premier virtual event focused on the latest innovations in IoT, AI, connectivity, and cybersecurity, this session explores practical approaches to building generative AI systems that are safe, reliable, and ready for production deployment in high-stakes environments.
Articles
Observability for LLMs: Traces, Evals, and Quality SLOs
You cannot fix what you cannot see. Logging tokens is not observability. Learn how to build real observability for LLM systems using traces, evals, and quality SLOs. This guide covers tracing every step, adding quality evaluations, defining SLOs, and closing the loop with automated routing and scaling.
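The article has the full details; as a rough, stdlib-only sketch of the three ideas it names (traces, evals, quality SLOs), the snippet below records a span per step, attaches a toy eval score, and checks it against a quality SLO. The `Span` class, the `judge_answer` eval, and the 0.92 target are illustrative assumptions, not material from the article.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    """Minimal stand-in for a trace span (real systems would use a tracing library)."""
    name: str
    attributes: dict = field(default_factory=dict)
    start: float = field(default_factory=time.time)
    end: float = 0.0

    def finish(self) -> None:
        self.end = time.time()

def judge_answer(question: str, answer: str) -> float:
    """Toy quality eval: 1.0 if the answer mentions any content word from the question."""
    keywords = {w.lower().strip("?") for w in question.split() if len(w) > 3}
    return 1.0 if any(k in answer.lower() for k in keywords) else 0.0

# 1) Trace every step of the request, not just the final model call.
retrieval = Span("retrieval", {"docs_returned": 3})
retrieval.finish()
generation = Span("generation", {"model": "example-model", "tokens_out": 42})
generation.finish()

# 2) Attach a quality eval score to the trace alongside latency and token counts.
score = judge_answer("What is our refund policy?", "Refunds are issued within 30 days.")
generation.attributes["eval_score"] = score

# 3) Compare rolling eval scores against a quality SLO (hypothetical 0.92 target)
#    and let the result drive routing or scaling decisions.
recent_scores = [0.95, 1.0, 0.9, score]
slo_target = 0.92
if sum(recent_scores) / len(recent_scores) >= slo_target:
    print("quality SLO met")
else:
    print("quality SLO breached: route traffic to a fallback model")
```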
How to Cut LLM Inference Costs by 40-60%
Learn four proven strategies to dramatically reduce large language model inference costs without sacrificing quality. From request batching to model quantization, discover how to save $60K+ monthly on GPU infrastructure.
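As a rough illustration of the batching idea mentioned in that article (not its actual method), the sketch below groups prompts into micro-batches and uses a toy cost model in which a fixed per-forward-pass overhead is amortized across each batch. The overhead, per-token cost, token count, and batch size are hypothetical numbers chosen for illustration.

```python
from typing import Iterator

def batched(prompts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Group incoming prompts into fixed-size micro-batches."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

FIXED_OVERHEAD = 0.004    # hypothetical per-forward-pass cost (scheduling, kernel launches)
PER_TOKEN_COST = 0.00002  # hypothetical marginal cost per generated token
TOKENS_PER_REQ = 200      # hypothetical output length per request

def cost(prompt_count: int, batch_size: int) -> float:
    """Fixed overhead is paid once per batch, so larger batches amortize it."""
    batches = -(-prompt_count // batch_size)  # ceiling division
    return batches * FIXED_OVERHEAD + prompt_count * TOKENS_PER_REQ * PER_TOKEN_COST

requests = [f"prompt {i}" for i in range(1_000)]
for batch in batched(requests, batch_size=16):
    pass  # each batch would go to the model as a single forward pass

print(f"unbatched: ${cost(len(requests), 1):.2f}")
print(f"batch=16:  ${cost(len(requests), 16):.2f}")
```

Under these made-up numbers, batching 16 requests per forward pass roughly halves the total cost, which is the general shape of the savings the article discusses.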