Generative AI in Engineering: Dr. Piyush Lakhawat on LLM Automation

Dr. Piyush Lakhawat is a Senior Member of Technical Staff working on large language model infrastructure at a globally leading enterprise software company, with a research and industry background spanning supply chain automation, RAG systems, and agentic AI tooling. He has applied GenAI to hard engineering problems at companies including Western Digital, Salesforce, and TuneIn, and has deep roots in classical ML dating back to his PhD years. Here, he shares how the field shifted under his feet and how he turned that into an advantage.

From Classical ML to the GenAI Paradigm Shift

"I noticed a constant shift in the field of Machine Learning (ML) and AI since 2015 toward unified global models," says Dr. Lakhawat. He was building classical ML models during his PhD, when ensemble methods were already gaining ground, and then watched deep learning architectures sweep through and begin to generalize across ML problems. But it was GenAI and LLMs that marked a true rupture. "This technology removed the entire concept of training individual models for isolated problems," he notes.

The implications went deeper than architecture. Classical ML was organized around the bias-variance tradeoff, and GenAI effectively shattered that framing. Dr. Lakhawat recognized early on that this would define the next era of the field.

"I started applying it to my research prototypes in supply chain automation," he explains. "I saw immediate breakthroughs in a few hard problems that we were stuck on for a while, and there was no looking back from then on."

That early conviction gave him a head start. While others were still evaluating whether LLMs were production-ready, Dr. Lakhawat was already shipping results and moving problems forward.

Building RAG Systems That Actually Work

LLMs are good at answering questions grounded in public knowledge, but enterprise engineering problems require proprietary context, and that gap is where retrieval-augmented generation (RAG) becomes essential. "This approach augments the context given to LLMs beyond just the prompt with cherry-picked knowledge artifacts retrieved based on relevance to the prompt," Dr. Lakhawat explains. Getting RAG right comes down to two things: what you put in the knowledge base and how you retrieve from it.

On the curation side, Dr. Lakhawat is direct: "Knowledge base curation is a data engineering problem at its core, where you understand the data schemas and dictionaries really well and work with stakeholders to bucket them into application areas." Those buckets aren't always cleanly separated — they're typically representative of functional domains that overlap in practice.

On the retrieval side, his framework is equally grounded: it's no different from tuning any traditional search system. You have parameters, and you tune them for the right depth and breadth of results.
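A minimal sketch of the two halves described above, assuming a toy knowledge base and a simple word-overlap score in place of a real embedding or search index. The names `Artifact`, `retrieve`, `top_k`, and `min_score` are illustrative and do not come from Dr. Lakhawat's systems, and the sample entries are placeholders. Here `top_k` caps how many artifacts are returned and `min_score` sets how loosely related an artifact may be, which is one possible reading of tuning for depth and breadth.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    text: str
    buckets: frozenset  # application areas; one artifact may sit in several


def tokens(text):
    """Lowercased word set; a stand-in for a real tokenizer or embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, doc):
    """Jaccard overlap between query and document word sets."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q and d else 0.0


def retrieve(query, kb, bucket=None, top_k=3, min_score=0.1):
    """Return up to top_k artifact texts scoring at least min_score."""
    candidates = [a for a in kb if bucket is None or bucket in a.buckets]
    scored = [(score(query, a.text), a) for a in candidates]
    kept = [(s, a) for s, a in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [a.text for _, a in kept[:top_k]]


# Hypothetical knowledge base; the second entry overlaps two buckets.
KB = [
    Artifact("lead time is the number of days between purchase order and delivery",
             frozenset({"procurement"})),
    Artifact("supplier id maps each purchase order to a vendor record",
             frozenset({"procurement", "finance"})),
    Artifact("invoice amount is stored in USD in the finance ledger",
             frozenset({"finance"})),
]

QUERY = "what is the lead time for a purchase order"
context = retrieve(QUERY, KB, bucket="procurement", top_k=1)
```

Raising `min_score` or lowering `top_k` narrows the context handed to the model; loosening either widens it, at the cost of more marginal artifacts in the prompt.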

What looks like an AI problem is often a data problem in disguise. Teams that skip the curation step and jump straight to retrieval tuning usually find themselves chasing symptoms rather than fixing root causes.

Prompt Engineering Is Not Parameter Tuning

"In my opinion, prompt tuning cannot be approached like traditional parameter tuning for models or predictive systems," Dr. Lakhawat says bluntly. The temptation is to treat prompts like hyperparameters: build an evaluation dataset, tune for performance, iterate. But a prompt tuned that way overfits to the model at hand and loses explainability.

Instead, Dr. Lakhawat grounds his prompt design in linguistics. "These are language-based models, as the name suggests," he points out. The rules that follow are deceptively simple: be specific and unambiguous, eliminate contradictions, and always provide examples that cover key edge cases.

"These are the general rules, and they evolve further based on the problem at hand," he adds. The goal is to create prompts that generalize across models, not ones that are brittle to a single version.

This distinction matters more as teams scale AI systems across multiple models and deployment environments. A prompt architecture built on linguistic principles survives model upgrades; one built purely on benchmark performance often does not.
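The three rules in this section (be specific, avoid contradictions, give examples that cover edge cases) can be made concrete with a small template builder. This is a sketch under assumptions: the function `build_prompt`, its layout, and the sample task are hypothetical, and no code can check a rule list for contradictions, so that part remains a human review step.

```python
def build_prompt(task, rules, examples, query):
    """Assemble a plain-language prompt from explicit parts.

    task:     one specific, unambiguous instruction
    rules:    short constraints, reviewed by a person for contradictions
    examples: (input, expected_output) pairs, including edge cases
    query:    the new input the model should handle
    """
    if not examples:
        raise ValueError("provide at least one example, including edge cases")
    lines = [f"Task: {task}", "", "Rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    lines += ["", "Examples:"]
    for given, expected in examples:
        lines += [f"Input: {given}", f"Output: {expected}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


# Hypothetical task; the second example covers the empty-input edge case.
prompt = build_prompt(
    task="Classify the electronic part type.",
    rules=["Answer with one lowercase word.",
           "If the input is empty, answer unknown."],
    examples=[("10uF 16V", "capacitor"), ("", "unknown")],
    query="NPN 2N2222",
)
```

Because the template is ordinary language with no model-specific tokens, the same prompt can be sent to a different model without rewriting it.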

Forcing Determinism in Early GenAI Days

"The biggest technical hurdles early on were around forcing determinism and maintaining strict output formatting in places where it is essential," Dr. Lakhawat recalls. Chatbot and exploratory analysis use cases allowed some flexibility, but automated task completion and chained function calling did not. "Sometimes simple tasks like generating a JSON response would fail, especially in the early GenAI days."

The response was layered. Prompt engineering was the first call to action, followed by deterministic guardrails and LLM-as-judge frameworks to catch and correct failures downstream. It required patience and a great deal of defensive architecture.
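A deterministic guardrail of the kind mentioned above might look like the following sketch. It covers only the validate-and-retry layer, not an LLM-as-judge step, and it is not Dr. Lakhawat's implementation. `call_model` is a hypothetical stand-in for whatever client sends a prompt and returns text, and `required_keys` is an assumed, simplified schema.

```python
import json


def validate(raw, required_keys):
    """Return the parsed object, or raise ValueError with a reason."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(obj, dict):
        raise ValueError("top-level value must be a JSON object")
    missing = sorted(set(required_keys) - set(obj))
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj


def generate_json(call_model, prompt, required_keys, max_attempts=3):
    """Call the model until its reply passes validation or attempts run out."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(attempt_prompt)
        try:
            return validate(raw, required_keys)
        except ValueError as err:
            # Feed the rejection reason back so the next attempt can correct it.
            attempt_prompt = (
                f"{prompt}\n\nYour previous reply was rejected ({err}). "
                "Reply with a single JSON object only."
            )
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts")


# Usage with a fake model that fails twice before returning a valid reply.
replies = iter([
    "Sure! Here is the JSON:",
    '{"status": "ok"}',
    '{"status": "ok", "items": []}',
])
result = generate_json(lambda p: next(replies), "List the items.", ["status", "items"])
```

The check is deliberately strict and mechanical: a chained function call downstream either receives an object with the expected keys or receives nothing at all.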

"Since the early days, the technology has evolved, and those problems are much more manageable now with MCP servers and other agentic tools," he notes. It is a significant shift that turned what was once heavy custom scaffolding into more standardized tooling.

The evolution reflects how quickly the ecosystem has matured. Problems that required significant engineering workarounds two years ago are now largely handled by the platform layer.

Automating Data and Process Reporting — with Room to Grow

The efficiency gains from AI automation within Dr. Lakhawat's teams have been concrete and measurable. "There were several spreadsheets and reports that were created for review and alignment across various teams and business processes on a regular cadence," he explains. Many were manually curated, consuming significant time from technical staff who had more important problems to solve.

"We were able to automate about 40% of those reports and sheets using AI-based automation," Dr. Lakhawat states. The measurement was straightforward — count the reports, track which ones are now handled autonomously, and monitor quality against the manual baseline.

No sophisticated instrumentation was required; the impact was clearly reflected in hours saved and improved consistency.

The figure is a floor, not a ceiling. As the knowledge bases mature and more processes become documented, the automation coverage continues to expand.
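The measurement described in this section is simple counting, which the sketch below spells out. The report names and flags are placeholders chosen for illustration, not the team's actual inventory, and the `Report` fields are assumptions about what such a tracker would record.

```python
from dataclasses import dataclass


@dataclass
class Report:
    name: str
    automated: bool
    matches_manual_baseline: bool = True  # only meaningful when automated


def coverage(reports):
    """Fraction of reports now produced without manual curation."""
    return sum(r.automated for r in reports) / len(reports) if reports else 0.0


def quality(reports):
    """Fraction of automated reports that agree with the manual baseline."""
    auto = [r for r in reports if r.automated]
    return sum(r.matches_manual_baseline for r in auto) / len(auto) if auto else 0.0


# Placeholder inventory for illustration only.
inventory = [
    Report("weekly status sheet", automated=True),
    Report("monthly alignment report", automated=True, matches_manual_baseline=False),
    Report("quarterly review deck", automated=False),
    Report("vendor scorecard", automated=False),
    Report("capacity plan", automated=False),
]

print(f"coverage: {coverage(inventory):.0%}, quality: {quality(inventory):.0%}")
```

Tracking quality separately from coverage keeps a report from counting as a win when it is automated but no longer agrees with what the manual process produced.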

Change Management: AI Twin First, Trust Second

Deploying AI into existing engineering workflows is not purely a technical challenge. "This is the part that requires strong stakeholder management skills and good communication," Dr. Lakhawat says. His standard approach is to build an AI twin of the existing process first, paired with a robust monitoring pipeline. That parallel-run phase allows teams to compare outputs side by side before any cutover.

The critical enabler is auditability. "Empowering the teams to audit any step of the process on demand was the key," he explains. Engineers who can inspect what the AI did and why are far more willing to trust it — and far faster to catch problems early.

His most effective change management strategies include "modularization, monitoring and alerting pipelines, along with transparent auditing skills."
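The parallel-run phase and on-demand auditing can be sketched as follows. This is an illustration under assumptions rather than the monitoring pipeline described in the interview: `legacy_process` and `ai_twin` are hypothetical callables standing in for the existing process and its AI counterpart, and exact equality is used as the comparison for simplicity.

```python
from dataclasses import dataclass


@dataclass
class AuditRecord:
    case_id: str
    legacy_output: object
    twin_output: object
    match: bool


def parallel_run(cases, legacy_process, ai_twin):
    """Run both processes on the same inputs and keep every result for audit."""
    records = []
    for case_id, payload in cases.items():
        legacy_out = legacy_process(payload)
        twin_out = ai_twin(payload)
        records.append(AuditRecord(case_id, legacy_out, twin_out, legacy_out == twin_out))
    return records


def agreement_rate(records):
    """Share of cases where the twin reproduced the legacy output."""
    return sum(r.match for r in records) / len(records) if records else 0.0


def mismatches(records):
    """Cases an engineer should inspect before any cutover."""
    return [r for r in records if not r.match]


# Toy stand-ins: the twin disagrees with the legacy process on negative inputs.
cases = {"a": 2, "b": -3, "c": 5}
records = parallel_run(cases, legacy_process=abs, ai_twin=lambda x: x)
```

Keeping the full record for every case, not just an aggregate score, is what lets a team audit any single step on demand.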

Trust is not built overnight, and Dr. Lakhawat is clear about that. What matters is teams being willing to adopt AI thoughtfully and onboard it in a structured way — not as a black box dropped into their workflow, but as a system they understand and can interrogate.

Supply Chain Risk at Western Digital: Quantifying Uncertainty

"A business-critical problem that I was working on at Western Digital involved quantifying risk in the supply chain for core components like capacitors, transistors, and IC chips," Dr. Lakhawat explains. The global market for these components is highly volatile — purchase lead times can fluctuate dramatically, and unanticipated delays carry massive economic consequences, including missed orders, expedited shipping fees, and downstream production disruptions.

Traditional AI hit a wall here. "This problem was extremely difficult to solve using traditional AI because the data was messy and did not have a fixed format," he notes. He had identified key signals that were predictive of procurement risk alongside proprietary internal data, but classical models could not handle the unstructured, variable-format inputs.

"But with an LLM-powered GenAI framework, I was able to get a reliable estimate of that risk based on past market history."

The specifics of the design remain a trade secret of the company. On outcomes, though, Dr. Lakhawat is concise: the business impact was significant, saving several million USD in expedited shipping fees and enabling delivery of promised orders.

The Next Frontier: Multimodal Models and Human Creativity as the Limit

"The next frontier for Generative AI is multimodal models," Dr. Lakhawat says. LLMs have already extracted enormous value from the text data available on the internet and within enterprises — but in-depth learning from audio, video, and screen recordings of actual work sessions represents the next breakthrough. Real engineering work is visual and contextual, and text alone misses much of it.

His long-term vision is ambitious but grounded. "I envision that state-of-the-art models of the future will be able to reason through and design real-world systems from scratch without explicit human guidance in a scalable and reliable way."

When that happens, the constraint on technological progress shifts entirely. "Human creativity will be the only bar for technological progress," he says — a perspective that is less about AI replacing humans and more about AI finally catching up to what humans have always imagined.

Disclaimer: The opinions expressed by Dr. Piyush Lakhawat are his own and do not reflect the views of his employer in any way.
