
AI investment across enterprises is accelerating, but the pressure to demonstrate real business returns is intensifying just as quickly. For many leadership teams, the challenge is no longer whether to adopt automation, but how to prevent large-scale initiatives from stalling before measurable value is realized. Recent peer-reviewed research from CodeScene suggests that the underlying structure and health of enterprise codebases play a significant role in determining whether AI tools deliver reliable outcomes or introduce new operational risk.
"One of the biggest challenges organizations face is not just adopting new technology, but defining what good performance actually looks like," says Adam Tornhill, founder of CodeScene. "More than 90% of AI pilot programs are failing. For executive teams focused on return on investment, this pattern highlights the importance of operational foundations rather than surface-level automation."
Tornhill notes that the gap is becoming more visible as AI is deployed deeper into enterprise software environments. "While these tools can generate code at unprecedented speed, speed alone does not guarantee quality, reliability, or long-term maintainability," he says. "In many organizations, the areas where automation is needed most (large, complex legacy systems) are also the ones carrying the highest structural risk."
According to CodeScene, AI coding assistants increase defect risk by at least 30% when applied to unhealthy code, with the real-world impact expected to be significantly higher in large legacy environments. From Tornhill's perspective, this pattern reflects a fundamental mismatch between how AI is currently deployed and the realities of enterprise software complexity. "This highlights a growing industry problem," Tornhill explains. "AI accelerates code output, but it lacks the context to judge whether code is good, risky, or maintainable. As a result, AI performs worst precisely where enterprises need it most: in complex legacy codebases."
According to Hamdija Jusufagic, CEO and co-founder of CodeScene, the next step is moving beyond financial dashboards alone and toward deeper technical visibility. He explains that organizations often lack a clear picture of what is happening inside their codebases, even though software performance underpins digital revenue, customer experience, and internal efficiency. "When teams can visualize how software evolves, it becomes much easier to identify which areas are stable and which are creating friction," Jusufagic says. "That level of visibility allows organizations to prioritize effort more strategically, rather than treating every part of the codebase as if it carries the same level of risk or importance."
Jusufagic adds that this pattern mirrors challenges human developers have long faced. "Machines get confused by the same patterns as humans," he says. "The evidence is clear: unhealthy code undermines AI-assisted development."
It is at this point that CodeScene's peer-reviewed research enters the conversation. The company's study examined how software structure directly influences AI performance across real-world enterprise codebases. The findings indicate that as code health declines, AI error rates rise sharply, reinforcing that structural quality is not only a human productivity factor but a machine performance constraint as well.
CodeScene is a software intelligence platform designed to help organizations measure and monitor the structural health of large codebases. According to Jusufagic, the platform analyzes software systems to surface risk patterns, identify frequently modified hotspot areas, and quantify maintainability and complexity trends over time. This gives engineering and leadership teams greater visibility into where risk accumulates, where development effort is concentrated, and where AI can be applied more safely.
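To make the hotspot idea concrete, the sketch below ranks files by how often they change in git history, weighted by line count as a rough complexity proxy. This is a minimal illustration of the general hotspot technique, written in Python for this article; the function names are ours, and it is an assumption-laden approximation, not CodeScene's actual implementation.

```python
# Illustrative sketch of hotspot analysis: rank files by how often they
# change (from git history), weighted by a simple size proxy for complexity.
# A generic approximation of the technique, not CodeScene's implementation.
import subprocess
from collections import Counter
from pathlib import Path

def change_frequencies(repo: str) -> Counter:
    """Count how many commits touched each file in the repo's history."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in log.splitlines() if line.strip())

def hotspots(repo: str, top: int = 10):
    """Score each still-existing file as change frequency x lines of code."""
    scores = {}
    for path, freq in change_frequencies(repo).items():
        file = Path(repo) / path
        if file.is_file():
            loc = sum(1 for _ in file.open(errors="ignore"))
            scores[path] = freq * loc  # crude stand-in for complexity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

if __name__ == "__main__":
    for path, score in hotspots("."):
        print(f"{score:>8}  {path}")
```

Even this crude version tends to surface the same intuition the research points to: a small number of frequently changed, oversized files is where both human and AI-introduced defects concentrate.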

From Tornhill's perspective, this measurement layer is central to reshaping AI adoption strategies. "The CodeHealth™ metric plays a key role in successful AI adoption," he says. "It provides an objective standard that turns AI from a fast code generator into a quality-aware coding partner."
"The conversation is no longer just about whether to adopt AI, but about how organizations prepare themselves to use it sustainably," says Tornhill. "Code health metrics act as an operational compass. When quality is measured consistently, leaders can make clearer decisions about where AI can be deployed safely and where modernization needs to come first, allowing them to manage risk proactively instead of reacting after problems appear."
The research also highlights the continued importance of human oversight. Jusufagic emphasizes that AI works best when paired with human judgment and clear quality standards. Rather than replacing developers, automation becomes more effective when teams guide it with shared definitions of maintainability and performance. In his view, this partnership model preserves accountability while allowing organizations to benefit from speed and scale.
"For non-technical leaders, this is not just a technology decision, it's a governance and measurement challenge," Jusufagic says. "Without clear benchmarks for software quality, AI investments can deliver disastrous results. With the right standards in place, organizations are better positioned to align automation with long-term business goals and operational resilience."
Jusufagic believes this moment represents a turning point. As AI becomes more embedded in daily workflows, leadership teams have an opportunity to rethink how they evaluate software performance and risk. The emphasis is shifting from isolated pilots to system-wide readiness. In that context, code health is emerging as a strategic asset rather than a technical detail.
Ultimately, the path forward connects back to the original ROI question. Organizations that pair AI adoption with strong measurement practices and structural insight can be better positioned to translate experimentation into sustainable value. "AI will continue to evolve, but the organizations that see lasting value will be the ones that invest in foundations, not shortcuts," Tornhill says. "When leaders treat code quality as a strategic priority and measure it consistently, they can create the conditions for automation to scale responsibly, deliver real returns, and support sustainable growth."