AI Predicts Startup Success Better Than Expert Panels: Adding Humans Makes It Worse

Gemini 2.5 Pro outranked human experts; hybrid human-AI teams scored below the AI working alone.

In this photo illustration, a Perplexity finance website is seen on an iPhone on March 10, 2026 in Miami, Florida. Joe Raedle/Getty Images

Artificial intelligence can outperform trained human panels at predicting which technology startups will succeed — and combining human judgment with a high-performing AI model makes the forecast less accurate, not more. That is the central finding of a working paper by researchers at the University of Michigan, New York University, and Indiana University, whose results were announced publicly on May 27, 2026.

The study challenges one of the most widely held assumptions in enterprise AI design: that a human in the loop improves outcomes. For the specific task of early-stage venture evaluation, the evidence points in the opposite direction.

Gemini 2.5 Pro Correctly Ranked Nearly Four in Five Startup Pairs

The researchers constructed a fully prospective forecasting tournament — meaning all predictions were made before outcomes were known — using 30 live technology ventures launched on Kickstarter after the training cutoffs of every model involved. That design element is significant: it eliminates the possibility that models were retrieving memorized results rather than genuinely forecasting.
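The eligibility rule behind a prospective design can be sketched in a few lines of Python. This is only an illustration: the model names, cutoff dates, and launch dates below are hypothetical placeholders, not figures from the study. The one idea taken from the article is that a venture qualifies only if it launched after the training cutoff of every model in the pool.

```python
from datetime import date

# Hypothetical training cutoffs; not the real cutoffs of any model.
MODEL_CUTOFFS = {
    "model_a": date(2025, 1, 31),
    "model_b": date(2025, 3, 31),
}

# Hypothetical venture launch dates.
VENTURES = {
    "venture_1": date(2025, 2, 15),
    "venture_2": date(2025, 4, 10),
    "venture_3": date(2025, 6, 1),
}


def eligible_ventures(ventures, cutoffs):
    """Keep only ventures launched after the latest cutoff in the pool."""
    latest_cutoff = max(cutoffs.values())
    return sorted(
        name for name, launched in ventures.items() if launched > latest_cutoff
    )


print(eligible_ventures(VENTURES, MODEL_CUTOFFS))
```

Here `venture_1` is excluded because it launched before `model_b`'s assumed cutoff, so at least one model could in principle have seen its outcome during training.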

A suite of frontier large language models (LLMs) completed 870 pairwise comparisons: given two competing ventures, which would raise more funding? Their predictions were then benchmarked against forecasts from 346 experienced managers recruited through the research platform Prolific, plus three MBA-trained investors working under monitored conditions.
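As a quick arithmetic check, 30 ventures yield 435 unordered pairs and 870 ordered pairs. The 870 figure is therefore consistent with each pair being evaluated in both presentation orders, but the article does not say how the comparisons were constructed, so that reading is an assumption. A minimal Python sketch:

```python
from itertools import combinations, permutations

N_VENTURES = 30  # number of ventures reported in the article

# Unordered pairs: {A, B} counted once.
unordered_pairs = sum(1 for _ in combinations(range(N_VENTURES), 2))

# Ordered pairs: (A, B) and (B, A) counted separately.
ordered_pairs = sum(1 for _ in permutations(range(N_VENTURES), 2))

print(unordered_pairs, ordered_pairs)  # 435 870
```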

The performance gap was substantial. Human evaluators produced rank correlations with actual outcomes ranging from 0.04 to 0.45, with the best human forecasters correctly identifying the winner in roughly three out of five comparisons. The top-performing model, Google's Gemini 2.5 Pro, achieved a rank correlation of 0.74, correctly ordering nearly four out of every five venture pairs. Several other frontier models also posted rank correlations above 0.60, a gap between AI and human performance that the researchers describe as striking.
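Rank correlation and pairwise accuracy are related but distinct measures. The sketch below computes both on made-up data. It assumes Spearman's coefficient, since the article does not say which rank correlation the paper used, and every number in it is hypothetical. It is not an attempt to reproduce the study's figures.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(predicted, actual):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(predicted)
    d2 = sum((p - a) ** 2 for p, a in zip(ranks(predicted), ranks(actual)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def pairwise_accuracy(predicted, actual):
    """Share of venture pairs whose predicted order matches the actual order."""
    pairs = list(combinations(range(len(actual)), 2))
    correct = sum(
        1
        for i, j in pairs
        if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) > 0
    )
    return correct / len(pairs)


# Hypothetical data: funds raised per venture and a forecaster's scores.
actual = [120, 45, 300, 10, 80]
predicted = [0.6, 0.5, 0.9, 0.2, 0.3]

print(round(spearman(predicted, actual), 3))
print(round(pairwise_accuracy(predicted, actual), 3))
```

On this toy data the forecaster misorders a single pair out of ten, so both measures come out at 0.9. In general the two do not coincide, which is why a correlation figure and a "pairs ordered correctly" figure need to be read separately.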

What Is the Augmentation Trap in AI Forecasting?

The study's second major finding may be the more disruptive one for practitioners. When the researchers combined human and AI predictions — effectively creating the "human-in-the-loop" hybrid teams that have become the default architecture across industries deploying AI for high-stakes decisions — overall accuracy dropped compared to the AI operating independently.

Lead author Felipe Csaszar, the Alexander M. Nick Professor and chair of the Strategy Area at Michigan's Ross School of Business, called the effect the "Augmentation Trap." The logic of ensemble forecasting — the "wisdom of crowds" concept that pooling diverse predictors produces better outcomes than any single predictor — failed when one of the predictors was a high-performing AI.

"In this case, the wisdom-of-the-crowd logic doesn't produce an improvement in accuracy," Csaszar said. "If you include a human in the mix, performance decreases."

The reason, according to the working paper, is structural: humans introduce idiosyncratic noise and inconsistency that degrades the signal the AI has already extracted. The AI's advantage comes from exactly the properties humans lack — vast computational capacity, access to cross-domain training data spanning millions of cases, and internal consistency across repeated evaluations.
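The noise argument can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the paper's method: the noise levels are invented, the forecasts are combined by a simple equal-weight average (the article does not say how the study combined them), and accuracy is measured with a plain Pearson correlation against a simulated "true" outcome.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=2000, ai_noise=0.5, human_noise=3.0, seed=0):
    """Compare a low-noise forecaster, a high-noise one, and their average.

    All parameters are hypothetical; they only encode the assumption that
    the human forecast is much noisier than the AI forecast.
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0, 1) for _ in range(n)]
    ai = [t + rng.gauss(0, ai_noise) for t in truth]
    human = [t + rng.gauss(0, human_noise) for t in truth]
    hybrid = [(a + h) / 2 for a, h in zip(ai, human)]  # equal-weight average
    return pearson(ai, truth), pearson(human, truth), pearson(hybrid, truth)


ai_r, human_r, hybrid_r = simulate()
print(f"AI alone: {ai_r:.2f}  human alone: {human_r:.2f}  hybrid: {hybrid_r:.2f}")
```

Under these assumptions the hybrid lands between the two: better than the human alone, worse than the AI alone. If the two noise levels were similar, averaging would help, which is the usual wisdom-of-crowds result. The drop appears only when one forecaster is much noisier than the other and still receives equal weight.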

"Strategy felt so different from algorithmic trading," Csaszar said. "It was, in a sense, obvious that algorithmic trading was doable, because it was all about numbers. But strategy is all about words." The finding suggests that LLMs' facility with language-based reasoning is now sufficient to outperform human experts on judgment tasks that the field had long considered irreducibly human.

How AI Venture Capital Forecasting Closes the Cognition Gap

The AI's edge stems from what Csaszar describes as "unbounding rationality" — the relaxation of cognitive limits that constrain human experts. Three mechanisms drive the superiority, per the paper: the computational capacity to integrate thousands of weak, interacting signals simultaneously; information scale from training on massive, cross-domain corpora; and internal consistency, which eliminates the noise inherent in human judgment.

Human experts are constrained by time, memory, and inconsistency: they may evaluate two similar ventures differently on different days, weight factors differently depending on fatigue, or anchor on irrelevant surface features of a pitch. The paper's argument is that a frontier model is far less exposed to those limitations.

The implications extend beyond early-stage investing. Csaszar drew a structural analogy: just as the Industrial Revolution lowered the cost of physical labor and the internet lowered the cost of information distribution, AI could reduce the cost of high-level strategic cognition itself. "Cognition is everywhere, so this will have effects everywhere," he said.

For venture capital specifically, the competitive moats that top-tier firms have built on proprietary deal flow, partner networks, and decades of pattern recognition now face a direct empirical challenge. If a frontier LLM outperforms trained investor panels on pairwise comparisons of early-stage ventures, the question of what those panels are uniquely positioned to do becomes harder to answer.

Does AI Replace Venture Capital Analysts?

The study's authors are careful to bound their claims. The tournament measured a specific, well-defined task (pairwise ranking of fundraising potential for early-stage Kickstarter ventures) and benchmarked the models against MBA-trained managers and investors rather than the most seasoned general partners at top-tier funds. Whether the findings hold for later-stage investments, for ventures with more complex competitive dynamics, or for the qualitative judgment calls that define boardroom decisions after a check is written remains untested.

The models were also evaluated on standardized, preprocessed summaries of each venture rather than the full, messy information environment managers navigate in practice. And Kickstarter fundraising success, while a meaningful signal of early-stage market validation, does not fully capture long-run venture performance.

Separately, research published in February 2026 by a German team in Strategy Science found that LLM use in strategic decision-making tasks under time pressure increases information overload and reduces the psychological ownership that typically drives follow-through on decisions. That finding addresses LLMs as tools used by humans, not as independent forecasters, but it suggests the performance advantage documented in the Michigan tournament may not translate to all deployment contexts.

The San Francisco Federal Reserve, in a February 2026 survey of venture capital practitioners, found that investors broadly believe relationship-building and qualitative assessment of founders remain activities in which human judgment cannot be replaced. The tournament's design (standardized summaries, pairwise comparisons, no ongoing founder relationship) operates outside that zone.

What the study establishes is a first, rigorously controlled benchmark: in early-stage screening under genuine uncertainty, frontier LLMs outperform trained human panels on the prediction task itself. As AI models continue to improve and as more structured settings for testing strategic forecasting capability emerge, the boundaries of that advantage will become clearer.


Frequently Asked Questions

Can AI predict startup success better than venture capitalists?

In a controlled tournament using live Kickstarter ventures, frontier large language models significantly outperformed MBA-trained managers and investors at predicting which startups would raise more funding. The best-performing model, Gemini 2.5 Pro, correctly ranked nearly four in five venture pairs, compared to roughly three in five for the strongest human forecasters. The study applies most directly to early-stage screening tasks; post-investment operations and founder relationships are separate capabilities not addressed by the research.

What is the augmentation trap in AI forecasting?

The augmentation trap is a phenomenon documented in the University of Michigan study in which combining human and AI predictions reduced overall forecasting accuracy compared to the AI operating alone. The effect occurs because human judgment introduces idiosyncratic noise and inconsistency that degrades the AI's signal. The finding challenges the assumption that human-in-the-loop AI systems always produce better outcomes than standalone AI.

How accurate is Gemini 2.5 Pro at predicting venture outcomes?

In the Michigan tournament, Gemini 2.5 Pro achieved a rank correlation of 0.74 with actual fundraising outcomes — correctly ordering nearly four of every five venture pairs. That compares to a top human correlation of 0.45. Several other frontier models also cleared a 0.60 correlation threshold, suggesting the performance gap is not unique to a single model.

Does AI replace venture capital analysts?

The study addresses early-stage screening, meaning the ranking of opportunities based on standardized information under genuine uncertainty, and finds AI superior at that specific task. It does not address relationship-building, post-investment operational support, boardroom negotiation, or the qualitative judgment calls that follow once a check is written. A 2026 San Francisco Federal Reserve survey of VC practitioners found that investors continue to view human judgment as irreplaceable in relationship-intensive aspects of the job.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
