Chinese AI Models Lead OpenRouter Traffic: Coding Gains Come With China Data Risk

Kimi K2.6 and MiMo-V2.5-Pro top open-weight AI model benchmarks but trail GPT-5.5 and Claude Opus 4.7 on reasoning.

Three days after OpenRouter raised a $113 million Series B and disclosed that weekly traffic on its platform had reached 25 trillion tokens — five times higher than six months earlier — the most important question for developers choosing between its 400-plus listed models is not which model runs fastest. It is which model brings a legal obligation to share data with a foreign government. For developers routing agentic coding workloads through Chinese-developed open-weight models, that question now has a concrete answer, and it has nothing to do with price.

Chinese-developed open-weight large language models held roughly 61 percent of token volume among OpenRouter's top 10 models in the week of February 24, 2026, and maintained a majority share through April, when combined Chinese-provider traffic accounted for approximately 51 percent of all platform tokens. The shift happened inside eighteen months: Chinese models went from under 2 percent of OpenRouter traffic in late 2024 to consistent majority leadership in 2026.

OpenRouter's May 26, 2026 announcement of the round, which was led by Alphabet's independent growth fund CapitalG with participation from NVIDIA's NVentures and enterprise investors including ServiceNow, MongoDB, Snowflake, and Databricks, described the platform as serving over 8 million global users. CEO Alex Atallah called multi-model routing a permanent infrastructure requirement: "The era of picking a single model is over." Whether that era is over on favorable terms for any given developer depends substantially on the legal jurisdiction of the model provider they route to.

What OpenRouter Token Usage Data Actually Shows

The headline usage figures require precise reading before they drive any deployment decision. The 61 percent figure traces to a single week in February 2026 and measures share among the top 10 most-used models — not across all 400-plus models on the platform. A longer-horizon study of 100 trillion tokens published by OpenRouter and Andreessen Horowitz in late 2025 put Chinese open-weight model share at roughly 30 percent of total weekly volume by mid-2025, with proprietary Western models still holding approximately 70 percent of total global API share. The February peak reflects a concentration of Chinese models in the highest-volume coding and agentic workload categories, not a sudden majority of all global API calls.
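The difference between the two denominators can be made concrete with a short calculation. The sketch below uses invented weekly token counts (every number and the "group A" label are placeholders, not OpenRouter data) to show how one group's share of top-10 volume can sit well above its share of platform-wide volume.

```python
def share(group_tokens: float, total_tokens: float) -> float:
    """Return group_tokens as a percentage of total_tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * group_tokens / total_tokens

# Hypothetical weekly volumes in trillions of tokens (illustrative only).
top10_group_a = 6.1      # group A models that rank inside the top 10
top10_total = 10.0       # all top-10 models combined
platform_group_a = 7.0   # group A across every listed model
platform_total = 20.0    # every model on the platform

top10_share = share(top10_group_a, top10_total)
platform_share = share(platform_group_a, platform_total)

print(f"share of top-10 volume:   {top10_share:.1f}%")
print(f"share of platform volume: {platform_share:.1f}%")
```

With these made-up inputs the same group holds 61.0 percent of top-10 volume but 35.0 percent of platform volume, which is why the denominator has to be stated alongside any share figure.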

OpenRouter COO Chris Clark confirmed the structural reason: Chinese open-weight models have captured developer share because they are, in his description, "disproportionately heavy in agentic flows run by U.S. developers." Programming workloads grew from roughly 11 percent of total OpenRouter token volume to more than 50 percent through 2025, and agentic workflows now account for more than half of all output tokens on the platform. The unit economics of agent tasks are structurally different from those of chat: a single overnight coding run can invoke a model thousands of times, making per-million-token pricing the dominant operating cost.

A second qualification: analysis of April 2026 OpenRouter rankings found that usage and capability scores had decoupled. Developers were optimizing for blended cost per token and specific capabilities, particularly for long-context coding, rather than raw benchmark leadership.

DeepSeek, Kimi K2.6, MiniMax: What Benchmarks Confirm and What They Do Not

On the capability side, the gap between Chinese open-weight models and Western proprietary models narrowed substantially in 2026's second quarter. Moonshot AI's Kimi K2.6, released April 20, 2026, and Xiaomi's MiMo-V2.5-Pro both scored 54 on Artificial Analysis's Intelligence Index — the highest scores of any open-weight models — placing them 3 to 6 points below the leading proprietary models: GPT-5.5 at 60, and Claude Opus 4.7 and Gemini 3.1 Pro Preview both at 57. For context, the highest-scoring open-weight model a year earlier was DeepSeek V3, at 22 on the same index. DeepSeek V4 Pro, released April 24, 2026, scored 52, placing it second among open-weight reasoning models behind Kimi K2.6.

On SWE-Bench Pro, the harder, less-saturated successor to SWE-Bench Verified that measures real GitHub issue resolution, Kimi K2.6 scored 58.6 percent — ahead of GPT-5.5 at 57.7 percent — becoming the first open-weight model to surpass a leading proprietary model on that specific benchmark.

Those figures come from Moonshot AI's own benchmarking. As of early May 2026, independent third-party verification of Kimi K2.6's benchmark claims had not been published. The Stanford HAI 2026 AI Index found that the US-China AI model performance gap stood at 2.7 percent as of March 2026, while noting that invalid question rates on major benchmarks range from 2 percent to 42 percent, complicating direct comparisons. Kili Technology, which provides expert evaluation services for production AI systems, found that enterprise agentic AI systems show a 37 percent gap between lab benchmark scores and real-world deployment performance.

Independent developer testing published in May 2026 found meaningful variation within the Chinese open-weight group. Kimi K2.6 and DeepSeek V4 Pro both reached the highest usability tier on a real-world Ruby on Rails coding benchmark — DeepSeek V4 Pro requiring a custom Claude Code adapter called DeepClaude. MiniMax M2.7, by contrast, generated API call signatures that failed on first execution. Five other Chinese models tested required one to two hours of additional patching to reach production usability.

Where Chinese open-weight models still trail Western proprietary models is specific and measurable. On the Artificial Analysis hallucination benchmark AA-Omniscience, Kimi K2.6 posted a 39 percent rate, close to Claude Opus 4.7's 36 percent but still higher. DeepSeek V4 Pro's hallucination rate was 94 percent, meaning that when it does not know the answer, it almost always responds anyway. On multimodal tasks, Kimi K2.6 ranked 26th out of 115 models. On hard reasoning benchmarks (GPQA Diamond, Humanity's Last Exam, frontier mathematics), closed-source models retained a 3- to 8-point lead. Context windows also differ within the Chinese group: Kimi K2.6 supports 262,000 tokens against DeepSeek V4's 1 million, which gives DeepSeek a structural advantage for large-codebase workloads.

How Does Open-Weight AI Pricing Compare to Claude and GPT-5?

Three structural forces explain the direction of the market shift. First, price: MiniMax M2.5 charged roughly $0.30 per million input tokens and $1.20 per million output tokens; Claude Opus 4.6 charged approximately $5 per million input and $25 per million output. For agentic workloads that invoke a model thousands of times per session, a cost gap of that magnitude is not a footnote; it is a budget line that compounds with scale. MiniMax M2.5 scored 80.2 percent on SWE-Bench Verified, against approximately 80.8 percent for the comparable Claude Opus model, a gap narrow enough that the higher price was difficult for cost-sensitive developers to justify.

Kimi K2.6, the highest-scoring open-weight model, priced its API at approximately $0.73 per million input tokens and $3.49 per million output on OpenRouter — still cheaper than Western frontier models but at a significantly higher price point than the MiniMax tier. Not every Chinese model pairs its highest capability with its lowest price.
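To see how those per-million-token prices compound over an agent run, a rough calculation helps. The sketch below uses the approximate prices quoted in the two paragraphs above; the workload (2,000 calls, each with 20,000 input and 1,000 output tokens) is a hypothetical overnight coding run chosen for illustration, and the function ignores caching discounts, retries, and any routing fees.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

# Approximate list prices as quoted in this article.
PRICES = {
    "MiniMax M2.5": Price(0.30, 1.20),
    "Kimi K2.6": Price(0.73, 3.49),
    "Claude Opus 4.6": Price(5.00, 25.00),
}

def run_cost(price: Price, calls: int, in_tokens: int, out_tokens: int) -> float:
    """USD cost of `calls` invocations with the given per-call token counts."""
    total_in = calls * in_tokens
    total_out = calls * out_tokens
    return (total_in * price.input_per_m + total_out * price.output_per_m) / 1_000_000

# Hypothetical workload: these three numbers are assumptions, not measurements.
CALLS, IN_TOKENS, OUT_TOKENS = 2_000, 20_000, 1_000

costs = {name: run_cost(p, CALLS, IN_TOKENS, OUT_TOKENS) for name, p in PRICES.items()}
for name, cost in costs.items():
    print(f"{name:>16}: ${cost:,.2f}")
```

Under these assumed token counts the run costs about $14.40 on MiniMax M2.5, $36.18 on Kimi K2.6, and $250.00 on Claude Opus 4.6. The ratios shift with the input-to-output mix, so the workload numbers should be replaced with measured values before the result informs a budget.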

Second, architecture alignment: MiniMax M2.5 was positioned at launch as designed for agent scenarios, and Kimi K2.6's architecture can coordinate up to 300 parallel sub-agents for autonomous task decomposition across sessions lasting more than twelve hours. The design choices matched the direction of demand precisely as the demand shift arrived.

Third, business model: DeepSeek is backed by the High-Flyer hedge fund and has stated it does not need API revenue to sustain its operations. Alibaba's Qwen and Zhipu's GLM operate inside cloud businesses that can absorb loss-leader API pricing for ecosystem reasons. Some Chinese providers can sustain lower prices because their models serve strategic goals beyond direct monetization — but the long-term unit economics of those structures are not verifiable from public data.

What Still Holds for Enterprise and Regulated Workloads

Large enterprise customers have largely not replicated the OpenRouter developer pattern. Most enterprises contract directly with Anthropic, OpenAI, or through Azure and Google Cloud hosting, where Chinese model penetration remains far lower than OpenRouter's share data suggests. The broader 100-trillion-token study placed proprietary Western models at roughly 70 percent of total global API share. The reversal is sharpest in developer experimentation, less so in enterprise core workloads.

The workloads where Western proprietary models retain their strongest position are those where error cost is high: regulated financial services, healthcare, government procurement, and customer-facing applications where a wrong answer carries direct economic or reputational consequence. Open-weight models can be re-tuned and un-aligned by any downstream operator, removing the safety guarantees that managed services bundle into their pricing.

Chinese AI Security Risk: What China's National Intelligence Law Requires

The most important legal fact about every Chinese AI model covered in this article is not a product feature and is not captured in any benchmark. China's National Intelligence Law, enacted in 2017, requires all Chinese companies — without exception — to "support, assist, and cooperate" with the Chinese government's national security investigations and intelligence collection activities. This obligation applies to Moonshot AI (Kimi), MiniMax, DeepSeek, Zhipu AI (GLM), Alibaba (Qwen), and Xiaomi (MiMo) regardless of where their model weights are hosted, regardless of whether the company has incorporated a Western subsidiary, and regardless of the company's stated privacy policy.

This is a fixed structural condition, not a speculative risk to be balanced against price advantages. The law does not provide an opt-out and does not require a demonstrated government request before it creates exposure — the obligation exists continuously and applies to any data the company has access to.

The House Committee on Homeland Security and the House Select Committee on China announced a joint investigation on April 29, 2026 into national security and cybersecurity risks posed by Chinese AI models. The investigation named Moonshot AI, MiniMax, Alibaba, and DeepSeek specifically, and separately targeted Anysphere — the maker of the Cursor AI coding tool — and Airbnb for building on Chinese AI infrastructure. Congressional investigators noted that Cursor's Composer 2 backend reportedly used a model built on Moonshot AI's open-weight architecture.

Zhipu AI, the developer of the GLM series, has been on the U.S. Commerce Department Entity List since January 2025, added for its role in advancing Chinese military modernization through AI development. Anthropic publicly alleged in February 2026 that DeepSeek, Moonshot AI, and MiniMax had conducted industrial-scale distillation campaigns against its Claude models — creating over 24,000 fraudulent accounts and generating more than 16 million exchanges in violation of Anthropic's terms of service and regional access restrictions. OpenAI made similar allegations to Congress. None of the three companies had issued a formal public denial of the distillation allegations at the time of publication.

None of those investigations or allegations constitutes confirmed evidence of a government backdoor in any specific API product. The structural risk is legal and is not currently evidenced by a specific confirmed incident, but the law that creates it is not in dispute, and companies subject to it have no legal pathway to refuse compliance when a government request arrives.

Decision Framework for Developers Evaluating Chinese Open-Weight Models

For developers building agentic coding workflows on a per-token budget, the choice set has reorganized within eighteen months around five Chinese labs: Moonshot AI, MiniMax, DeepSeek, Zhipu AI, and Xiaomi. For those workflows, the open-weight option is now price-competitive and, on specific coding benchmarks, capability-equivalent.

Four considerations should govern any production deployment decision, independent of benchmark scores.

First, what data flows through the model matters more than what it outputs. Any prompt routed through a Chinese-provider API endpoint — including those accessible through OpenRouter — is processed under the jurisdiction of China's National Intelligence Law. If the prompt contains proprietary code, customer data, internal documents, or any information that would be sensitive under a data-sharing scenario with a foreign government intelligence agency, the legal exposure is not speculative.

Second, benchmark scores from a model's own technical report are not independent verification. As of early May 2026, no named independent third party had published a full audit confirming Kimi K2.6's headline SWE-Bench Pro result. The Stanford HAI AI Index and Kili Technology's production-deployment research both give reasons to discount headline figures: invalid questions on major benchmarks in the first case, and a gap between lab scores and real-world deployment performance in the second.

Third, ecosystem and operational maturity vary significantly across the Chinese open-weight group. Kimi K2.6's context window of 262,000 tokens is substantially smaller than DeepSeek V4's 1 million token context — a limitation that matters directly for large-codebase work. MiniMax M2.7 generated non-functional API calls in at least one independent production test. Documentation, support structures, and tooling for international users remain less mature than those provided by Anthropic, OpenAI, and Google.

Fourth, the price advantage is not uniform across the Chinese lineup. MiniMax M2.5 and DeepSeek V4 Flash price inputs at or below roughly $0.30 per million tokens. Kimi K2.6, the highest-scoring model, prices inputs at approximately $0.73 per million on OpenRouter. The cost argument is strongest for the lower capability tier; the cost-and-capability argument is narrower for the frontier Chinese models than the aggregate framing suggests.

A developer routing general agentic coding workloads through MiniMax or DeepSeek at a fraction of Claude Opus pricing is making an economically defensible choice on cost-per-task alone. The same developer placing customer financial data, proprietary source code, or sensitive organizational information into those prompts is making a legal exposure decision, not just a cost decision, and that exposure is a function of Chinese law, not of how securely the model's API endpoint is configured.
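One way a team might encode that distinction is a routing gate that checks what a prompt contains before it checks price. The sketch below is a minimal illustration, not a compliance control: the data labels, the `provider_jurisdiction` field, and the `eligible_models` helper are all hypothetical names, and it assumes prompts have already been labeled upstream. Prices are the approximate input prices quoted in this article.

```python
from dataclasses import dataclass

# Hypothetical labels for the content types the article flags as sensitive.
SENSITIVE_LABELS = frozenset({"proprietary_code", "customer_data", "internal_docs"})

@dataclass(frozen=True)
class Model:
    name: str
    provider_jurisdiction: str  # assumed metadata field, e.g. "CN" or "US"
    input_price_per_m: float    # USD per million input tokens

def eligible_models(models, labels, blocked_jurisdictions=frozenset({"CN"})):
    """Return the models allowed for a prompt, cheapest first.

    A prompt carrying any sensitive label is kept off providers whose
    jurisdiction is blocked; other prompts may use any model.
    """
    sensitive = bool(SENSITIVE_LABELS & set(labels))
    allowed = [
        m for m in models
        if not (sensitive and m.provider_jurisdiction in blocked_jurisdictions)
    ]
    return sorted(allowed, key=lambda m: m.input_price_per_m)

CATALOG = [
    Model("MiniMax M2.5", "CN", 0.30),
    Model("Kimi K2.6", "CN", 0.73),
    Model("Claude Opus 4.6", "US", 5.00),
]

public_choice = eligible_models(CATALOG, labels=["open_source_code"])[0]
private_choice = eligible_models(CATALOG, labels=["customer_data"])[0]
print(public_choice.name, "|", private_choice.name)
```

The gate filters on jurisdiction first and sorts by price second, so a cheaper model can never win a prompt it is not allowed to see. Its usefulness depends entirely on how reliably prompts are labeled before they reach it.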


Frequently Asked Questions

What are the best Chinese AI models for coding in 2026?

Kimi K2.6, from Moonshot AI, tied with Xiaomi's MiMo-V2.5-Pro for the highest score among Chinese open-weight models on the Artificial Analysis Intelligence Index in May 2026, each scoring 54 against 60 for the leading proprietary model, GPT-5.5. DeepSeek V4 Pro scored 52 on the same index and led Chinese open-weight models on agentic real-world work tasks. Kimi K2.6 and DeepSeek V4 Pro both priced their APIs below the cost of Western frontier models, though Kimi K2.6's context window of 262,000 tokens is smaller than DeepSeek V4's 1 million token context.

Why are Chinese AI models cheaper than Claude and GPT-5?

Three structural factors explain the price gap. DeepSeek is backed by the High-Flyer hedge fund and has stated it does not need API revenue to remain solvent. Alibaba's Qwen and Zhipu's GLM operate inside cloud businesses that can sustain loss-leader pricing for ecosystem reasons. Chinese labs have also developed Mixture-of-Experts architectures that reduce the compute required per inference, lowering the cost basis. The result is prices as low as $0.30 per million input tokens versus roughly $5 per million for Claude Opus, a gap that is most consequential for agentic workflows consuming millions of tokens per session.

Are Chinese AI models safe to use for work?

For non-sensitive general coding tasks, the capability and availability arguments are straightforward. For any workload that involves proprietary source code, customer data, or confidential business information, every Chinese AI model covered here — including Kimi K2.6, MiniMax, DeepSeek, and Zhipu GLM — operates under China's National Intelligence Law, which requires Chinese companies to cooperate with Chinese government intelligence requests. That obligation is not altered by the physical location of servers or by the company's privacy policy. Regulatory bans on government device usage of DeepSeek exist in the United States, Taiwan, South Korea, Italy, and Australia.

What does OpenRouter's $113 million fundraise signal about the AI model market?

OpenRouter's May 26, 2026 Series B, led by Alphabet's CapitalG with participation from NVIDIA and enterprise infrastructure companies including Snowflake, Databricks, and MongoDB, confirmed the platform's position as core routing infrastructure for a multi-model AI market. The announcement disclosed weekly platform volume of 25 trillion tokens — five times higher than six months earlier — reflecting how quickly enterprises are scaling agent-based deployments. The investment group's composition signals that major enterprise infrastructure players consider model-agnostic routing a permanent feature of production AI stacks, not an experimental layer.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
