Abstract
2025 was a genuine "leap year" in the history of AI large language models in China and globally: the training paradigm shifted from parameter races toward inference efficiency, the commercialization path moved from demonstrations to scaled ARR, and the foundational model layer evolved from oligopoly into a triangular structure of "US–China bipolarity + open-source rise." This report, anchored to June 2026, systematically surveys the full value chain of China's AI large models and applications — spanning six major segments: foundational LLMs, MaaS cloud services, vertical industry models, ToB/ToC applications, AI Agents, AI hardware, and overseas expansion — integrating FY2025 full-year financial data, the latest Q1 2026 developments, and secondary-market performance to deliver a panoramic picture for "seeing the structure, identifying the trend, and locating the opportunity."
Core findings are as follows:
- China's AI large-model market reached approximately RMB 49.5–51 billion in 2025 (covering model services, ToB deployments, and ToC paid applications). In the broader definition that includes AI-enabled software, the estimate approaches RMB 100–130 billion; by 2026, driven by Agent explosion and accelerated AI device penetration, the market could surpass RMB 200 billion.
- DeepSeek's emergence rewrote the global cost curve: its MoE architecture compressed training costs for equivalent capability to 1/10 of conventional approaches, forcing competitors to cut prices across the board and accelerating inference at the edge — the defining event in global LLM structure in 2025.
- A first tier of Chinese foundational models has formed: the four internet giants Baidu ERNIE, Alibaba Qwen, ByteDance Doubao, and Tencent Hunyuan, together with four startup forces — DeepSeek, Zhipu AI, Moonshot AI (Kimi), and MiniMax — with the gap between the two tiers narrowing significantly by end-2025.
- The application layer hit a structural inflection point: total Chinese AI application financing in 2025 reached RMB 107 billion, yet ARR remains broadly below RMB 10 billion. MiniMax's overseas revenue share exceeding 70% and Kimi K2.5's monthly revenue surpassing all of FY2025 in under 20 days are two milestone signals.
- Domestic compute substitution entered material ramp-up: in 2025 the combined Chinese market share of domestically produced AI accelerators reached 41%; Cambricon (688256) H1 revenue grew 4,348% year-on-year; and the Huawei Ascend ecosystem has demonstrated partial substitution capability for large-scale clusters.
- Agents and AI devices are the most certain increments of 2026–2028: enterprise Agent penetration is sprinting from 52% in 2025 toward the 80% target for 2026; AI smartphone shipments in China are projected at 147 million units in 2026; global AI glasses shipments may exceed 30 million units.
一 Definitions, Classification, and the Full Industry-Chain Panorama
1.1 From Generative AI to Large Models: A Rigorous Classification Framework
"AI large model" is a generic term in the Chinese context, covering deep learning models with parameter counts typically above 1 billion (1B), built on the Transformer architecture, and acquiring general capabilities through large-scale pretraining. More precisely, this category can be articulated across three dimensions: capability form, deployment mode, and business model.
By capability form, large models divide into language large models (LLMs), multimodal large models (covering text + image, text + video, text + audio, etc.), and diffusion models (dedicated to image/video generation). The dominant development direction in 2025 is native multimodality — training image, audio, and video tokens together with text tokens from the pretraining stage, exemplified by Google Gemini, Meta Llama 4, and Alibaba Qwen3.
By deployment mode, large models split into API calls (MaaS, Model-as-a-Service), private deployment (enterprises buy compute and self-host), and edge inference (quantized lightweight models running on smartphones, PCs, glasses, and other terminal devices). In 2025, the share of inference in total compute consumption rose from 36% in 2023 to approximately 68%, and is expected to reach 73% in 2026, reflecting the industry's shift of gravity from "training demonstrations" to "inference monetization."
By business model, industry participants can be divided into four tiers:
- Foundational Model Layer: Proprietary super-large pretrained models monetized via API or license; key players include OpenAI, Anthropic, Google DeepMind, Baidu, Alibaba, ByteDance, Tencent, and DeepSeek.
- MaaS / Cloud Services Layer: Cloud platforms providing model calls, fine-tuning, and private deployment, including Alibaba Cloud Bailian, Baidu Cloud Qianfan, Huawei Cloud ModelArts, and Tencent Cloud TI; compute-as-a-service (GPU server leasing) is also included in this tier.
- Vertical Industry Model Layer: Vertical fine-tuning on general foundational models for specific industry scenarios; representative companies include SenseTime Rixin (vision/embodied), iFlytek Xinghuo (speech/education/government), Baichuan Intelligence (healthcare), and 4Paradigm (finance/industry).
- Application Layer: ToB/ToC products facing end users, including office productivity (Kingsoft WPS AI, Feishu Copilot), customer service, marketing, coding (Claude Code, GitHub Copilot, Tongyi Lingma), content creation (Doubao, Kimi), and search (Perplexity).
1.2 Industry Chain Panorama: From Silicon Wafers to the User Interface
The AI large-model industry chain can be broken into six main segments, each deeply interlocking with manufacturing supply chains:
Upstream Compute Infrastructure: GPU/AI accelerator chips are the core means of production, with NVIDIA dominating globally and the domestic camp represented by Cambricon (688256), Huawei Ascend (unlisted), Hygon Information (688041), and MThreads MetaX. GPU server manufacturers include Inspur Information (000977), Dawning Information (603019), Baode, and Dingjia. Data centers require liquid-cooled servers, optical modules, and high-speed interconnect switching equipment.
Training / Inference Compute Services: Intelligent computing centers are operated by Alibaba Cloud, Tencent Cloud, Huawei Cloud, Baidu Cloud, and numerous local-government-led compute bases; AI server leasing and elastic inference services are the core product forms.
Foundational Models and Pretraining: The most compute-intensive segment; parameters range from tens of billions to trillions, and training a single large model costs tens of millions to hundreds of millions of dollars. Open-sourcing (DeepSeek, Llama 4, Qwen3) is continuously lowering entry barriers.
MaaS and Development Platforms: The middleware layer connecting models to applications; provides API calls, vector databases, RAG toolchains, agent frameworks, and parameter fine-tuning services — the densest ecosystem layer for enterprise developers.
Vertical Industry Models and AI Applications: Facing end-use scenarios directly; monetization takes the form of SaaS subscriptions, API-call billing, hardware bundles (smart glasses/AI phones), project-based implementation, and outcome sharing.
Edge AI Devices: Includes AI smartphones, AI PCs, AI glasses, AI earphones, and other consumer electronics; requires edge chips (accelerator cards — Apple A18 Pro, Qualcomm Snapdragon, MediaTek Dimensity 9400, etc.) while driving demand for components such as thermal management equipment and storage devices.
1.3 Generational Comparison with Traditional Internet / Mobile Internet
To understand the commercial logic of the AI large-model industry, a horizontal comparison with the past two platform revolutions is necessary. In the Internet era (1995–2008), the core resources were bandwidth and traffic, business models centered on advertising and transaction commissions, and entry barriers were relatively low (websites could be built cheaply). In the mobile Internet era (2008–2020), core resources were mobile operating systems and user attention; the business model was centered on the ecosystem monetization of super-apps (WeChat, Alipay, Meituan), with entry barriers built jointly by distribution channels and network effects.
In the AI large-model era, core resources are compute (chip clusters needed for training and inference), data (high-quality pretraining corpora and instruction fine-tuning data), and model capability (leadership on various evaluation benchmarks). The acquisition cost of all three resources runs to billions of RMB, far exceeding the entry cost of the previous two eras, which naturally pushes the foundational model layer toward high concentration — globally fewer than twenty players can sustain investment in training super-large foundational models. Yet simultaneously, the open-source wave makes copying a model with basic capabilities extremely cheap, forming an interesting paradox: extreme concentration at the top, extreme openness in the long tail.
This structure means that in the mid-stream (vertical industry models, application tools, enterprise private deployment) and downstream (end-user scenarios) of the value chain, the competitive landscape is far more fragmented and dynamic than the foundational model layer, with more window opportunities for medium-sized enterprises to earn outsized returns.
Implications for manufacturing enterprises: Manufacturing enterprises do not need — and cannot — become "AI large-model companies," but they do need to become "deep users of AI tools." Procurement decisions should prioritize: mature vertical AI solutions (already validated by peers, low risk), vendors with complete private deployment options (data security), and projects with clear ROI calculation models (production scheduling optimization, automated quality inspection, sales lead screening). The execution speed of "rapid pilot, rapid validation, rapid scale-up" in AI procurement will become one of the key variables differentiating manufacturing enterprise competitiveness in 2026–2028.
Implications for AI startups: Although competition is fierce in the mid-stream vertical model layer and downstream application layer, the ways of building moats are completely different from the foundational model layer — the foundational model layer wins on compute and data scale, while the application layer wins on scenario depth, user stickiness, and data flywheels. Startups should choose a sufficiently vertical niche (not just "legal AI" but "contract review AI," not just "industrial AI" but "textile quality inspection AI"), achieve excellence within that niche, accumulate proprietary data assets not easily replaced by general large models, and then expand into adjacent scenarios.
Implications for policymakers: The mid-stream MaaS platform and enterprise AI solution layer is where policy support can most "precisely irrigate" — here are concentrated the most medium-sized tech enterprises that have technical capability but lack the funding scale of top platforms. Through compute subsidies, application scenario demonstrations, and open data sharing, governments can directly lower the commercialization threshold for these enterprises, thereby accelerating the diffusion speed of the entire AI industry.
1.4 The Technical Evolution Path of Large Models: From GPT-1 to Super-Large Multimodal
The technical lineage of large models can be traced to Google's BERT (Bidirectional Encoder Representations from Transformers) and OpenAI's GPT (Generative Pre-trained Transformer), both released in 2018. Evolution proceeded along two parallel main lines: first, scale expansion — GPT-1 at 117 million parameters, GPT-2 at 1.5 billion, GPT-3 at 175 billion, and GPT-4 estimated at over one trillion (using a mixture-of-experts architecture); second, capability expansion — from pure text generation to code generation, image understanding, speech recognition, and video generation in multimodal fusion.
2023–2025 was the most intensive phase of this evolution. ChatGPT's phenomenal explosion in early 2023 pushed Chinese internet companies to collectively announce large-model strategies; reasoning models (slow-thinking mode centered on chain-of-thought) emerged as a dark horse in 2024; in 2025, DeepSeek-R1's low-cost, high-performance proved the viability of the engineering efficiency route; in the same year, Alibaba open-sourced Qwen3, Google released Gemini 3, and Meta launched Llama 4 with a native multimodal architecture — marking the full formation of the fourth generation of large models.
The direct impact of this technical evolution is that flagship large models in 2025 have, in aggregate capability, exceeded the early 2024 expert predictions, with particularly outstanding performance on "hard problems" such as math competitions, code writing, and scientific reasoning. In some professional domains, partial capabilities can rival top specialists.
二 Global Landscape and Major Overseas Players
2.0 Overview of the Global AI Large-Model Landscape in 2025
Before entering in-depth analysis of individual players, we establish a panoramic framework for the global AI large-model landscape in 2025. The following table ranks the top ten global AI entities by composite assessment of technical capability, commercial scale, and ecosystem influence (as of June 2026):
| Rank | Company / Organization | Main Model | Annualized Revenue (Estimate) | Core Differentiation |
|---|---|---|---|---|
| 1 | OpenAI | GPT-5 / o-series | ~$25 billion | Broadest consumer penetration + enterprise API market share |
| 2 | Anthropic | Claude 4.x series | ~$47 billion (May 2026 annualized) | Enterprise safety and reliability + Claude Code coding tool |
| 3 | Google DeepMind | Gemini 3.5 series | Hard to split out (cloud integration) | Search ecosystem synergy + frontier scientific AI |
| 4 | Meta | Llama 4 series | Indirect monetization (ad ecosystem) | Setter of global open-source ecosystem standards |
| 5 | xAI | Grok-3 series | ~$1–1.5 billion | X platform real-time data + Musk brand effect |
| 6 | Alibaba | Qwen3 series | Alibaba Cloud AI contributes ~RMB 40 billion | #1 open-source ecosystem by global downloads |
| 7 | DeepSeek | V3 / R1 series | ~RMB 10–15 billion | Cost-curve disruptor + open-source technical influence |
| 8 | ByteDance | Doubao series | Volcano Engine AI contributes hundreds of billions | 250 million MAU ToC traffic + inference compute scale |
| 9 | Baidu | ERNIE 5.0 series | Cloud AI ~RMB 20 billion | China's earliest AI commercializer + PaddlePaddle ecosystem |
| 10 | Microsoft | Azure OpenAI / Copilot | Hard to split out (Azure integration) | Widest enterprise Office 365 Copilot penetration |
This landscape presents two distinct characteristics: first, the top positions are occupied by companies leading in both "commercialization + research investment" (Anthropic's revenue surpassing OpenAI's annualized rate being one of the most surprising market developments of 2026); second, Chinese companies hold three of the top-ten slots (Alibaba, DeepSeek, ByteDance) — a qualitative leap from the "zero slots" in 2023.
2.1 The New Three-Polar Structure: "US–China Bipolarity + Open-Source Third Pole"
Before 2024, the global large-model landscape could simply be described as "OpenAI leading, Google chasing, China following." In 2025, this was completely disrupted by the release of DeepSeek-R1, forming three poles: the US closed super-large-parameter model pole (OpenAI/Anthropic/Google), the Chinese engineering efficiency pole (DeepSeek/Alibaba Qwen/ByteDance Doubao), and the global open-source ecosystem pole (Meta Llama/DeepSeek/Qwen/Mistral). The core logic of the three-polar structure is: the engineering efficiency pole proved that "equivalent capability can be achieved at lower cost," forcing the closed super-large-parameter model pole to be compelled to cut prices commercially, while the global open-source ecosystem pole gives developers truly usable alternatives.
The formation of the three-polar structure carries deep industry implications. Previously, OpenAI used GPT-4's generational advantage to build a "closed-source monopoly" logic — the best models could only be accessed via its API, requiring payment for the strongest capabilities. DeepSeek's emergence dismantled the foundation of this logic: when open-source model capability reached "practical equivalence," the premium space for closed-source models shrank sharply. The beneficiaries of this price war are the tens of millions of small and medium enterprises and individual developers who can now obtain, at nearly zero marginal cost, AI capabilities that would previously have required paying tens of thousands of dollars in API fees.
From a macro perspective, behind the three-polar structure is the collision of two completely different technical philosophies: top US players tend to seek absolute capability breakthroughs through investments in super-large compute (scale-law absolutism), while top Chinese players seek engineering efficiency maximization under compute constraints (engineering optimization). Both philosophies made major advances in 2025, with no victor between them — they jointly define the two parallel tracks of large-model technical evolution in the coming years.
2.2 OpenAI: GPT-5 and Annual Revenue Surpassing $20 Billion
OpenAI officially released GPT-5 on August 7, 2025 — a unified system combining "fast response + deep reasoning" — with inference workloads increasing 8× in one week. By end-2025, OpenAI's annualized revenue surpassed $20 billion, reaching monthly revenue of $2 billion; entering 2026, the annualized revenue run rate rose to approximately $25 billion, with enterprise and commercial revenue accounting for more than 40%. Concurrently, OpenAI's share of enterprise AI spending fell from 50% in 2023 to 27%, while Anthropic rose to 40% and Google to 21%, with the three combined at 88%, reflecting the intensely competitive top-three race.
Key details worth understanding in OpenAI's business model:
Revenue structure's dual-track nature: OpenAI's revenue sources split into consumer subscriptions (ChatGPT Plus/Pro/Team, $20–$200/month per seat) and enterprise API (per-token B2B billing). In 2025, enterprise API revenue growth outpaced consumer subscriptions, reflecting the market's migration from individual trial to formal enterprise procurement. ChatGPT as the consumer traffic giant (approaching 700 million MAU in 2026) serves as a free customer-acquisition channel for OpenAI's brand and API revenue — a key link in the commercial model's closed loop.
Cost structure challenges: Behind OpenAI's rapid growth is continued losses — estimated net loss for full-year 2025 of approximately $5–7 billion, mainly from super-large compute investment (training costs for frontier models like GPT-5 running into hundreds of millions to billions of dollars), inference costs for Operator/Agent services, and ongoing researcher compensation. OpenAI's break-even point depends on continued expansion of enterprise API customers and improvements in unit economics.
Non-profit to for-profit organizational restructuring: Throughout 2025, OpenAI continued the complex legal process of transforming its original non-profit parent structure into a for-profit company-controlled framework, to satisfy the requirements of potential IPO investors and allow employee shares to trade. This transformation was substantially completed by early 2026, paving the way for a valuation assault on trillion-dollar territory (market projections suggest post-IPO valuations could exceed $200–500 billion).
2.3 Anthropic: Claude 4.x and Valuation Sprint Toward $1 Trillion
Anthropic experienced rare acceleration in fundraising in 2025–2026: September 2025 at $183 billion valuation completing $13 billion Series F; February 2026 at $380 billion valuation completing $30 billion Series G; June 1, 2026 secretly filing for IPO at $965 billion valuation while completing $65 billion Series H. The revenue side likewise exploded: annualized revenue approximately $9 billion by end-2025, surpassing $47 billion annualized by May 2026.
At the product level, Claude Code officially launched commercially in May 2025, reaching $1 billion annualized revenue by November 2025 and $2.5 billion annualized by February 2026 — one of the fastest-growing commercial products in the coding tools segment. The Claude series continued iterating: Claude 4.5, Claude Sonnet 4.6 (input $3/million tokens, output $15/million tokens), and Claude Opus 4.7 were released in succession, continuously leading in reasoning, Computer Use, and multimodal capabilities.
Anthropic's core positioning is as the standard-bearer of "safe AI" — a commercial strategy as much as a foundational corporate culture. The founding team came from OpenAI (Dario Amodei and Daniela Amodei left to found it), with core concerns around "how to ensure increasingly powerful AI systems act in directions humans desire." Constitutional AI (training models to learn a set of value principles and self-evaluate against them) and extensive interpretability research are the technical underpinnings of this positioning.
Anthropic's share of enterprise AI spending jumped from approximately 10% in 2023 to 40% in 2025, surpassing OpenAI as the top choice for enterprise procurement, driven by two core reasons: first, Claude's "long context + low hallucination rate + high safety" combination is reliably excellent in enterprise mission-critical scenarios (legal contract review, financial report analysis, medical record processing); second, Claude's System Prompt feature allows enterprises to customize the behavioral boundaries of AI assistants (e.g., "only answer questions related to our company's products"), making Claude the preferred foundational model for building enterprise-internal dedicated AI assistants.
The implication for Chinese AI enterprises: Anthropic's success demonstrates that in global competition, the positioning of "safe + reliable + controllable" captures enterprise paying customers' hearts more effectively than "most capable" — an insight with direct reference value for Chinese LLM companies' international strategy design.
2.4 Google DeepMind: Gemini 3.5 and Scientific AI Layout
Google DeepMind released the Gemini 3 series on November 18, 2025, followed by Gemini 3.1 Pro Preview, and in 2026 released the Gemini 3.5 series (starting with 3.5 Flash), claiming to "combine frontier intelligence with actionability," focusing on long-horizon Agent tasks. Gemini Deep Think achieved gold-medal level on International Mathematical Olympiad (IMO)-grade problems, with reasoning capability reaching world-class scientific research standards.
Google's advantage lies in synergy with the search and advertising ecosystem: AI Mode (formerly SGE) embeds Gemini directly into search results pages, with hundreds of millions of daily active users and the shortest monetization path. Meanwhile, Google Cloud AI revenue grew rapidly throughout 2025, with GCP's share of enterprise AI spending rising to 21%. On the open-source side, the Gemma series continues updating, providing official support for low-cost edge deployment.
The unique challenge Google faces in AI: as one of the most successful companies of the Internet era, its core business model (search advertising) is being directly threatened by AI. The rise of Perplexity, ChatGPT Search, and Gemini's search mode lets users obtain information directly from AI without clicking on ad links — structurally damaging Google's search advertising model. Google must make a difficult choice between "using AI to cannibalize its own search advertising" and "not using AI and being cannibalized by competitors." In 2025, Google chose the path of "embracing AI Mode," positioning AI search as an upgrade to search experience rather than a replacement, and exploring new ad formats embedding sponsored content in AI search results.
Google's deep advantage lies in its incomparable data assets: billions of daily search queries, billions of daily video views on YouTube, and extensive user behavioral data in Gmail and Workspace — data that is priceless raw material for training the next generation of multimodal AI, and that no competitor can replicate in the short term. DeepMind's scientific AI direction (AlphaFold 3 in protein structure prediction, AlphaMath in mathematical theorem proving, AlphaGeometry in geometric reasoning) represents Google's long-term investment in pushing AI capabilities to the frontier of human science — with a longer commercialization path than the application layer, but representing the very frontier of AI capability expansion.
2.5 Meta: Llama 4's Open-Source Dominance
Meta released Llama 4 on April 5, 2025, introducing native multimodality + MoE architecture for the first time: Scout (17 billion active parameters / 109 billion total parameters, 10 million token context) and Maverick (17 billion active parameters / 400 billion total parameters) have both been open-sourced; the super-large Behemoth (288 billion active parameters / 2 trillion total parameters) was still training as of June 2026. Llama 4's download numbers on Hugging Face accelerated the moment when Chinese open-source models surpassed American ones — in July 2025, models dominated by Qwen from China first exceeded American models in monthly download volume.
Meta's strategic focus in AI is "building ecosystem moats through open-source": embedding Llama into WhatsApp, Instagram, and Messenger (platforms with more than 4 billion combined MAU), improving advertising monetization efficiency via AI features while attracting global developers through open-source to contribute vertical fine-tuned models to Meta's AI ecosystem.
Meta's open-source AI strategy has unique counter-intuitive aspects: on the surface, open-sourcing Meta's strongest models seems to "give weapons to competitors for free"; at a deeper level, Meta's main revenue is advertising, not AI services. By open-sourcing Llama, Meta achieves three business objectives: first, establishing position as the standard-setter for the global open-source AI ecosystem, attracting developers to build applications on Llama and then distribute content and reach users on Meta's advertising platforms; second, by giving competitors free powerful tools, accelerating the development of the entire AI application ecosystem and indirectly expanding the market size of AI-driven digital advertising; third, winning regulatory friendliness through open-sourcing — compared to OpenAI's "black box" closed-source models, Llama's openness enjoys a natural advantage in EU AI Act compliance reviews.
In January 2026, Meta's decision to acquire Chinese AI startup Manus (Butterfly Effect) for over $2 billion sent another important signal beyond the open-source strategy: Meta began rapidly acquiring teams with leading Agent capabilities through M&A, integrating Manus's "multi-step task autonomous execution" technology into the Meta AI ecosystem to synergize with Llama's foundational capabilities. This was one of the most important M&A deals in AI in 2026, signaling that Agent capability will become a standard feature of mainstream AI platforms.
2.5.1 NVIDIA's Compute Moat: Dual Barriers of Hardware and Software
NVIDIA's competitive advantage in the AI era far exceeds the surface label of "GPU manufacturer." Understanding the depth of its moat is crucial to predicting the long-term structure of the AI industry:
Hardware moat: advanced manufacturing process moat. NVIDIA's latest Blackwell/Rubin GPUs use TSMC's N4/N3 process, relying on TSMC's decades of accumulation in advanced processes like extreme ultraviolet lithography (EUV) and atomic layer deposition (ALD). NVIDIA itself does not own fabs (fabless model), but has formed the highest-priority strategic partnership with TSMC — in times of capacity tightness, NVIDIA can obtain the highest-priority allocation of TSMC's advanced process capacity. This relationship itself is a competitive moat: other GPU design companies (AMD, Intel) are at a relative disadvantage in their priority competition with TSMC.
Software moat: 20 years of CUDA ecosystem accumulation. CUDA (Compute Unified Device Architecture) is the GPU parallel computing platform introduced by NVIDIA in 2006, with nearly 20 years of ecosystem accumulation: over 4 million active CUDA developers worldwide, millions of optimized operators and pretrained models, and a complete toolchain built on CUDA (cuDNN neural network acceleration library, cuBLAS linear algebra library, NCCL collective communication library). Any solution attempting to replace NVIDIA GPUs must provide a "sufficiently complete" CUDA compatibility layer — requiring years of engineering time and with the target continuously moving as NVIDIA updates. This is the most core non-hardware challenge facing Huawei Ascend and Cambricon.
Ecosystem moat: vertical integration from chips to systems. Through NVIDIA DGX systems (AI training servers integrating multiple GPUs, high-speed interconnects, and memory), NVIDIA Networking (InfiniBand high-bandwidth interconnect network, obtained through the Mellanox acquisition), and NVIDIA NIM (inference microservices), NVIDIA has achieved vertical integration from "chips → systems → software services." This means that after purchasing NVIDIA GPUs, customers get a complete out-of-the-box AI training and inference solution, not just a chip. This all-in-one experience greatly reduces the engineering threshold for large enterprises to deploy AI clusters, and deepens customer dependence on the NVIDIA ecosystem.
2.6 xAI and Mistral: Rising Forces
xAI released Grok-3 in February 2025, emphasizing enhanced reasoning capabilities, leveraging the massive real-time data from the X (Twitter) platform to build differentiation. In January 2026, xAI completed a $20 billion Series E at $230 billion valuation; after subsequently merging with SpaceX, the combined valuation was approximately $1.25 trillion. Grok MAU reached 117 million in March 2026, with the US chatbot market share rising to 17.8%, becoming the third largest platform after ChatGPT and Gemini.
France's Mistral AI won recognition from European enterprise developers with its small but precise open-source model strategy, with valuation entering the $6 billion tier in 2025; in the context of the EU AI Act taking effect, the compliance advantage of domestic AI suppliers is increasingly prominent.
xAI's competitive advantage and strategic value are built on a unique asset: the hundreds of millions of real-time posts generated daily on the X (formerly Twitter) platform are a scarce resource for training large models with "current knowledge." Traditional large models have "knowledge cutoff date" limitations, while Grok, relying on X platform's real-time data feed, can process queries about news events that happened minutes ago, giving it an advantage in breaking news understanding and real-time market dynamics analysis that other AI cannot replicate.
Mistral's strategic position is equally unique: as the only European company in the global top ten LLM players, Mistral carries the political needs of Europe's "AI autonomy strategy" — both the French government and the European Commission have strong intentions to support domestic AI champion companies to reduce strategic dependence on US tech companies. In 2025, Mistral received direct investment from the EU Digital Sovereignty Fund and obtained priority procurement status in multiple European government procurement projects. On the business model, Mistral adopts an open-source + enterprise services hybrid model (similar to Red Hat vis-à-vis Linux), generating commercial revenue by providing enterprise-grade SLAs, private deployment support, and custom fine-tuning services on top of the open-source Mixtral.
2.7 NVIDIA: Blackwell Ignites the Compute Super-Cycle
NVIDIA's FY2025 (ending January 2025) full-year revenue was $130.5 billion, growing 114% year-on-year, with the Blackwell series GPU as the core driver. Q2 2025 single-quarter revenue was $46.7 billion (+56% YoY), Q3 rose to $57 billion, Q4 reached $68.1 billion (+73% YoY); Q1 2026 guidance was $78 billion, with combined Blackwell + Rubin revenue visibility for the full year at approximately $500 billion.
Blackwell's core technical breakthroughs include: substantially improved HBM3e memory bandwidth, doubled NVLink bandwidth to support 10,000-card-scale cluster interconnection, and FP4-precision inference increasing single-card inference throughput approximately 4× versus H100. NVIDIA also built software moats beyond hardware through CUDA ecosystem, Omniverse, and NIM (NVIDIA Inference Microservices), making it the preferred AI training compute platform for the foreseeable future.
NVIDIA's situation in the Chinese market is extremely peculiar: on one hand, China is one of its largest single-country markets (accounting for approximately 17% of total revenue before sanctions), with this share having declined significantly after the ban on exporting H100/A100; on the other hand, the H20 as a "reduced-spec" compliant export chip specifically for the Chinese market (peak compute approximately 22% of H100, but with communication bandwidth unrestricted), during its brief supply suspension and resumption in April–July 2025, fully exposed Chinese AI enterprises' deep dependence on NVIDIA GPUs.
NVIDIA's China strategy faces a fundamental contradiction: completely exiting the Chinese market would cause significant revenue loss (and also give space to domestic GPU competitors), but any move aimed at expanding supply to China faces political pressure from the US government in the context of continuously tightening US export controls on China technology. Jensen Huang's stance is to "maintain China business to the extent permitted within compliance," reflected in his proactive visits to China to meet government officials and enterprise customers on multiple public occasions, and in pushing for rapid H20 license restoration. This delicate geopolitical game will continue to affect China's AI compute supply landscape for years to come.
For China's AI industry, NVIDIA's dilemma is an important historical window for domestic GPU breakthroughs — each NVIDIA supply interruption or tightening of restrictions pushes Chinese AI vendors to accelerate domestic substitution validation, accumulating stability data and optimization experience for domestic GPUs under real production workloads, laying the foundation for larger-scale domestic substitution in the future.
三 PEST Analysis: Macro Environment and Regulatory Framework
3.1 Political and Regulatory: Dual Track of Registration System + Safety Standards
China has built the world's most complete regulatory system specifically for generative AI. The Interim Measures for the Management of Generative Artificial Intelligence Services took effect on August 15, 2023, establishing a three-in-one regulatory framework of "registration access + safety assessment + content review." As of December 31, 2025, 748 generative AI services nationwide had completed registration and 435 AI applications or functions had completed registration, with Beijing leading at 144 (33%).
In April 2025, the Basic Safety Requirements for Generative Artificial Intelligence Services was formally elevated to a mandatory national standard, with specific quantitative requirements across dimensions of model safety testing, data annotation specifications, and user agreement disclosure. In December 2025, the Cyberspace Administration published for public comment the Interim Measures for the Management of AI Anthropomorphic Interaction Services, intending to standardize identity disclosure and emotional dependency boundaries for human-like AI assistants, expected to formally take effect in H1 2026.
US sanctions and export controls are another major political variable. In April 2025, the Trump administration suddenly suspended NVIDIA H20 exports to China, causing NVIDIA to write off approximately $5.5 billion in losses; but on July 15, 2025, NVIDIA CEO Jensen Huang announced that the US government approved export licenses and H20 resumed sales to China. In March 2025, the US Commerce Department added more than 50 Chinese AI and supercomputing companies to the Entity List, with the compute blockade of China continuing to advance. The EU AI Act formally took effect in 2025, with high-risk AI applications subject to strict regulation, creating compliance challenges for Chinese AI products going overseas to Europe.
Deep logic of China's regulatory system: Understanding Chinese AI regulation should not be viewed merely as "government control" but should see its policy-tool attributes clearly. The registration system is essentially a market access licensing mechanism, but unlike the EU AI Act's "risk-level tiering" approach, China uses "mandatory registration by service type + preemptive content safety review." The design logic is: AI services must pass safety assessment before being offered to the public, filtering potential social harms before they enter the market.
Industrial shaping effect of domestic substitution policies: The government's AI domestic substitution procurement policy is another key political force shaping the market structure. In 2025, the domestic production rate requirement for AI software procurement by government agencies at all levels rose from 70% to 80%–90%, explicitly excluding overseas AI large-model services. This policy directly determines that iFlytek, Baidu, and Huawei AI in the government market face almost no overseas competitors, forming a unique "government AI domestic oligopoly structure" in sharp contrast to the more competitive commercial AI market.
3.2 Economic: Cost Decline and Accelerated Commercialization in Resonance
The exponential decline in training costs is the most prominent economic signal of 2025: DeepSeek-V3's training cost was reported at approximately $5.6 million, only about 1/20th of an equivalent-capability GPT-4-class model. This cost curve change directly drove two economic effects: first, API call prices entering the "cents era," with DeepSeek pricing only 1/10th of OpenAI, leading the entire industry to proactively cut prices; second, a large number of small and medium enterprises that previously could not afford AI deployment began trying AI, significantly expanding the potential scale of the enterprise market.
China's government investments in the compute side scaled up simultaneously: the "AI+" action plan in 2025 drove compute base construction in provinces, with the number of local government-led intelligent computing centers exceeding 200, forming a dual-track compute supply with commercial cloud providers. China Mobile's purchase of 7,499 inference-type AI servers at the start of 2026 in a single order exceeding RMB 5 billion is a representative event of inference compute investment scaling up.
Looking at financing structure, total financing in China's AI large model and application sector throughout 2025 exceeded RMB 100 billion, with foundational large models (Moonshot AI, MiniMax, Zhipu, Stepfun) accounting for approximately 30%, the application layer (industrial AI, medical AI, legal AI, coding tools) approximately 45%, and compute infrastructure approximately 25%.
3.3 Social: ToC Applications from "Novelty" to "Dependency"
Doubao's 250 million MAU (three ByteDance AI apps combined), Tencent Yuanbao's MAU exceeding 100 million, Kimi's K2.5 monthly revenue surpassing all of FY2025 in under 20 days — these data points together indicate a structural change: Chinese consumer users have moved from curiosity-driven "novelty phase" to demand-driven "dependency phase," with membership subscriptions and API payments gradually becoming regular expenditure items for users.
Education scenarios are the fastest-monetizing consumer vertical: iFlytek AI learning machine revenue doubled in 2025, and AI tutoring products with large models built high-stickiness parental payment habits. Coding tools are the fastest-commercializing scenario globally: intelligent coding assistants collectively pushed "AI programming assistants" to become standard tools for software developers, with paid penetration estimated above 20% in 2025.
3.3.1 Talent Flows and the "AI Talent Spillover Effect"
The impact of AI large models on China's labor market has in 2025 moved beyond theoretical discussion to a genuinely quantifiable stage. This impact is occurring simultaneously in two directions: creating new positions (AI engineers, prompt engineers, AI product managers, AI safety researchers) while replacing traditional positions (call center agents, basic data annotation, simple content writing, repetitive code work).
AI talent spillover effect: Top AI researchers leaving academia and major tech companies are accelerating startup formation, creating a talent aggregation effect of "1 Tsinghua AI PhD → 1 AI startup → bringing 5–10 undergraduate graduates along." The number of AI startups in Beijing and Shanghai with more than 50 employees exceeded 500 in 2025, with approximately 70% of technical core teams having backgrounds from Tsinghua/Peking University computer science, Tsinghua AI Institute, Baidu, or ByteDance.
Structural differentiation in AI substitution effects: Not all positions face equal AI substitution risk. Research institutions' classification of "AI substitution risk" positions roughly breaks down as: high-risk positions (>70% replacement probability within 10 years) include telemarketing, junior accounting, basic data entry, simple image processing, and standard contract review; medium-risk (30%–70%) include junior legal assistants, news writing, junior programmers, and customer service supervisors; low-risk (<30%) include creative directors, complex engineering decisions, senior management, healthcare workers, and teachers (personalized tutoring).
Talent pricing and recruitment competition: In 2025, the median salary for top domestic AI algorithm engineers (with top lab background + 2+ years of large-model training experience) was approximately RMB 1–3 million per year, with some top candidates (with DeepSeek/Anthropic/OpenAI experience) negotiating at RMB 5–10 million/year. This salary level is equivalent in RMB to top Silicon Valley AI companies, meaning that competitive intensity in China's AI talent market has approached the world's highest level.
3.4 Technology: Three Paradigms of MoE + Reasoning + Multimodal Reconstruct
The main technical themes of 2025 can be summarized in three keywords: Mixture-of-Experts architecture (activating only a subset of expert sub-networks per forward pass, greatly expanding total model parameters while controlling actual computation — DeepSeek V3 with 685 billion total parameters and 37 billion active parameters as the representative), reasoning models (using reinforcement learning to enable models to "think" explicitly before generating answers, with performance greatly exceeding previous generations on math, code, and scientific reasoning), and edge quantization (compressing 7B–70B parameter models to run on smartphone NPUs or laptop computers, driving continuous reduction in inference costs).
The synergistic effect of these three paradigms is: models become "large yet efficient" — the broad knowledge brought by trillion-level total parameters, combined with inference efficiency from streamlined active parameters, combined with the ubiquitous deployability brought by edge quantization. This made flagship large models in 2025 improve approximately 40–60% in capability versus 2024 (mainstream evaluation benchmarks), while inference costs actually fell 30–50% due to efficiency improvements, exhibiting the rare characteristic of "dual-direction improvement in performance-to-price ratio."
Particularly worth noting is that the synergistic co-evolution of the three technical paradigms has catalyzed a category of capabilities that had never appeared before — long-horizon multi-step reasoning. This refers to models that, when facing complex tasks requiring dozens of steps and hours of actual execution time (such as complete software project development, scientific research spanning multiple databases, business analysis requiring more than a dozen specialized tools), can autonomously plan steps, execute actions, detect errors, and adjust direction without humans intervening at each step. This capability is precisely the core prerequisite for Agent technology, and also explains why the global AI industry collectively pivoted from the "large models" topic to "Agent" in 2025.
Looking at technical evolution path predictions, the most important frontier directions for 2026–2028 include: first, controllability of Test-Time Compute — the current reasoning model's "thinking" process is a black box to users; how to let users control the depth and direction of "thinking" (saving cost for fast answers, investing more compute for complex problems) is the next engineering priority; second, embodiment of World Models — video generation models are evolving from "generating beautiful videos" to "simulating the physical world," and if they can accurately simulate physical laws, they will become general foundations for robot training, autonomous driving simulation, and industrial design validation; third, persistent memory architecture — current large models' memory is limited to a single conversation's Context Window (up to approximately one million tokens); building cross-session persistent memory is the key technical chasm separating AI assistants from "tools" to "companions."
四 China Market Scale: Layered Breakdown and Competitive Concentration
4.1 Three Methodological Challenges in Measuring the Large-Model Market
Before entering data analysis, it is necessary to clarify the three major methodological challenges in accurately measuring AI large-model market scale, otherwise numbers from different sources will confuse readers.
Challenge One: Blurred Boundaries. AI large models are both independent products (direct subscription to ERNIE, Kimi, etc.) and embedded functionality within other software products (how much of WPS AI's MAU actually pays for AI? How much of Kingsoft Office's revenue comes from AI premium? These numbers are difficult to precisely separate).
Challenge Two: Valuation of Free Traffic. Doubao, Yuanbao, and ERNIE Yi all implemented varying degrees of free strategies in 2025; vast numbers of users use large-model capabilities at zero cost.
Challenge Three: Attribution of Industrial Multiplier Effects. If a manufacturing enterprise saves RMB 5 million in labor costs by adopting an AI quality inspection system, does that RMB 5 million in "AI-driven value" count toward AI market scale? The answer depends on the researcher's measurement scope.
4.1 Market Scale: Comparison of Three Measurement Scopes
Narrow scope (pure model MaaS): API call revenue and model licensing fees; estimated at RMB 8–10 billion in 2025.
Medium scope (directly related large-model revenue): Includes MaaS + AI application SaaS subscriptions + AI hardware AI function premium; estimated at RMB 49.5–51 billion in 2025, in basic agreement with projections by 36Kr Research Institute and CIR, expected to exceed RMB 70 billion in 2026.
Broad scope (AI-enabled industry revenue): Includes all incremental revenue brought by AI; approaches RMB 100–130 billion in 2025, with potential to exceed RMB 200 billion in 2026.
This report primarily uses the medium-scope approximately RMB 50 billion baseline.
4.2 Sub-Segment Breakdown
| Sub-Segment | 2025 Scale Estimate | 2026 Expectation | Representative Players |
|---|---|---|---|
| Foundational Model MaaS | RMB 8–10 billion | RMB 18–20 billion | Baidu Qianfan / Alibaba Bailian / ByteDance Volcano / Tencent TI |
| ToB Industry Solutions | RMB 15–20 billion | RMB 30–40 billion | SenseTime / iFlytek / 4Paradigm / Zhipu |
| ToC AI Application Subscriptions | RMB 8–12 billion | RMB 20–30 billion | Doubao / Kimi / MiniMax / Yuanbao |
| AI Office / Coding Tools | RMB 5–8 billion | RMB 12–18 billion | WPS AI / Tongyi Lingma / Claude Code |
| AI Hardware AI Premium | RMB 5–8 billion | RMB 15–25 billion | iFlytek AI Learning Machine / Xiaomi AI Phone / AI Glasses |
| Agent Enterprise Deployment | RMB 3–5 billion (early stage) | RMB 10–15 billion | Manus / Baidu ERNIE Agent / ByteDance Coze |
4.3 Competitive Landscape: CR10 and Market Concentration
The foundational model layer is highly concentrated: the four internet giants (Baidu, Alibaba, ByteDance, Tencent) account for approximately 65%–70% of MaaS call volume combined; adding DeepSeek (approximately 10%) and the three firms Zhipu/Moonshot/MiniMax (approximately 10% combined), CR10 is approximately 85%–90%.
The application layer competitive landscape is more fragmented: in the AI office segment, Kingsoft WPS AI dominates (AI MAU 29.51 million); AI search features Mita AI/Kimi/Doubao as main players; AI content generation tools present a blossoming variety. The Agent segment entered its first year of commercialization in 2025 with top concentration yet to form.
Competition has evolved from "one-dimensional" (who has the strongest model capability) to at least five dimensions of multi-dimensional competition:
Dimension 1: Model capability (Technology Race). Rankings on standard benchmarks still matter, but are no longer the sole determining factor. More critical is performance in real user tasks (Chatbot Arena human blind tests) and specialized capabilities in specific vertical scenarios.
Dimension 2: Ecosystem stickiness (Ecosystem Lock-in). After "good enough" model capability becomes a universal condition, ecosystem stickiness becomes the most core competitive moat. Alibaba Cloud Bailian, with the breadth of 200+ models and deep integration with the Alibaba e-commerce ecosystem, has a significant advantage.
Dimension 3: Commercialization speed (Go-to-Market). Converting model capability into enterprise customer ARR quickly depends on the sales team's execution, industry-specific customization capabilities, and the speed of accumulating successful customer case studies.
Dimension 4: Data flywheel (Data Flywheel). Having the most real users (Doubao 100 million MAU, ChatGPT approximately 700 million MAU) means continuously generating large amounts of user preference signals for reinforcement learning.
Dimension 5: Compute autonomy (Compute Independence). The ability to ensure stable supply for training clusters amid escalating US–China compute confrontation directly affects model iteration speed.
五 In-Depth Industry Chain Analysis
5.1 Compute Infrastructure (Linked with Data Center AI Compute Special Report)
AI large models rely on two types of compute: training compute (one-time, highly concurrent, requiring 10,000+ card interconnection) and inference compute (sustained, low-latency, high per-card efficiency requirements). In 2025, inference compute's share rose to approximately 68%, driving AI server configurations from "training-first" to "inference-first."
Key changes in the compute infrastructure value chain:
Change 1: Mainstream adoption of super-large clusters. In 2022, a 1,000-GPU training cluster was already top-tier; by 2025, leading AI labs' standard training clusters reached 100,000 GPUs or more. Super-large clusters brought new engineering challenges: as cluster scale grows 10×, all-reduce communication bandwidth requirements grow quadratically, continuously upgrading network switch and optical module bandwidth specs from 100G and 400G to 800G and 1.6T.
Change 2: PUE optimization becomes central to AI data centers. Traditional commercial data center PUE is approximately 1.3–1.5; new AI data centers require PUE not exceeding 1.25, with some super-computing centers using liquid cooling achieving PUE approaching 1.1. Data center cooling and data center energy efficiency technologies have therefore become the highest-certainty niches in AI infrastructure investment.
Change 3: Specialized differentiation of inference infrastructure. AI inference demand characteristics (high concurrency, short latency, cost-per-token priority) make inference cluster hardware configurations significantly different from training clusters — greater emphasis on per-card memory capacity (KV Cache storage), higher-density deployment, and more flexible elastic scaling.
GPU server complete unit manufacturing remains led by Inspur Information (000977), Dawning Information (603019), and Baode Computing; but with domestic GPU scaling up, AI servers equipped with Huawei Ascend 910B and Cambricon MLU370 are rapidly increasing their share in government and central enterprise procurement, with combined market share exceeding 30% in 2025. Liquid-cooled servers have become standard in super-large data centers with power density exceeding 40kW/rack, driving rapid growth for liquid cooling equipment and cold plate liquid cooling suppliers.
At the interconnect level, 400G–800G optical modules are the blood vessels of AI compute clusters. Data center cabinets, precision air conditioning, UPS power supplies, and power distribution cabinets together form the physical infrastructure of data centers.
5.1.2 Historic Shift in Compute Supply-Demand Structure: From Training to Inference
The most important structural change in the compute market in 2025 is the significant migration of demand center of gravity from the training side to the inference side. The training-to-inference compute consumption ratio shifted from approximately 64/36 in 2023 to approximately 32/68 in 2025, and is projected to evolve to 27/73 in 2026.
For domestic chip manufacturers, the technical requirements on the inference side are relatively lower than the training side — Huawei Ascend 910B's inference performance has basically reached parity with or slightly exceeds NVIDIA H20 (in some workloads), providing a historic window for domestic compute to break through in commercial inference scenarios.
5.2 Foundational Model Layer: The Open-Source vs. Closed-Source Game
The most important structural change in the foundational model layer in 2025: open-source large model capability narrowed to "practical equivalence" with closed-source. DeepSeek-V3 matches Claude Opus/GPT-4.5 on multiple benchmarks, while its MIT open-source license allows any organization to access it at zero cost; Alibaba Qwen3 open-sourced the full-size series from 0.6B to 235B, with global downloads exceeding 300 million times and derived models exceeding 100,000 — officially surpassing Llama as the world's number-one open-source model ecosystem.
5.2.1 Seven Core Advantages and Three Inherent Limitations of Open-Source Models
Cost freedom: Zero licensing fees + freely chosen compute; small and medium enterprises completely freed from specific vendor lock-in.
Data sovereignty: Private deployment means enterprise core business data need not leave its own security boundary.
Customization flexibility: Can be fully or parameter-efficiently fine-tuned for specific scenarios, languages, and domain data.
Community iteration: Open-source ecosystem aggregates contributions from global researchers.
Competitive benchmark: The existence of open-source models creates natural constraints on closed-source model pricing.
Local inference: Quantized open-source models can be deployed on consumer GPUs or even CPUs.
Overseas-friendly: International markets' acceptance of Chinese open-source models (Qwen3, DeepSeek) is far higher than for closed-source commercial models.
Three inherent limitations: first, deployment and maintenance require substantial technical capability; second, top reasoning capabilities remain mainly in closed-source models; third, ecosystem fragmentation leads to compatibility issues.
From a longer time horizon, open-source vs. closed-source competition actually reflects the value distribution game in the AI value chain. The current trend leans toward the former — the closed-source premium window for strongest capability is being compressed from "years" to "months," which is the fundamental reason for the industry chain's value center of gravity migrating toward the application layer.
5.3 MaaS and Development Platforms: From Tools to Ecosystems
MaaS platform competition is gradually shifting from "whose model capability is strongest" to "whose developer ecosystem is most complete." Alibaba Cloud Bailian provides 200+ pre-built models covering text, image, audio, and video modalities; Baidu Cloud Qianfan's large-model enterprise customers exceeded 80,000 in 2025 (growing more than 150% year-on-year); Huawei Cloud ModelArts supports full Ascend compute integration, with unique advantages in government and finance private deployment.
RAG (Retrieval-Augmented Generation) enterprise adoption rate reached 75% in 2025, becoming a standard component in ToB deployments, driving rapid growth of vector database and knowledge management middleware markets. LoRA/QLoRA fine-tuning tools' maturity allows small and medium enterprises to complete industry model adjustments on a single GPU in hours, greatly lowering the technical threshold for AI application customization.
5.4 Vertical Industry Models: Six Key Segments
Finance AI: Tonghuashun (300033), East Money (300059) embedding large models in quantitative research, investment advisory recommendations, and risk control reports; 4Paradigm (HK 6682) deploying in bank credit risk control and retail marketing.
Government AI: Trs Data (300229), iFlytek with first-mover advantages in government big data and intelligent customer service.
Healthcare AI: Baichuan Intelligence "all-in" on healthcare, focusing on Baixiaoying AI pediatrics, AI general medicine, and precision medicine; United Imaging Medical and Infervision continuing to deepen medical imaging AI.
Industrial AI: AInnovation (HK 2121) focused on industrial visual quality inspection and process optimization, serving Foxconn, Baowu, and other mega-scale manufacturing customers; Midea Group reducing costs by approximately 40% through internal Agent-ization of more than 5,000 employees — a landmark case of industrial AI at scale.
Education AI: iFlytek AI learning machine revenue doubled in 2025; Zuoyebang and Yuanfudao embedding large models in personalized homework correction and error analysis.
Legal AI: Yuandian Law, Lingxin Intelligence, and Yuzhaokj focusing on contract review, legal search, and judgment prediction.
5.4.1 Industrial AI: Complete Path from Quality Inspection to Process Optimization
Industrial AI applications divide into four layers of increasing complexity and value:
Layer 1: Visual quality inspection and defect detection. The most mature industrial AI application. AI visual inspection replaces manual visual inspection, reducing miss rate from approximately 0.3%–0.5% to approximately 0.01%–0.05%, while running 24/7.
Layer 2: Predictive maintenance and equipment health management. By embedding large models in equipment sensor data analysis chains, predicting equipment failures in advance reduces unplanned downtime by approximately 30%–50%.
Layer 3: Process parameter optimization and production scheduling. The highest-value, highest-technical-threshold scenario. In steel continuous casting, more than 300 parameters affect slab quality; AI systems provide parameter adjustment recommendations at millisecond-level speed, improving quality pass rate from approximately 88% to approximately 95%.
Layer 4: Full supply-chain AI optimization. Covers raw material procurement forecasting, inventory dynamic planning, and logistics scheduling, with AI Agents potentially reducing supply chain response time from "day-level" to "hour-level" or even "minute-level."
5.5 Application Layer: Dual-Track Progress of ToB and ToC
ToB applications' core pain point is the chasm from "POC to scaled deployment." The three key drivers crossing this chasm: security compliance (data stays on-premises, private deployment), quantifiable ROI, and standardized packaging (from custom projects to replicable products).
Three structural patterns often overlooked in ToB AI commercialization:
Pattern 1: Technical evaluation ≠ procurement decision. Many startup AI companies perform excellently in technical evaluation but lose the final procurement decision to Baidu, Huawei, and other major players. Core reasons: enterprises consider not just technology but also vendor risk, original manufacturer support, and ecosystem lock-in.
Pattern 2: "The second contract" is the true validation of the business model. A single ToB AI project delivery does not equal stable ARR; whether customers renew and expand after the initial project is the core indicator of business model health.
Pattern 3: Domestic ToB payment cycles are long. Government procurement payment cycles are typically 90–180 days after contract signing, creating real cash flow pressure on startups.
ToC applications' core question is "how to convert from free to paid." The three main paths: strong-binding scenarios (when AI tools deeply embed in users' core workflows, replacement costs are high), differentiated premium capabilities (free tier provides basic capabilities, paid tier provides significantly stronger capabilities), and overseas premium (leveraging stronger payment willingness in North America, Europe, and Japan to compensate for low conversion rates domestically).
5.6 Agent: From Tool to Autonomous Execution
Agent is the most venture-capital-favored direction of 2025–2026, with its essence being elevating large models from "answering questions" to "executing tasks." OpenAI Operator launched January 23, 2025; Claude Computer Use / Agent SDK launched in late 2024; Manus (Butterfly Effect) launched March 2025, accumulating 500,000 candidate users in one week, and was acquired by Meta for over $2 billion in May 2026.
Enterprise Agent deployment data is equally impressive: Google Cloud research shows 52% of generative AI-using enterprises have deployed Agents in production environments; Salesforce predicts 80% of enterprise applications will embed Agent capabilities before end-2026.
5.6.1 Technical Architecture Evolution of AI Agents: From Single Calls to Multi-Agent Collaboration
Phase 1 (2023–H1 2024): Reactive single agent. User inputs a task, the large model analyzes and calls predefined tool sets (web search, code execution, file operations), returns results after completion.
Phase 2 (H2 2024–2025): Planning single agent with memory. Introduces persistent memory mechanism and multi-step planning capabilities, dynamically adjusting subsequent plans after evaluating results at each step. Manus (March 2025) and OpenAI Operator (January 2025) are representative products.
Phase 3 (H2 2025–present): Multi-agent collaborative systems. Allocates complex tasks to multiple specialized sub-agents for parallel processing, with a master agent coordinating integration.
The Model Context Protocol (MCP) was widely adopted by mainstream LLM vendors in 2025, standardizing interaction interfaces between agents and external tools, APIs, and databases.
5.7 AI Devices: Hardware Is the Fastest Path to Commercialization
AI devices are currently one of the highest monetization-efficiency forms of large-model realization: one-time hardware purchase creates cash inflow; software subscriptions contribute ongoing ARR.
AI smartphones: IDC projects 2026 China market AI smartphone shipments at 147 million units (+31.6% YoY), representing 53% of the overall market; global AI smartphone shipments in 2026 are expected to exceed 600 million units.
AI PCs: Canalys projects 2025 AI PC shipments exceeding 100 million units, growing to 205 million units in 2028.
AI glasses: Meta Ray-Ban smart glasses broke through in H2 2025; IDC projects 2026 global smart glasses shipments exceeding 23.7 million units, with AI glasses potentially exceeding 30 million units — officially entering mass consumer product scale.
5.8 Data and Security: Underappreciated Infrastructure
The capability ceiling of large models depends on the quality and diversity of training data; the security floor of enterprise deployment depends on the maturity of data governance and model alignment. China's data element market entered an acceleration phase in 2025, with nationally listed datasets on data exchanges exceeding 500,000.
High-quality pretraining data: the new round of competition for scarce resources. Strategies to break through the "data wall": first, obtaining data flywheels from user interactions; second, synthetic data generation (using existing large models to generate more training data); third, exclusive procurement of proprietary industry data (medical imaging annotation, legal judgments, financial report interpretation, factory process parameters).
六 In-Depth Analysis of Key Companies
6.0 Tier Classification of Chinese AI Large-Model Companies (June 2026)
First Tier: Resource-Driven Super-Platforms (four companies) Baidu, Alibaba, ByteDance, Tencent — shared characteristics: proprietary massive training compute (self-built super-large GPU clusters), massive user data flywheels, strong engineer team density. Competitive strategy: "full-stack layout, ecosystem traction."
Second Tier: Technology-Efficiency Dark Horses (four companies) DeepSeek, Moonshot AI (Kimi), MiniMax, Zhipu AI — shared characteristics: achieving disproportionate technical breakthroughs under limited compute resources, forming global technical reputation. Competitive strategy: "technical differentiation + overseas monetization."
Third Tier: Vertical Industry Deep Cultivators (multiple companies) iFlytek (speech/education/government), Kingsoft Office (office software AI), SenseTime (visual AI/embodied intelligence), 4Paradigm (enterprise decision AI), Baichuan Intelligence (healthcare AI) — shared characteristics: deep scenario accumulation and customer relationships in specific industries.
6.1 Baidu ERNIE
Baidu is the earliest commercializer of AI large models in China, with ERNIE from 3.0 to 5.0 through seven years of continuous iteration. In 2025, ERNIE daily call volume exceeded 1.65 billion (33× increase from 50 million same period in 2023); ERNIE 5.0 Preview entered the global second tier in international evaluation benchmarks, ranking first domestically. Baidu Cloud AI public cloud market share held steady at 19.9% (China's first for five consecutive years).
Major strategic change in 2025 commercialization: ERNIE (ERNIE Bot) went fully free for PCs and App on April 1, 2025, greatly lowering user threshold, exchanging traffic scale for commercialization depth. At the same time, Baidu announced open-sourcing the ERNIE 4.5 series on June 30, 2026.
Baidu's deeper competitive picture: its technical accumulation depth in AI (PaddlePaddle deep learning framework, continuous ERNIE investment) gives it substantial first-mover advantages in large-model ecosystem infrastructure; but on the consumer side, Baidu faces fierce competition from ByteDance Doubao and Tencent Yuanbao. Baidu's core strategy is enterprise cloud services (ToB dominates) as the base, with Baidu Maps and Baidu Search traffic ecosystem as monetization handles, deeply cultivating autonomous driving (Apollo), marketing, and enterprise services.
6.2 Alibaba Qwen
Alibaba was one of the most active giants in global open-source ecosystem building in 2025. On April 29, 2025, Qwen3 officially open-sourced, releasing 8 models at once from 0.6B to 235B; by end-2025, Qwen open-sourced 200+ models, global downloads exceeded 300 million times, and derived models on Hugging Face exceeded 100,000 — officially surpassing Llama as the world's number-one open-source model ecosystem by downloads. In November 2025, the Tongyi App was renamed "Qianwen" and upgraded to version 5.0. Alibaba Cloud Q1 2026 revenue grew more than 20% year-on-year, with AI-related cloud services contributing the largest increment.
Alibaba's unique point in large-model strategy is the long-game "open-source for ecosystem" layout. Through open-sourcing top-capability models, Alibaba attracted hundreds of thousands of global developers to build vertical applications on Qwen, which in turn validates and expands Qwen's commercial application scenario landscape.
6.3 ByteDance Doubao and Volcano Engine
ByteDance's AI layout is characterized by parallel progress on "product side + infrastructure side." Product side: Doubao, Jimeng AI (image/video generation), and Doubao Aoxue three AI Apps had approximately 250 million combined MAU in Q4 2025, with Doubao holding nearly half of China's AIGC App MAU. Infrastructure side: Volcano Engine provides proprietary large models (Doubao series) + cloud compute + inference services, with ByteDance's daily Token processing volume exceeding 50 trillion — one of the world's largest inference compute consumers.
ByteDance Doubao's competitive strategy is worth separate analysis: relying on TikTok/Douyin's extremely strong user traffic base, touching the broadest users with Doubao's AI capabilities at the lowest marginal cost.
6.4 Tencent Hunyuan and Yuanbao
Tencent's full-year 2025 revenue was RMB 751.8 billion, with AI-driven profits reaching historic highs. Yuanbao (consumer-facing AI assistant App) MAU exceeded 100 million; AI workbench ima MAU exceeded 13 million. The unique advantage of Tencent's "social traffic AI monetization": it possesses assets no other company can replicate — WeChat's 1.2 billion MAU deep social relationship network.
6.4.1 Tencent AI Ecosystem's Unique Competitive Position
Tencent's AI doesn't need to separately acquire users — it only needs to embed AI capabilities into products users are already using daily. WeChat keyboard smart suggestions, Moments AI-assisted content creation, WeCom AI customer service, Tencent Docs AI writing assistants — each touchpoint achieves AI function penetration at zero marginal customer acquisition cost.
6.4.2 Horizontal Comparison of Four Platforms' AI Investment (FY2025)
- Baidu: Total AI investment approximately RMB 30 billion; AI-related revenue from Baidu Cloud approximately RMB 20 billion.
- Alibaba: FY2025 capital expenditure approximately RMB 116.6 billion (~$16.5 billion), mainly for cloud computing and AI infrastructure expansion.
- ByteDance: Estimated AI infrastructure investment not less than RMB 50 billion; total daily Token processing volume exceeds 50 trillion.
- Tencent: AI special investment (Hunyuan + Yuanbao) approximately RMB 18 billion; total capital expenditure exceeding RMB 150 billion.
Four companies combined AI investment (including compute) approximately RMB 300 billion — the most core driving force behind China's rapid AI industry infrastructure formation.
6.4.3 Key Inflection Points in Large-Model Competition in 2026
Node 1: Baidu ERNIE 4.5 series open-sourcing (June 30, 2026). Baidu announced open-sourcing the ERNIE 4.5 series on this date, meaning China's most experienced large-model company is formally joining open-source ecosystem competition.
Node 2: MiniMax Hong Kong IPO. Passing listing hearing at end-2025, expected to complete IPO in H1 2026 — becoming the first public market pricing anchor for Chinese AI large-model startup companies.
Node 3: Expected DeepSeek V4/R2 release. Based on High-Flyer's R&D pace, a release approximately 6–9 months after V3 (December 2024) is projected, meaning H2 2025 to early 2026.
Node 4: Agent scale deployment Q3 effect. Multiple market research institutions project that Q3 2026 (July–September) will see an "explosive inflection point" in enterprise Agent scale deployment.
6.5 DeepSeek: The Technical Disruptor Rewriting the Cost Curve
DeepSeek was incubated by quantitative private equity Wentian Science and Technology. On January 20, 2025, DeepSeek-R1 was open-sourced using the GRPO reinforcement learning method, matching OpenAI o1 on math, code, and reasoning tasks with total training cost of only approximately $5.6 million; the subsequently released DeepSeek-V3 (685 billion total parameters / 37 billion active parameters) surpassed Claude Sonnet/GPT-4o on multiple benchmarks. API pricing is only 1/10th of OpenAI, forcing the entire industry to follow with price cuts.
Three layers of historical significance of the DeepSeek phenomenon:
Layer 1: Proving the viability of the engineering efficiency route. Under constraints of NVIDIA H800 (subject to export controls) rather than H100, DeepSeek's team elevated training efficiency to an industry-stunning level through algorithmic innovations: MLA attention mechanism reducing KV Cache VRAM usage by 60%, FP8 mixed-precision training saving 40% VRAM, fine-grained Expert routing reducing invalid computations. NVIDIA's market cap shed approximately $589 billion in a single day after DeepSeek's release — the most direct expression of capital markets' repricing of the "compute myth."
Layer 2: Reshaping the global AI cost benchmark. DeepSeek-V3 API input pricing approximately $0.27/million tokens (output approximately $1.1), compared to GPT-4o's $5/$15 — only 1/10th to 1/15th the cost. This cost curve reshaping accelerated the explosion of the global AI application layer — when API costs are no longer the limiting factor, large numbers of previously non-commercializable scenarios become viable.
Layer 3: Creating a new path of Chinese AI "technology export" rather than "product export". Past Chinese tech overseas expansion primarily relied on consumer product forms like TikTok, e-commerce, and games. DeepSeek, through pure technical open-sourcing, had its technological achievements adopted by tens of millions of global developers worldwide without entering any consumer market or establishing any overseas institutions — a brand-new mode of technology influence dissemination.
6.6 Zhipu AI (GLM Z.AI)
Zhipu AI is the large-model startup with the deepest Tsinghua academic background. The GLM series continues iterating from GLM-4 to GLM-5. In 2025, Zhipu AI initiated the Hong Kong IPO process, expecting to raise approximately $300 million. On the commercialization side, the GLM coding plan's annualized revenue already exceeded RMB 100 million. By early 2026, GLM-5 ranked in the top five for global open-source model call volume on OpenRouter.
Three unique aspects of Zhipu AI's technical accumulation: deep roots in Tsinghua University's AI technical system; early focus on code generation capability (investing deeply in CodeGLM earlier than most competitors); and continuous optimization of bilingual and multilingual capabilities.
The most competitive differentiation: "enterprise private deployment + developer ecosystem" combination route.
6.7 Moonshot AI (Kimi)
Moonshot AI's core differentiation advantage is "long context + web reading." After facing the DeepSeek shock in February 2025, Kimi resolutely cut advertising spending and turned toward technology itself — investing in K2 trillion-parameter MoE large-model R&D.
Completed $500 million Series C at end-2025, at a valuation of $18 billion. In early 2026, Kimi released the K2.5 model targeting overseas ToB users — generating revenue in under 20 days exceeding all of FY2025 total revenue, becoming a landmark case in Chinese AI application overseas expansion.
Moonshot AI's three-step response strategy: Step 1, immediately cut large amounts of inefficient advertising; Step 2, reposition from "general AI assistant" to "deep long-document understanding assistant"; Step 3, open B-side API interfaces to overseas users.
K2.5's case proves: in the global market, Chinese AI models' cost advantage (approximately 1/5th to 1/10th of comparable US models) is a real competitive moat, with global developers and enterprise users having sufficient rational payment willingness.
6.8 MiniMax (Hailuo)
MiniMax is one of the most successful Chinese AI companies in overseas expansion: overseas revenue share of C-side social AI and Hailuo AI exceeds 70%. Passed Hong Kong Stock Exchange listing hearing at end-2025, expecting to raise $600–700 million at approximately $4 billion valuation.
MiniMax leads in audio/music generation, with Hailuo Video widely adopted by global developers. In technical approach, MiniMax adheres to self-developed super-large parameter MoE foundational models — one of the few startup large-model companies insisting on the "full-stack self-developed" route. In early 2026, MiniMax M2.5 ranked first in global model call volume on OpenRouter, surpassing all US models.
The core monetization source: C-side AI social products (Talkie, similar to Character.ai) subscription revenue in North America, Japan, and Southeast Asia markets. Talkie's monthly App Store revenue in H2 2025 consistently ranked in the top ten globally for AI apps, with subscription prices approximately $9.99–$14.99/month.
6.9 iFlytek (002230)
iFlytek is the representative "national team" enterprise in Chinese AI, with the combination of speech recognition + large models covering education, healthcare, and government — the three core government procurement scenarios. In Q1–Q3 2025 revenue was RMB 16.989 billion; AI hardware (learning machines) revenue doubled; overseas revenue broke through RMB 1 billion.
iFlytek's competitive moat built on three-layer structure: first, speech technology moat (20+ years of engineering experience, tens of billions of labeled speech data); second, government relationships and compliance capabilities (multi-year cooperation with core government departments); third, AI hardware terminal learning machine explosion (hardware + content + AI services triple revenue structure).
6.10 Kingsoft Office (688111)
Kingsoft Office is one of the most successful Chinese cases of AI empowering traditional software. Full-year 2025 revenue RMB 5.929 billion (+15.78% YoY); WPS AI MAU at end of June 2025 reached 29.51 million; WPS AI 3.0 launched. WPS 365 enterprise version revenue grew 62.27% YoY in H1 2025.
The core advantages of the "AI upgrading existing products" path: low migration cost for existing users (no need to learn new tools), short commercial conversion path (free-to-paid within familiar WPS interface), and low enterprise data integration cost.
6.11 SenseTime Rixin (HK 0020)
SenseTime's core focuses: visual AI, embodied intelligence, and AI infrastructure. Generative AI revenue growing rapidly in 2025. SenseTime's embodied intelligence layout (SensePAD robot platform, Lingxi mechanical arm series) is its unique strategic direction distinguishing it from pure-software large-model companies.
6.12 4Paradigm (HK 6682)
4Paradigm focused on enterprise AI platforms, with Sage AIOS having mature solutions for private deployment in banking, insurance, and retail. FY2025 full-year revenue approximately HKD 3.5 billion, growing approximately 20%; listed by Gartner in the Enterprise AI Platform Magic Quadrant. International strategy extending to the Middle East and Southeast Asia: signed MOU with Abu Dhabi Global Market in 2025.
6.13 Stepfun and Baichuan Intelligence: Two Startup Forces with Distinct Characteristics
Stepfun: Founded by former Anthropic research engineers and Google DeepMind researchers. Core: Step-2 series multimodal large models with outstanding image understanding and visual reasoning capabilities. Full-year 2025 revenue approximately RMB 500–800 million.
Baichuan Intelligence: Led by former Sogou CEO Wang Xiaochuan, strategic core is "all-in on healthcare" — focusing AI large-model capabilities on China's RMB 10 trillion healthcare market. Commercialization timing depends on regulatory approval (Class III medical device registration).
6.12.1 Domestic AI Chip Duo: Cambricon and Hygon's Differentiated Paths
Cambricon (MLU series): H1 2025 revenue RMB 2.881 billion, growing 4,347.82% YoY, achieving profitability (net profit RMB 1.038 billion) — explosive growth driven by concentrated release of government and central enterprise procurement.
Hygon Information (DCU series): Revenue approximately RMB 3 billion in FY2025; DCU (Deep Computing Unit) targets the Chinese x86-compatible market, designed for ROCM ecosystem compatibility, lowering migration threshold.
七 Regional Clusters and Overseas Expansion
China's AI large-model industry is not evenly distributed but forms several distinct geographic clusters, each with different resource advantages, policy orientations, and industry ecosystem characteristics.
Beijing: Talent Center and Policy Experimental Field
Beijing has the highest concentration of China's top AI talent: AI-related paper output accounts for more than 30% of the national total; major universities (Tsinghua, Peking University, Renmin University), research institutions (Chinese Academy of Sciences Computing Institute, Beijing Academy of Artificial Intelligence), and top enterprises (Baidu headquarters, ByteDance headquarters) form an unparalleled talent supply system. The "Zhongguancun AI Innovation Park" and "Tongzhou International AI Design Park" provide spatial carrier support for AI startups.
In terms of policy environment, Beijing is the most active region in national-level policy experimentation — algorithms, generated content labeling, and AI interaction service management regulations were all first piloted or first reported from Beijing. Zhongguancun National Independent Innovation Zone has special approval authority for AI enterprise projects, shortening enterprise project approval cycles by approximately 30%–40%.
Shanghai: Vertical AI and Financial Data
Shanghai's AI landscape is more finance and manufacturing verticals. The city's financial industry AI deployment depth ranks first nationally — major commercial banks (ICBC, CCB, CMB) have set up AI centers in Shanghai, and securities companies (CITIC Securities, Guotai Junan) have deployed large model-driven quantitative strategies and intelligent investment advisory systems. The National Financial Data Zone in Lingang Special Area provides a legally compliant data sandbox environment for AI financial applications — financial institution data can be used for AI model training within a "controlled circulation" framework, addressing the data compliance challenges that previously impeded AI for finance.
Hangzhou: Alibaba Ecosystem's AI Breeding Ground
In the Chinese AI ecosystem, Hangzhou holds a uniqueness that no other city has: within a 20-kilometer radius, there is concentrated Alibaba Cloud's three major compute super-centers, the Qwen model team, DingTalk AI product team, and one of the largest e-commerce data assets in the world. This geographic concentration is not coincidental — it evolved from Alibaba's massive headquarters effect, gradually forming a "Alibaba as sun + hundreds of startups as planets" ecosystem structure. Companies that have received funding or technical support from Alibaba Cloud include Lingji (AI workflow automation), Kuaizhi Technology (AI customer service), and Yunzhisheng (voice AI). The common characteristic of these companies is: they validate business models in the Alibaba ecosystem first, then expand to non-Alibaba customers.
Shenzhen: Manufacturing AI with International Reach
Shenzhen's AI industry is inseparably linked to manufacturing: Foxconn's Shenzhen factories (world's largest smartphone assembly base), BYD's new energy vehicles, and DJI's UAV manufacturing have all become the first batch of large-scale AI application scenarios. The manufacturing AI value in Shenzhen is structured on three layers: the shallowest layer is visual inspection (automated optical defect detection on production lines); the middle layer is equipment predictive maintenance (using sensor data to forecast equipment failure time); the deepest layer is supply chain coordination (dynamically adjusting procurement plans and production rhythms based on order backlog and raw material prices). The density of manufacturing scenarios provides a testing ground that most Chinese cities cannot provide — a single Shenzhen factory may produce hundreds of millions of product units per year, providing massive data for training and validating industrial AI models.
Overseas Expansion: Three Routes and the Regulatory Maze
中国AI出海在2025年形成了三条主要路线:第一条是"技术开源路线"(DeepSeek、通义千问),绕过了消费者市场的监管壁垒,以技术影响力渗透全球开发者生态;第二条是"产品直出路线"(MiniMax Talkie、TikTok),面向消费者端用AI社交/内容产品直接参与全球市场竞争;第三条是"B2B出海路线"(第四范式、Kimi K2.5),以企业级API或平台服务为主要形式,进入对政治敏感度较低的中东、东南亚和拉美市场。
Chinese AI overseas expansion formed three main routes in 2025: first, the "technology open-source route" (DeepSeek, Qwen), bypassing consumer market regulatory barriers and penetrating the global developer ecosystem through technological influence; second, the "direct product export route" (MiniMax Talkie, TikTok), directly competing in global markets with consumer-facing AI social/content products; third, the "B2B overseas route" (4Paradigm, Kimi K2.5), using enterprise-grade API or platform services as the main form, entering the Middle East, Southeast Asian, and Latin American markets with lower political sensitivity.
天下工厂平台(www.tianxiagongchang.com)覆盖的480万家在产工厂中,深圳、苏州、东莞等智能制造核心城市的工厂对AI质检和排产系统的采购意愿在2025年明显领先全国其他地区——AI硬件供应链的真实渗透节奏,在工厂采购决策层面已提前反映出来。
Among the 4.8 million operating factories covered by the Tianxia Gongchang platform (www.tianxiagongchang.com), factories in core smart manufacturing cities such as Shenzhen, Suzhou, and Dongguan show significantly higher willingness to procure AI quality inspection and production scheduling systems in 2025 compared to other regions nationally — the actual penetration pace of the AI hardware supply chain is already reflected ahead of time at the factory procurement decision level.
7.6.1 Regulatory Maze: Overseas Compliance Costs and Risk Analysis
The core regulatory compliance challenges for Chinese AI overseas expansion vary significantly by market:
US market: The highest compliance threshold, involving CFIUS national security review (potential investment blocking), APP data privacy regulations (CCPA in California), Section 5 unfair competition investigations by the FTC, and multiple layers of immigration policy impacts on talent flows. TikTok's legislative ban has become a vivid case study for the legal risks Chinese AI products face in the US market, making many Chinese AI companies proactively choose to avoid the US consumer market.
EU market: The AI Act's risk-level framework requires Chinese AI products to complete technical document verification and fundamental rights impact assessments for high-risk scenarios before entering the EU market; data cross-border transfer must comply with GDPR standard contractual clauses (SCCs) or obtain adequate country certification. The current dilemma for Chinese AI companies: training data may contain EU user data, and completing data flow legality certification requires significant legal investment.
Southeast Asian markets: Relatively the most policy-friendly direction. Singapore's PDPA data protection law is mature but moderate in enforcement intensity; Vietnam, Indonesia, Thailand, and other countries are still in the early stages of AI regulatory framework construction, with relatively high policy flexibility. The regulatory compliance costs for Chinese AI in Southeast Asia are approximately 1/5th of the EU market, and the market growth rate is higher than China's, making it the most cost-effective overseas expansion direction in the short term.
Middle East markets: The Middle East's AI governance framework is still in the formation stage, with UAE and Saudi Arabia actively attracting global AI investments through national AI strategies (UAE's "AI 2031," Saudi Arabia's "NEOM" project), showing high policy friendliness for incoming Chinese AI companies. The obstacle for Chinese AI in the Middle East is mainly brand recognition rather than regulation; localized content adaptation (Arabic language capability, Islamic cultural sensitivity) is the key investment direction.
八 Deep Dives into Key Sub-Topics
8.1 Foundational Models: Four Structural Changes in 2025
Change 1: Scale-intelligence law enters "flat zone". In 2022–2023, simply scaling up model parameters and training data reliably improved performance; since 2024–2025, this "scale law" began slowing down — the performance gap between models at the trillion-parameter level has narrowed. This means that the era of "bigger is better" is ending, and algorithmic innovation is returning as the primary variable.
Change 2: MoE becomes the mainstream architecture. Three benefits: maintaining the overall knowledge breadth of super-large parameters, lowering the computation cost of a single inference, and making capacity expansion more economical. DeepSeek-V3 (685 billion total parameters / 37 billion active parameters), GPT-4 (estimated 1 trillion+ parameters / MoE implementation), and Google's Gemini Ultra 3 all adopt MoE as the core architecture.
Change 3: The rise of reasoning models. The core difference of reasoning models versus standard large models: they generate detailed "chain-of-thought" reasoning steps through reinforcement learning before producing the final answer, rather than mapping directly from input to output. This allows models to invest more computation in solving complex problems and self-correct at intermediate steps.
Change 4: Native multimodality becomes the baseline. The dominant development direction in 2025 is training image, audio, and video tokens together with text tokens from the pretraining stage. Native multimodal models have significant advantages over "text model + visual patch" architectures in cross-modal reasoning tasks.
8.2 Reasoning Models: PRM and MCTS Deep Dive
The core technical elements of reasoning models include:
Process Reward Models (PRM): Train a separate "process evaluator" to score the correctness and logic of each intermediate reasoning step, rather than only providing feedback on the final answer. This "step-by-step feedback" signal allows models to learn fine-grained reasoning skills.
Monte Carlo Tree Search (MCTS) integration: The model can explore multiple different reasoning paths simultaneously, using the MCTS search tree to evaluate the quality of each path and select the globally optimal reasoning route. This "tree search + pruning" approach simulates human thinking modes in complex problem solving ("first consider approach A, if not, try approach B").
8.3 Agent Technology: Reliability Challenge and MCP Protocol Breakthrough
Enterprise-grade Agent's most critical challenge is reliability: Agent failure in a multi-step task chain is non-linear (one step's error gets amplified in subsequent steps). For enterprise-critical business processes, task failure rates must be controlled below 0.1% to be acceptable — at current technology levels, complex Agents' single task success rates may only be 70%–85%.
Key progress in the MCP (Model Context Protocol) standard in 2025: MCP standardized the API interfaces between AI Agents and external tools, databases, and services, similar to establishing a unified "USB interface standard" for AI Agents. More than 50 tool providers including Stripe, Cloudflare, and GitHub have released MCP plugins, rapidly expanding Agent's callable tool ecosystem.
8.4 Video Generation: Four Commercialization Paths
China's main video generation products include: ByteDance Jimeng AI, Kuaishou Kling, MiniMax Hailuo Video, Alibaba Tongyi Wanxiang, and others. Four commercial monetization paths:
Creator tools (targeting content creators, pay per generation or subscribe to memberships): Kling and Jimeng AI have achieved meaningful paid users in this direction.
Advertising material production (replacing expensive human video production): Per-video pricing, contract or SaaS mode.
Entertainment and games (generating game scene assets, NPC dialogue, dynamic event backgrounds): Per-asset or license mode.
Enterprise training and education (generating corporate training course videos, product introduction videos): SaaS subscription, bundled with enterprise training platform.
8.5 Finance AI, Healthcare AI, Overseas AI, Industrial AI, Education AI
Finance AI has reached "actual mass production" across three tier breakdown:
Tier 1 (quantitative research + risk modeling): Deepest AI penetration, most mature commercialization; 60% of head securities companies have used LLM-assisted multi-factor model development.
Tier 2 (intelligent customer service + intelligent investment advisory): Mass penetration phase, LLM-based customer service response quality reaches 85%–90% customer satisfaction.
Tier 3 (AI-driven credit decisions + anti-fraud systems): Most challenging compliance threshold; needs to comply with financial data protection regulations and central bank model risk management requirements.
Healthcare AI continues evolving through three phases:
- Phase 1: Imaging-based single-disease auxiliary diagnosis (mature, Chinese FDA approved 200+ AI medical devices);
- Phase 2: Electronic health record structured extraction + clinical decision support (rapid growth in 2025);
- Phase 3: Multi-modal AI physicians integrating imaging + clinical + genomics + drug data for personalized treatment plan recommendations (early commercial stage, Baichuan Intelligence "Baixiaoying" is a representative exploration).
九 Key Technology Deep Dives
9.1 MoE Architecture: Four Innovations from DeepSeek-V3
Innovation 1: Fine-grained expert segmentation. V3 increases the number of experts from traditional 8–64 to 256, while reducing the parameter count of each expert, allowing more precise matching between input content and expert specialization.
Innovation 2: Auxiliary loss-free expert load balancing. Replacing traditional auxiliary loss rebalancing with a "sequence auxiliary-free" load balancing strategy based on token offset, improving model convergence stability while eliminating the performance loss from auxiliary loss.
Innovation 3: FP8 mixed-precision training. Adopting FP8 (8-bit floating-point) precision for forward pass operations, reducing training VRAM by approximately 40% versus BF16, while maintaining model precision not significantly lower than full-precision training through gradient scaling and loss scaling techniques.
Innovation 4: Multi-head Latent Attention (MLA). Compressing the KV Cache required for each token's attention computation into a low-dimensional latent space, reducing inference VRAM usage by approximately 60% — a key enabler for deploying 685-billion-parameter models on limited hardware.
9.2 Test-Time Compute: Four Forms of Scaling
Test-Time Compute Scaling (TTCS) refers to dynamically adjusting the computation invested in the inference stage for different difficulty questions, improving output quality by investing more computation at inference time rather than increasing model parameters.
Four specific implementation forms:
Best-of-N (BoN): Generate N candidate answers simultaneously, select the highest-quality one using a Process Reward Model (PRM). Advantage: simple implementation, applicable to diverse answer scenarios (math, code, logic).
Sequential Refinement: Model generates initial answer, evaluates the quality of the current answer, then performs targeted refinement based on evaluation results, iterating multiple times. Applicable to complex writing, logical reasoning, and multi-step planning.
Beam Search: Maintain multiple "candidate reasoning paths" simultaneously, perform pruning and extension at each step, select the globally optimal complete answer. Suited for tasks with clear constraints and verifiable intermediate steps (formal proof, code compilation).
Verifier-Augmented Generation (VAG): Introduce an independent verifier model evaluating the accuracy of the main model's output at each intermediate step, guiding the main model to prioritize correct reasoning paths. Most suitable for high-verifiability domain tasks (mathematical proof steps, code unit test passing rate).
9.3 RAG and Fine-tuning Technology Maturity
Advanced RAG beyond basic chunking: Query rewriting (using LLM to transform ambiguous user queries into clearer retrieval queries), HyDE (Hypothetical Document Embeddings, having the model first generate an "ideal answer" then retrieving similar actual documents), graph-augmented retrieval (using knowledge graph relationships to expand retrieval context), and multi-path fusion (simultaneously launching keyword search, vector retrieval, and graph retrieval, using Reciprocal Rank Fusion to merge results).
LoRA/QLoRA parameter-efficient fine-tuning: Decomposing model weight increments into low-rank matrix products, fine-tuning only a small number of new parameters (approximately 0.01%–1% of total parameters). QLoRA further combines LoRA with 4-bit quantization, allowing 7B–70B parameter models to be fine-tuned on consumer-grade hardware (single RTX 4090).
9.4 Alignment Technology: RLHF and Constitutional AI
RLHF (Reinforcement Learning from Human Feedback) is the current mainstream large model alignment method: human evaluators score model outputs, training a reward model to simulate human preferences, then using reinforcement learning (PPO algorithm) to optimize the model to generate outputs consistent with human preferences.
Constitutional AI (CAI) is Anthropic's innovative alignment approach: giving the model a set of abstract value principles (constitution), having the model evaluate whether its own outputs comply with constitutional principles, and training using the AI's own self-evaluation results as reinforcement learning signals. CAI significantly reduces the amount of human annotation needed (the model can handle a large portion of self-evaluation itself), while making the alignment process more transparent.
9.5 Edge Models and Quantization Technology
Edge-side AI inference is rapidly maturing in 2025. Three major advances:
INT4/FP4 quantization mainstream: Mainstream edge quantization methods have entered the "FP4/W4A8" era (4-bit weight, 8-bit activation), achieving model compression ratios of 4–8× while maintaining model accuracy within 2%–5% of the original. Apple MLX framework supports Llama-3-70B running locally on M3 Max chips (memory bandwidth 400 GB/s).
AWQ (Activation-aware Weight Quantization): By analyzing activation value distribution, performing non-uniform quantization for weights with larger impact on outputs (higher precision for "important" weights), making 4-bit quantized models have near 6-bit or even 8-bit precision.
Speculative Decoding: The small draft model generates several candidate tokens first, and the large verification model verifies in parallel. The actual inference speed can reach 2–3× acceleration (essentially eliminating the sequential decoding bottleneck), while being completely equivalent to the original model in terms of output quality.
9.5.1 Edge-Cloud Collaborative Inference: New Paradigm for Distributed AI Systems
True edge-cloud collaborative systems need to make real-time decisions across multiple dimensions:
Decision Dimension 1: Local capability assessment. The system first evaluates whether the current request falls within the local model's capability range — simple reminders, local knowledge queries, and basic speech recognition execute directly on-device, ensuring <100ms response speed.
Decision Dimension 2: Privacy level assessment. User medical health data, financial information, and private conversations should be strictly processed on-device. Apple Intelligence's Private Cloud Compute: when requests require stronger cloud compute, data is encrypted before sending; the cloud server immediately deletes after processing; the entire process is invisible even to Apple itself.
Decision Dimension 3: Network condition adaptation. When network connection is unstable, the system automatically switches to on-device mode, maintaining basic availability under constrained capabilities rather than showing error pages.
十 Risk Analysis
10.1 US Sanctions and Domestic Compute Ceiling
US compute restrictions on China are transmitted through three main paths: first, restricting chip manufacturing equipment exports (ASML EUV lithography machine restrictions limit wafer fabrication in China); second, restricting advanced GPU chip exports (H100/A800 already restricted, H20 interrupted in April 2025); third, restricting advanced packaging technology (HBM memory stack packaging restricted).
China's countermeasures form three response dimensions:
Technological self-reliance: Cambricon's MLU570, Huawei Ascend 910C, Hygon DCU3000 represent the highest level of current domestic AI chips; training capabilities are estimated at approximately 60%–70% of equivalent NVIDIA H100.
Engineering optimization: DeepSeek demonstrated that under the same hardware conditions, algorithmic innovations can compensate for a portion of hardware disadvantages.
Application layer circumvention: Training super-large-scale foundational models requires H100-level training compute (current domestic chips are insufficient); but inference is less demanding, and domestic chips have basically achieved commercial-grade equivalence.
The most core risk: the next round of US sanctions may target chip design software (EDA tools) and advanced interconnect network equipment (InfiniBand-grade switches), potentially disrupting China's AI training infrastructure more fundamentally.
10.2 Commercialization Closed Loop and ARR Bottleneck
China's AI large-model commercialization has reached a structural inflection point: technical capability has been demonstrated; the gap between market size and valuation is the real bottleneck.
Structural reasons why ARR does not match valuations:
Training-inference cost asymmetry: Training a top-level large model costs hundreds of millions of dollars, but inference generates far less per-call revenue. To recoup training costs through API billing, tens of billions of API calls are needed — meaning hundreds of millions of daily active users using it regularly.
Chinese ToC's low payment willingness: Average user's willingness to pay for AI is approximately RMB 20–50/month — far below Netflix's approximately RMB 15/month in content value perception. The fundamental reason: content is emotionally irreplaceable (specific movies and TV shows can only be watched from the content platform), while AI assistants are widely substitutable (multiple free alternatives).
ToB's long decision cycle: Enterprise procurement decisions typically take 6–18 months, with trials → POC → pilot → scaled deployment requiring a long cycle.
10.3 Model Homogenization and Technical Moat
The model layer's competitive moat is being rapidly eroded by open-source:
DeepSeek's Qwen3, Llama 4 as the homogenization engine: When open-source models' performance reaches closed-source standards, enterprises have no reason to pay closed-source API premiums. This creates a "race to zero" dynamic for pure API model services.
Real moats not in the model itself: User data (real-world user behavior as RLHF signals), ecosystem integration (deeply embedded in enterprise workflows that are costly to migrate), and scenario proprietary data (industrial quality inspection data, legal document corpora) — these are what truly constitute moats.
10.4 AI Hallucinations and Copyright Risks
AI hallucinations and copyright infringement are the most frequently appearing legal and technical risks in actual deployment. In 2025, at least 6 major copyright infringement lawsuits were initiated by global media groups against AI companies, including Associated Press against OpenAI (reached settlement), Music Alliance against major platforms, and Getty Images against Google Gemini.
10.5 Compute Costs and Energy Constraints
The energy consumption of AI training cannot be ignored. A single H100 GPU cluster of 10,000 cards consumes approximately 35 MW of power — equivalent to the daily electricity consumption of 35,000 Chinese households. Global data center power consumption exceeded 500 TWh in 2025, of which AI-related power consumption accounted for approximately 40%.
China's energy policy forms a dual constraint: new data center applications must meet Green Electricity Certificate requirements (renewable energy proportion ≥ 30%); AI big box data centers (single project compute capacity >500 PFLOPS) require provincial energy review approval. The data center energy efficiency requirements push the actual proportion of green electricity in data centers higher.
10.6 The Open vs. Closed Source Game: Fundamental Business Model Divergence
Closed-source camp logic (OpenAI, Anthropic): Only paying users get the strongest capabilities, forming sustainable subscription revenue; model weights not disclosed, competitors cannot directly copy capabilities.
Open-source camp logic (Meta Llama, Alibaba Qwen, DeepSeek): Open-sourcing strongest weights in exchange for ecosystem scale — global millions of developers building applications on open-source models.
2025 market performance partially validated the coexistence possibility: OpenAI created approximately $4.2 billion annualized ARR globally with highly closed-source GPT-5; simultaneously Llama 4 rapidly became the standard reference after open-sourcing; DeepSeek established global reputation through open-source while quickly gaining share in commercial users with its paid API service.
China's policy environment is more friendly to the open-source model: the registration system only requires passing safety assessment, not mandating models be closed-source; and China's vast government and state-enterprise procurement market has a natural preference for "privately deployable" open-source models.
十一 Predictions for 2026–2030: Three Main Lines and Core Variables
11.1 Foundational Model Consolidation: From "Hundred-Model Wars" to "Top-Ten Structure"
In early 2025, China had 700+ large-model registrations; 2026 is projected to enter a consolidation period, with companies truly capable of sustained investment narrowing to 10–15. Three consolidation drivers: first, training super-large models requires per-instance compute investment of tens of millions to hundreds of millions of dollars; second, open-source large models' capability is sufficient for most vertical scenarios; third, application layer success increasingly depends on scenario depth rather than model generality.
Projected 2030 structure: "4 super (Baidu/Alibaba/ByteDance/Tencent) + 3 strong (DeepSeek/Zhipu/Moonshot) + N verticals." Total market scale in the medium scope is expected to reach RMB 500 billion+.
2026 mid-year strategic fork: Some mid-tier large-model companies will face next-round financing difficulties. 2027 commercialization validation checkpoint: Only 5–7 Chinese large-model companies may pass this checkpoint. 2028–2030 stable period: Stable professional division of labor — top 3–4 providing general foundational models + cloud services; middle 3–5 focused on vertical industry deep models.
11.2 Agent Explosion: The Decisive Track for Enterprise AI in 2026
Three main business models for Agent commercialization:
Workflow-as-a-Service: Selling automated workflow execution capability based on task volume, similar to RPA (Robotic Process Automation) market logic. Value proposition: replacing repetitive human operations, measured by work hours saved.
Intelligence Subscription: Enterprise subscribing to an "AI expert team," monthly fees in the range of tens of thousands to hundreds of thousands per enterprise. Value proposition: replacing consulting and analysis work.
Outcome Sharing: Taking a percentage of measurable business outcomes generated by Agent (e.g., AI credit risk control Agent sharing in saved bad debt write-offs). Value proposition: completely aligning with customer ROI, zero upfront cost.
11.3 AI Devices: Supply Chain Opportunity Map
AI devices are entering the cycle of fastest hardware replacement in history.
| Device Category | 2026 Global Shipment Estimate | AI Feature Penetration | Key Supply Chain Opportunities |
|---|---|---|---|
| AI smartphones | ~600 million units | ~40% (flagships) | NPU chips, on-device models, thermal materials |
| AI PCs | ~100 million units | ~50% | NPU modules, local inference middleware |
| AI glasses | ~30 million units | ~100% | Ultra-small cameras, bone conduction speakers, low-power SoC |
| AI earphones | ~200 million units | ~20% | Voice wake-up chips, local speech models |
The most certain supply chain opportunities: NPU chip design and packaging (boosted by domestic substitute demand), thermal materials for mobile devices (thinner, more efficient thermal management), and edge inference software middleware (supporting device-side model scheduling).
11.4 Inference Era: Three Competitive Dimensions
The "inference era" is when compute consumption shifts from training to inference, causing fundamental changes in the competitive landscape:
Competitive Dimension 1: Inference cost per token. In the training era, ability to train models was sufficient to win; in the inference era, the cost of each "conversation" will become the core metric. When inference cost is close to zero, the competitive focus shifts to application experience and ecosystem.
Competitive Dimension 2: Inference latency. First-token latency (TTFT) and output speed (tokens per second) directly determine user experience — real-time dialogue, voice interaction, and Agent task execution all have strict latency requirements.
Competitive Dimension 3: Inference efficiency under memory constraints. KV Cache management and batching strategies under limited on-device memory become the core competitiveness of edge inference.
11.5 National GPU Penetration: Three Paths
Three possible paths for domestic GPU to replace NVIDIA in China's AI market:
Path 1: Policy-mandated substitution (government and state-enterprise). Policy clearly requiring ≥X% domestic chip use ratio in new government AI projects; this market has certainty but is relatively small (approximately 20%–25% of China's total AI compute market). This path is already advancing in 2025.
Path 2: Application-specific optimization advantage. Domestic chips may outperform NVIDIA in specific scenarios (such as inference-only Huawei Ascend 910C on specific workloads) or cost per performance; when such advantages exist, commercial users rationally choose domestic chips.
Path 3: Full-stack software ecology maturity. When Ascend's CANN and Cambricon's MagicMind tool chains can support the mainstream AI frameworks and model libraries without extra work, the technical cost for enterprises to "switch to domestic chips" drops to near zero.
11.6 Five Key Variables
Variable 1: US chip restriction intensity. The next round of restrictions (possibly targeting InfiniBand-grade interconnects or HBM-stack-grade high-bandwidth memory) could trigger forced acceleration of domestic substitution.
Variable 2: Agent reliability breakthrough. If multi-step Agent complex task success rates in enterprise environments can be improved from current 70%–85% to above 95%, enterprise-level Agent commercial deployment will be unlocked.
Variable 3: Model capability gap convergence speed. If open-source models' capability continues to converge with the closed-source frontier within 3–6 months, the strategic moat of all closed-source model companies will rapidly shrink.
Variable 4: AI hardware device penetration S-curve position. If the AI smartphone penetration rate in China exceeds 60% in 2026 (current forecast 53%), it means AI hardware has entered the "late majority" adoption phase, and the growth curve will significantly accelerate.
Variable 5: Multimodal AI + embodied AI convergence speed. If multimodal models can be effectively integrated with physical robotic systems (embodied intelligence), the AI application scenario space will expand from virtual information space to physical manufacturing space.
11.7 Industry Map: Key Sub-Segments Worth Tracking in 2026
Segment 1: Inference Cloud Services (Inference-as-a-Service): Representative companies: Volcano Engine (ByteDance), Alibaba Cloud Bailian, Zhipu Open Platform, Silicon Cloud.
Segment 2: Vertical Industry Agent Platforms: Representative companies: 4Paradigm (finance), Baichuan Healthcare (healthcare), Xinye Technology AI (consumer finance), Haizhi Technology (government).
Segment 3: Enterprise Private Deployment: Combining model capability + private compute, serving government, finance, and healthcare data-sensitive customers.
Segment 4: AI Writing and Content Tools (CreatorAI): ByteDance Doubao Creation, Mita AI Writer, Tongyi Lingji.
Segment 5: AI Operating System and Software Infrastructure: Alibaba, Tencent, Huawei's AI-native OS strategies, plus developer frameworks like LangChain and LlamaIndex.
Segment 6: Domestic AI Chips (Inference Side): Cambricon (688256), Hygon Information (688041), Suiyuan Technology (unlisted).
Segment 7: AI Hardware (Consumer Terminals): AI glasses, AI learning machines, AI smartphones.
Segment 8: AI Data Services: Data annotation leaders (Haitianruisheng, Datatang) transforming from "labor-intensive outsourcing" to "AI training strategic asset suppliers."
Segment 9: AI Compliance and Security: Qi-Anxin, Anheng Information, NSFOCUS targeting AI security detection as new growth curves.
Comprehensive view: the largest structural opportunities often appear at cross-segment intersections. After 2026, boundaries between segments will become blurrier than today, and composite ecosystem capability will become the core differentiator for long-term winners.
十二 Conclusion: The Year of Application Explosion Has Arrived, Supply Chain Remains the Real Battlefield
2025 was the genuine "year of application" for China's AI large models: the commercialization inflection point has been confirmed, but the path to scaled ARR is still being explored. China's AI industry completed three key leaps this year: technically, the engineering efficiency camp represented by DeepSeek proved the possibility of "low-cost high-performance"; commercially, the overseas expansion track (MiniMax/Kimi) validated Chinese AI models' global competitiveness; and in policy, the registration system and safety standards built market entry barriers favorable to compliant operators. Standing at the mid-2026 vantage point looking back, what is most worth remembering is not which company won the evaluation benchmark rankings, but the structural transformation itself of the entire industry chain moving from "experimental period" to "realization period" — this transformation, and the opportunities and challenges it brings, will profoundly shape the competitive landscape of China's technology industry over the next five years.
12.1 Three Structural Conclusions
Conclusion 1: "Applications are the battlefield, models are merely the entry ticket". In 2023 everyone rushed to release large models; in 2024 benchmark scores became the battlefield; from 2025 onward the market logic has switched — when mainstream model capabilities converge, whoever can deeply embed AI capabilities into users' core workflows, build persistent data flywheels, and obtain the highest-quality real user feedback at the lowest cost, will be the ultimate commercial winner. This switch favors large tech platforms (with vast traffic bases) far more than pure model startup companies.
Conclusion 2: China's AI global competitiveness has moved from "technology following" to "local leadership". DeepSeek-R1's global technical shock, Qwen3 ranking first globally in the open-source ecosystem by downloads, MiniMax M2.5 ranking first in OpenRouter by call volume — these three things together prove: Chinese AI is no longer a "follower" of US AI. In specific dimensions (engineering efficiency, open-source ecosystem influence, price competitiveness) it has entered leading positions. This judgment contrasts with the outside view sometimes seeing "Chinese AI constrained by GPU limitations" — compute constraints exist, but Chinese engineers have transformed "constraints" into drivers of "efficiency innovation."
Conclusion 3: Supply chain is the most underappreciated foundation of the AI industry. No matter how advanced the algorithm of a large model, it requires real physical infrastructure support — manufacturing GPU servers, producing optical modules, cooling liquid-cooled servers, packaging AI chips — all depend on China's manufacturing industry's real capacity. The AI revolution is not a purely software revolution; it is simultaneously a manufacturing revolution; and the role China's manufacturing industry plays in this revolution is far more core than what most AI industry reports reveal.
12.2 Core Recommendations for Different Audiences
For investors: Focus on application-layer companies with genuine ARR growth (coding tools, AI office, vertical industry Agents) and supply chain sub-tracks with high supply-demand certainty (liquid cooling equipment, optical modules, high-end thermal management); maintain caution on foundational model startups (the gap between valuations and ARR requires at least 3–5 years to bridge); track commercialization milestones on the overseas track (MiniMax's HK IPO, Kimi K2.5 quarterly revenue trend are key signals).
For enterprise decision-makers: AI has moved from "pilot" to "must-do" — by 2025 more than 50% of above-scale industrial enterprises were using at least one AI auxiliary tool, with lagging enterprises facing significant cost efficiency disadvantages. Recommended entry priority: code assistance (immediate ROI) → document processing automation → customer service robots → production scheduling Agents.
For AI entrepreneurs: After 2025, there is no longer a window to "rebuild another general large model." Valuable directions concentrate in: vertical industry models with unique proprietary data (medical imaging, factory processes, judicial documents); Agent platforms deeply embedded in specific workflows (go deep in one industry rather than pursuing generality); AI product localization in overseas markets where US and European companies have not yet deeply covered (Southeast Asia, Middle East, Africa).
12.3 Methodology Limitations and Forward-Looking Risk Disclosure
AI industry prediction reports must disclose a fundamental cognitive limitation before numbers and conclusions: the AI large-model industry is one of the fastest-iterating industries in human history, with deviation rates between the reality of June 2026 and forecasts from 12 months ago often exceeding 50%.
Specific methodology limitations:
Revenue data reliability issues: The vast majority of Chinese AI large-model companies are unlisted (DeepSeek, Kimi, MiniMax, Baichuan, etc.), with revenue data entirely from media disclosures, financing announcements, and analyst estimates that cannot be verified through financial reports.
Benchmark reliability issues: Model capability benchmarks (MMLU, HumanEval, MATH, etc.) have "benchmark gaming" problems — some companies intentionally include training data highly similar to evaluation sets.
Unpredictability of geopolitical risks: The direction of US-China tech competition (chip ban escalation/relaxation, market access restriction changes, overseas policy adjustments) cannot be modeled quantitatively.
Uncertainty of Agent commercialization: As of June 2026, the proportion of enterprise-grade Agent projects that have been successfully delivered at scale (by contract value) remains low. "POC success but scale-up failure" is the most common current Agent project outcome.
Understanding these limitations does not mean maintaining pessimism toward the AI industry — quite the contrary. In the AI era, learning speed itself is a kind of moat. Grasping uncertainty and maintaining a clear sense of direction in dynamic conditions is the core capability every decision-maker needs to continuously cultivate — and the ultimate competitive advantage for navigating industry cycles and building long-term competitiveness in the AI wave.
数据来源
This report's main data sources and references are as follows (arranged by chapter order):
- OpenAI official announcements (GPT-5 release, August 2025); Sacra estimates (revenue data)
- Anthropic official news (financing announcements, September 2025 – June 2026); Crunchbase financing data
- Google DeepMind official blog (Gemini 3/3.5 release announcements, November 2025 – 2026); Wikipedia Gemini 3 entry
- Meta AI official blog (Llama 4 release, April 2025); TechCrunch Llama tracking reports
- NVIDIA SEC 8-K filings (FY2025 annual report, February 2025); TrendForce Blackwell research report
- xAI official announcements; Sacra xAI research report; Crunchbase financing records
- Mistral AI official website; EU AI Act official text
- China Cyberspace Administration (CAC) official announcements — Generative AI Service Registration List (Q1–Q4 2025)
- 36Kr Research Institute "2025 China Large-Model Industry Development Research Report"
- CIR "2024–2025 China AI Large-Model Development Platform Market Forecast Research Report"
- iMedia Research "2024–2025 China AI Large-Model Market Status and Development Trend Research Report"
- IDC China Intelligent Terminal Market Top-Ten Insights (2026); Canalys AI PC Forecast (2025–2028)
- Baidu, Alibaba, Tencent, iFlytek, Kingsoft Office, 4Paradigm HK/A-share listed company official financial reports (FY2025 H1, Q3 reports)
- Pengpai News "RMB 107 Billion, 930 Companies: The Wild Consensus of China AI Applications in 2025" (January 2026)
- Every Economic Newspaper (ByteDance Doubao 2025 review); 21st Century Business Herald (Cambricon, Hygon revenue data)
- TMTPost (Moonshot AI valuation and financing tracking); Shanghai Securities Journal (domestic GPU market structure)
- Hugging Face open-source model download rankings (July 2025 data)
- OpenRouter global model call statistics (January–February 2026 data)
- Google Cloud "ROI of AI 2025 Report"; Salesforce Agent prediction research
- Tianxia Gongchang platform (www.tianxiagongchang.com) factory database, covering 4.8 million operating factories, independently validating the real penetration distribution of AI hardware supply chain in China's manufacturing industry