The report frames China’s “virtual person” sector as a commercial field accelerated by metaverse-related attention and by practical adoption across industries. It treats virtual people as designed digital identities that can be tailored to different purposes, then scaled through content production and platform distribution. It also signals that business models are still being tested, even as use cases broaden beyond entertainment into media, finance, e-commerce, customer service, education, and advertising.
Market size thesis: The report anchors its argument in a two-layer market sizing approach, separating a broad “virtual person-driven industry” from a narrower “core market.” It compares a 2021 baseline with 2025 projections to assert rapid expansion, and it cites those figures as the reason downstream industries are worth tracking. The implied message is that the sector’s growth is being pulled by demand for scalable content, brand communication, and standardized service interfaces.
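A baseline-versus-projection comparison of this kind is usually summarized as a compound annual growth rate. The Python sketch below shows that arithmetic for both sizing layers. The function name `cagr` and every input value are illustrative placeholders, not figures from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between a baseline and a projection."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Placeholder inputs only (NOT from the report): substitute the report's
# 2021 and 2025 values for the broad "driven" market and the "core" market.
placeholder_markets = {
    "driven_market": (100.0, 400.0),
    "core_market": (10.0, 40.0),
}

for name, (baseline_2021, projected_2025) in placeholder_markets.items():
    rate = cagr(baseline_2021, projected_2025, years=2025 - 2021)
    print(f"{name}: {rate:.1%} per year")
```

Keeping the two layers as separate entries makes it easy to check whether the core market is projected to grow faster or slower than the broader industry it drives.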
Classification and characteristics table: The classification section distinguishes virtual humans by role orientation and by the way they are designed and operated, separating entertainment-facing personas from enterprise or service-facing roles. It emphasizes that “what the character is for” determines how identity setting, content format, interaction depth, and operational cadence are built. It also implies that the technical stack is modular, with modeling, animation, speech, and AI-enabled interaction combined in different ways depending on whether the goal is performance, marketing, or service delivery.
Chart on education–culture–entertainment consumption: The consumption chart is used as a macro driver signal rather than a direct measurement of virtual human revenue. By showing rising per-capita spending in education, culture, and entertainment, the report argues there is expanding budget and attention for new digital content forms. The link to virtual humans is that they can produce content at scale, maintain consistent branding, and adapt quickly to platform trends, which makes them a plausible beneficiary of broader consumption growth.
Market scale trend chart through 2025: The market trend figure visualizes the growth narrative by plotting an accelerating curve from earlier years into the mid-2020s. It serves to persuade the reader that the sector is moving from experimentation to broader commercialization, with adoption spreading across more scenarios and more organizations. The chart’s role is to reinforce that the opportunity is not limited to a handful of celebrity virtual idols but includes operational deployments such as presenters and service-facing digital staff.
Industry chain diagram: The industry chain diagram structures the ecosystem into upstream enabling capabilities, midstream production and platformization, and downstream scenario deployment. Upstream elements correspond to the technologies and tools that make a virtual human viable; midstream corresponds to building, operating, and distributing characters; downstream corresponds to where value is captured in specific industries. The diagram’s core claim is that commercialization depends on coordination across the chain, including character operation, content workflows, and channel integration, not only on one breakthrough technology.
Livestream talent gap chart: The talent gap chart uses a projected shortfall in livestreaming personnel as a problem statement that virtual anchors can partially address. The underlying logic is that livestreaming requires sustained labor, consistent performance, and long operating hours, and that supply constraints create cost and quality volatility. Virtual anchors are presented as a way to stabilize operations by reducing dependency on scarce human hosts for repetitive or high-frequency formats.
Livestream anchor demand chart: The demand chart complements the talent gap narrative by arguing that the need for anchors is structurally rising with the livestream economy’s scale. It positions virtual anchors as an additive capacity option, not merely a novelty, because they can operate continuously and be replicated across brands and channels. Together with the talent gap chart, it frames virtual anchors as a practical response to labor dynamics as well as a new content format.
COCO digital host system diagram: The COCO case is presented as a platform-style virtual host solution where character assets are combined with speech, control, and operational modules. The diagram’s function is to show that a virtual host is not just an animated face but an operational system, including production, scripting or content input, scheduling, and output to different broadcast or livestream environments. The message is that value comes from a repeatable pipeline that turns a character into a deployable “host” product.
COCO scenario tiles diagram: The COCO scenario panel uses multiple example tiles to show how one virtual host can be reused across program types such as studio-style presentation and commerce-oriented hosting. This is intended to demonstrate low marginal cost once the character and pipeline exist, because new shows can be spun up by changing scripts, layouts, and scenario packaging. It also implies that the same operational backend can support different client needs, which supports a service or platform business model.
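The reuse argument in the two COCO paragraphs above can be sketched as a small data model in which the character asset is built once and only the show packaging changes. This is a hypothetical illustration, not COCO's actual architecture: the class names, the `voice_id` field, the channel labels, and the sample scripts are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostCharacter:
    """Reusable character asset: built once, shared by every show."""
    name: str
    voice_id: str  # hypothetical identifier for a speech module


@dataclass(frozen=True)
class Show:
    """Scenario packaging: the parts that change from show to show."""
    title: str
    scenario: str  # e.g. "studio" or "commerce"
    script: list[str]
    channel: str  # hypothetical output target


def render_rundown(host: HostCharacter, show: Show) -> list[str]:
    """Combine a character and a show definition into a line-by-line rundown."""
    header = f"[{show.channel}] {show.title} ({show.scenario}) hosted by {host.name}"
    lines = [f"{host.name} <{host.voice_id}>: {text}" for text in show.script]
    return [header, *lines]


# One character, two differently packaged shows (all sample values invented).
host = HostCharacter(name="COCO", voice_id="voice-01")
shows = [
    Show("Morning Briefing", "studio", ["Welcome.", "Here are today's headlines."], "broadcast"),
    Show("Flash Sale", "commerce", ["Today's featured product is ready."], "livestream"),
]

for show in shows:
    for line in render_rundown(host, show):
        print(line)
```

Adding a third program here means adding one more `Show` entry while `HostCharacter` stays untouched, which is the low-marginal-cost point the scenario tiles are meant to make.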
Liu Yexi case panel: The Liu Yexi example is used to illustrate how a strongly designed persona and aesthetic positioning can drive mainstream attention and brand collaboration potential. The panel format emphasizes identity design, narrative packaging, and platform fit as key success factors, not only technical realism. Within the report’s logic, this case supports the argument that virtual humans can function as scalable media properties when character branding and distribution strategy align.
Virtual anchor user outlook chart: The user outlook graphic compresses audience sentiment into a distribution across positive, neutral-to-positive, and negative categories. It is used to argue that acceptance is sufficiently high to support continued commercialization, even if skepticism remains. The practical implication is that audience openness lowers adoption friction for virtual anchors in content and marketing contexts, provided execution quality is adequate.
Digital employee commercial application cases: The digital employee section reframes virtual humans as enterprise-facing role performers rather than entertainment IP. The cases emphasize standardized information delivery, consistent brand presentation, and persistent availability for front-office interactions such as guidance, explanation, and service reception. The implied commercial logic is cost reduction and service consistency, with the virtual employee acting as a controllable interface that can be deployed across channels without the variability of human staffing.
Digital employee user perception section: The user perception portion is included to show that digital employees are not only a supply-driven concept but also something users can recognize and evaluate. By presenting quantified perception, the report suggests that acceptance can be measured and improved through better realism, clearer scenario value, and smoother interaction. It also signals that market education matters, because people may respond differently depending on whether they interpret the digital employee as a helpful interface or as a replacement for human staff.
Digital employee familiarity/recognition chart: The familiarity chart indicates that awareness is uneven and that recognition is still developing rather than universal. The way the numbers are presented supports the interpretation that digital employees have entered public awareness but remain scenario-dependent, with some users encountering them frequently and others rarely. The report uses this to imply that adoption will likely grow through repeated exposure in high-frequency service environments, alongside improvements in usability and interaction quality.
Virtual image application case panel: The virtual image case is positioned as an earlier commercialization lineage that virtual humans can build upon, especially in entertainment and branding. It implicitly contrasts non-interactive or lightly interactive virtual images with newer “virtual person” forms that can incorporate richer interaction and operational control. The intended takeaway is that character-driven IP operations are a proven pathway, and that adding AI-enabled interaction expands monetization options and scenario breadth.
2022 H1 “Top 100” lists tables: The ranked tables function as a market landscape snapshot that catalogs prominent virtual humans and provides a reference set for competitive comparison. Their purpose is less to provide deep analysis than to show density, variety, and visibility across the ecosystem. In the report’s framing, the lists support the idea that the sector is already populated with many active entities, which implies competition for attention and the importance of differentiated positioning.
Technology iteration and scenario expansion analysis: The concluding analysis argues that technology improvements will increase expressiveness, realism, and interaction quality, which in turn supports broader commercialization. It also implies that sustainable growth depends on productization and operational workflows that generate measurable returns in specific scenarios rather than relying on novelty. The overall direction is that virtual humans are expected to migrate from primarily entertainment-led adoption to more diversified sectoral deployments as systems mature.
iiMedia monitoring and tool matrix diagram: The tool matrix diagram positions the report within iiMedia’s wider data and monitoring ecosystem by depicting multiple tools and data products around a central analysis concept. Its role is to signal methodological support for the report’s claims, suggesting that rankings, public opinion monitoring, and industry data services are integrated. Functionally, it is a credibility and capability diagram, showing how the organization claims to observe the market and generate ongoing indicators rather than producing a one-off narrative.
The diagram presents a three-stage industry chain for China’s virtual human sector, moving left to right from upstream (“上游”) to midstream (“中游”) to downstream (“下游”). The intended logic is a pipeline: upstream supplies inputs (creative IP and production tools), midstream integrates and productizes those inputs into virtual human capabilities and platforms, and downstream deploys virtual humans as services, performers, or endorsers in specific application contexts. The arrows between columns signal a dependency flow rather than a strict linear manufacturing chain, since many participants operate across layers. The diagram’s purpose is to show where value is commonly created and where typical company types sit, not to assign each company to exactly one stage.
In the upstream column, the diagram separates “content production” (内容制作) from “tools” (工具类), implying that virtual humans rely on both creative assets and production pipelines. The content side lists examples that appear to be IP/content owners or content production organizations such as 阅文集团, suggesting that narrative universes, characters, and licensed properties are foundational inputs for virtual idols and story-driven digital personas. The presence of additional content brands (e.g., “次世文化” and “大禹”) signals that upstream is not limited to one medium; it includes studios and rights-holders able to originate characters, scripts, styles, and promotional content that later become embodied in a virtual human.
The tools side in upstream highlights general-purpose creation and productivity infrastructure rather than “virtual human-only” tooling. Autodesk and Microsoft function here as stand-ins for the broader digital content creation stack: modeling/animation suites, asset pipelines, collaboration tooling, and the enterprise software layer that supports production, deployment, and integration. The diagram’s placement of tools upstream signals a view that much of the virtual human capability is built on top of mainstream 3D/CG, real-time rendering, and enterprise software ecosystems, rather than being entirely proprietary to specialist virtual human vendors.
The midstream column is where the diagram concentrates the “industrial core,” dividing it into several supplier types that together make virtual humans operational. The first band is “vertical virtual human vendors” (垂直虚拟人厂商), represented by specialized firms such as Xmov and “虚拟智能 (HAIMHUMAN TECHNOLOGY).” Their implied role is end-to-end virtual human production and operation: character design, motion/face performance, voice/personality packaging, content operations, and delivery of turnkey digital-person solutions for clients. Labeling them “vertical” suggests they integrate multiple capabilities internally (or tightly orchestrate partners) to deliver a coherent virtual human product rather than only supplying a single component.
The next midstream band is “internet technology vendors” (互联网技术厂商), exemplified by 火山引擎 and 百度. In this framing, large internet/cloud platforms provide scalable infrastructure (compute, streaming, content distribution), developer platforms, and sometimes speech/NLP/vision services that virtual human systems rely on for real-time interaction, live broadcasting, and multi-channel deployment. Placing them midstream indicates they are not merely “tools” but operational enablers whose platforms become part of the product delivery layer.
The “AI vendors” (AI厂商) band sits alongside the internet vendors and implies a capability layer: speech recognition, speech synthesis, face/pose estimation, conversational intelligence, and decision/agent logic that can be embedded into virtual humans. The diagram includes 科大讯飞 and “宇视 (uniview),” suggesting that the AI layer is treated as modular supply: virtual human vendors or service integrators can source components (voice, perception, analytics, safety filtering) from established AI providers. The key analytical point here is that the diagram treats “being a virtual human” less as a single monolithic technology and more as a bundle of AI subsystems orchestrated into a character experience.
The remaining midstream bands, “CG vendors” (CG厂商) and “XR vendors” (XR厂商), make explicit that visual fidelity and real-time embodiment are pillars alongside AI. The inclusion of 原力 under CG implies professional-grade character art, rigging, animation, and cinematic rendering supply. The XR band includes “相芯科技,” which is commonly associated with real-time facial AR effects and avatar rendering pipelines; its placement here indicates the importance of real-time tracking, rendering, and interactive presentation layers that allow a virtual human to appear live, react to a user, and perform across devices and channels.
The downstream column translates the midstream capabilities into three application clusters that are defined by “service/identity type” rather than by technology. In “enterprise services (service-type)” (企业服务(服务型)), the diagram shows examples like “浦发银行-小浦” (a banking virtual assistant persona) and “屈臣氏-屈晨曦” (a retail/service persona). This cluster frames virtual humans as front-of-house digital staff: customer service, product guidance, onboarding, and brand communication. The emphasis is not that the virtual human is the product itself, but that it is a service interface that reduces friction for users and standardizes brand-consistent interaction at scale.
In “entertainment (service/identity type)” (文娱领域(服务/身份型)), the diagram lists performance-oriented virtual beings such as “乐正绫-虚拟歌姬” and “RiCH BOOM乐队.” This cluster highlights a different value mechanism: the virtual human is an IP-bearing performer whose output is music, live events, and fandom-driven content. Operationally, this downstream use case tends to pull harder on upstream IP/content and midstream CG/XR production quality, because audience acceptance depends heavily on aesthetics, performance continuity, and cross-media storytelling rather than on purely functional interaction.
In “brand endorsement (identity type)” (品牌代言(身份型)), the diagram includes “花西子” and “洛天依,” reflecting a marketing pattern where a virtual being is positioned as a spokesperson or ambassador. The term “identity-type” (身份型) is important: it implies that the virtual human’s primary deliverable is a stable, recognizable persona that can carry brand associations across campaigns. Compared with enterprise service virtual humans, endorsement-style figures typically require tighter control of tone, visual consistency, and reputational risk, because the persona becomes part of brand equity and cannot behave like a generic chatbot without undermining the intended identity.
A key structural insight of the diagram is that it treats the virtual human industry as a coordination problem across content, capability, and channel distribution. Upstream content and tooling enable creation; midstream firms assemble enabling technologies into deployable virtual human “stacks”; downstream deployments turn those stacks into business outcomes (service efficiency, entertainment monetization, marketing conversion). The diagram also implies a market dynamic in which specialist “vertical virtual human vendors” compete partly on integration skill, meaning how well they combine cloud platforms, AI subsystems, and real-time embodiment, while large platform vendors compete on providing the most indispensable infrastructure layer for others to build on.
The diagram’s limitations are also informative. It simplifies cross-layer roles (a large platform may own content, tools, AI, and distribution simultaneously), and it does not explicitly call out governance elements that often matter in real deployments, such as compliance, moderation, rights management, and operations teams for persona maintenance. It also treats “virtual human” as one category while downstream examples mix quite different forms (customer-service presenters, virtual idols, and brand mascots), which can have fundamentally different technical requirements and success metrics. As an industry map, though, it is useful because it makes visible that “virtual humans” are rarely a single technology purchase; they are assembled from a chain of suppliers whose contributions vary depending on whether the target is service delivery, entertainment production, or identity-driven branding.
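For readers who want to work with the chain as data, the three columns and their bands can be transcribed into a nested mapping. The Python sketch below uses only the example names mentioned in this walkthrough. The structure, the English band keys, and the `positions` helper are illustrative assumptions, not part of the report.

```python
# Layer -> band -> example names, transcribed from the walkthrough above.
# The English band keys are informal translations of the diagram labels.
INDUSTRY_CHAIN = {
    "upstream": {
        "content production": ["阅文集团", "次世文化", "大禹"],
        "tools": ["Autodesk", "Microsoft"],
    },
    "midstream": {
        "vertical virtual human vendors": ["Xmov", "虚拟智能"],
        "internet technology vendors": ["火山引擎", "百度"],
        "AI vendors": ["科大讯飞", "宇视"],
        "CG vendors": ["原力"],
        "XR vendors": ["相芯科技"],
    },
    "downstream": {
        "enterprise services": ["浦发银行-小浦", "屈臣氏-屈晨曦"],
        "entertainment": ["乐正绫-虚拟歌姬", "RiCH BOOM乐队"],
        "brand endorsement": ["花西子", "洛天依"],
    },
}


def positions(name: str) -> list[tuple[str, str]]:
    """Return every (layer, band) pair in which a name appears."""
    return [
        (layer, band)
        for layer, bands in INDUSTRY_CHAIN.items()
        for band, names in bands.items()
        if name in names
    ]


print(positions("百度"))
print(positions("Autodesk"))
```

Returning a list rather than a single position is deliberate: it leaves room to record a participant under several bands or layers, which is the cross-layer case the diagram itself flattens.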