This article is an overview piece tied to a paid course launched in late April 2022 on the Sanjie learning platform. It presents “Virtual Digital Human 3.0” as a structured introduction to virtual digital humans, from definitions and enabling technologies through to commercialization and the broader “metaverse” narrative. It frames virtual digital humans as a practical early route for “metaverse” industrial adoption, arguing that major technology and platform companies are entering a “human-making” race because digital humans can function as a foundational interface layer for future digital ecosystems, connecting person-to-person, person-to-thing, and thing-to-thing interactions in immersive or digitally mediated environments.
It then proposes a definition that evolves from earlier “virtual idol” history (referencing early virtual idols emerging in the 1980s) toward a more comprehensive notion of a virtual human as a computer-generated, interactive human-like entity created by converging technologies. On the technical side, it describes virtual digital humans as being produced through a bundle of capabilities including computer graphics, speech synthesis, deep learning, brain-inspired approaches, biotechnology, and computational sciences, aiming to reproduce human appearance, behavior, and in some narratives even value-like “thought” features. On the media and service side, it treats virtual digital humans as a new kind of “media role” for semantic and accessible communication, positioned to take on information creation and transmission functions inside a metaverse-style ecology, and to act as a connector that helps “scenes, objects, economic systems, and civilizational systems” cohere into an integrated digital world.
From that framing, the article lists a set of global and China-based “core players” said to be competing: Meta, NVIDIA, Microsoft, Roblox, Epic Games, Tencent, ByteDance, Alibaba, NetEase, Baidu, iFLYTEK, Huawei, iQIYI, and Bilibili. It argues that “super giants” such as Meta and Tencent are pushing broad, full-stack layouts spanning software, hardware, use cases, and IP-building, while older “technical giants” like NVIDIA and Baidu emphasize foundational AI infrastructure, and content-heavy platforms such as Epic Games, NetEase, iQIYI, and Bilibili focus on expanding content forms and application scenarios that embed digital humans into everyday digital life.
The central analytical scaffold is an industry-chain map divided into upstream technology providers, midstream platforms and service providers, downstream application adopters, and an overlay of industry capital. Upstream is described as the software-and-hardware infrastructure layer, covering tools and systems for modeling, motion capture, and rendering. The article characterizes the competitive landscape as one in which foreign providers have deeper accumulated experience while Chinese providers are accelerating. It gives examples of China-based infrastructure initiatives, including Tencent’s xFaceBuilder pipeline for producing digital faces across devices, Sogou offering avatar “clone” capabilities through an AI open platform, and ByteDance-associated Pico emphasizing hardware supply and XR solutions such as motion and facial capture. The upstream layer is presented as the lever that lowers production cost, shortens production cycles, and reduces adoption barriers, enabling a shift from narrow commercial deployments toward broader consumer reach.
Midstream is split into technical solution platforms and operational/management solution platforms, portrayed as the “accelerator” that turns upstream capability into repeatable products and deployable services. For platform-style technical providers, it names Tencent, NetEase Fuxi, Volcano Engine, Baidu, and SenseTime, which are depicted as offering externally consumable functions such as batch generation and speech synthesis and packaging industry solutions for domains like gaming, cultural tourism, and finance. Alongside them, it places full-stack or single-module specialists such as Mofa Technology, Xiangxin Technology, STEPVR, Zhongke Shenzhi, and Aihuashen, emphasizing that these firms focus deeply on one or more production links such as character creation, 3D rendering, rigging, and live-stream systems. In parallel, the operations side is described as covering IP incubation, visual design and production, scene production, post-production content, talent management, agency operations, e-commerce services, and later-stage digital asset management, delivering customized or scaled packages for business and consumer needs.
Downstream is framed as rapid penetration into multiple sectors, with the article explicitly naming games, film/TV, livestreaming, finance, education, healthcare, and entertainment as areas where virtual digital humans can “land.” It proposes a broad split between B2B and consumer applications. On the enterprise side, digital humans become “digital employees” or specialized role substitutes such as brand officers, hosts, tour guides, and psychological clinicians. On the consumer side, the dominant forms are virtual streamers and virtual idols functioning as influencer-like performers or “celebrity doubles.” As an emblematic case, it highlights Liu Yexi, described as a beauty expert persona with supernatural storytelling hooks, which reportedly gained a massive following on Douyin very quickly and then moved into monetization through appearances, endorsements, and planned commerce livestream collaborations, including an endorsement of Clarins and a scheduled commerce-oriented collaboration with Qi Wei. The team behind Liu Yexi, Chuangyi Technology, is used to illustrate how successful virtual digital human projects can attract both brands and investors.
Industry capital is presented as the force that accelerates chain completion, with the article asserting that China-headquartered VR startups rank very high globally by total investment and that major internet companies and institutional investors are committing significant capital to the space. It names Tencent, ByteDance, NetEase, and Xiaomi as corporate entrants, and investment institutions including Sequoia Capital, Morningside Venture Capital, CICC, and CCB International, plus listed companies such as Alpha Group. The overall message is that these capital flows, combined with platform and infrastructure buildout, are speeding up the formation of a recognizable China-based industry chain.
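The four-part chain map summarized above can be written down as a small data structure, which makes it easier to see where a given company shows up. The following is only a minimal sketch: the layer names, roles, and example firms come from the summary, while the `ChainLayer` class, the `INDUSTRY_CHAIN` list, and the `layers_featuring` helper are hypothetical names introduced here for illustration and are not part of the article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChainLayer:
    """One layer of the industry-chain map (hypothetical structure)."""
    name: str
    role: str
    examples: list[str] = field(default_factory=list)


# Names and roles follow the summary above; the selection of examples is
# illustrative and not an exhaustive list of the firms the article mentions.
INDUSTRY_CHAIN = [
    ChainLayer(
        "upstream",
        "software and hardware infrastructure: modeling, motion capture, rendering",
        ["Tencent xFaceBuilder", "Sogou AI open platform", "Pico"],
    ),
    ChainLayer(
        "midstream",
        "technical and operational solution platforms",
        ["Tencent", "NetEase Fuxi", "Volcano Engine", "Baidu", "SenseTime"],
    ),
    ChainLayer(
        "downstream",
        "application adopters in enterprise and consumer settings",
        ["Liu Yexi (Chuangyi Technology)"],
    ),
    ChainLayer(
        "capital",
        "corporate entrants and investment institutions",
        ["Tencent", "ByteDance", "NetEase", "Xiaomi", "Sequoia Capital",
         "Morningside Venture Capital", "CICC", "CCB International", "Alpha Group"],
    ),
]


def layers_featuring(company: str) -> list[str]:
    """Return the layers whose example entries mention the given company."""
    needle = company.lower()
    return [
        layer.name
        for layer in INDUSTRY_CHAIN
        if any(needle in example.lower() for example in layer.examples)
    ]


print(layers_featuring("Tencent"))  # ['upstream', 'midstream', 'capital']
```

Looking up Tencent returns three of the four layers, which is one way to read the article's claim that “super giants” pursue broad, full-stack layouts.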
In its assessment of opportunities and challenges, the text argues that big-tech participation lowers entry barriers by supplying infrastructure, funding, and talent development, and by making digital human services easier for ordinary users to access in entertainment, finance, education, and e-commerce. At the same time, it warns that startups and later entrants may find their competitiveness constrained by the strategic layouts of dominant platforms, and that content creators and operators can become dependent on platform policies, with the industry lacking unified operational standards and facing high migration or cross-platform reuse costs. It characterizes the moment as a “benign cycle” of “make humans, use humans,” where technology-driven players and operations-driven players jointly reinforce adoption. It closes by positioning the article as a foundation rather than a final verdict. The deeper questions (how to judge whether the industry has truly “risen,” what “digital genes” and “digital immortality” mean in this context, and how human–human coexistence in a metaverse era might work) are intended to be addressed in the associated course materials rather than fully answered here.
The “Upstream (Industry Foundation)” (上游(产业基础)) diagram presents the first layer of the virtual digital human industry chain. “Upstream” here is framed as the enabling base that sits before content production, applications, and commercialization: the hardware and software needed to build, animate, render, and run digital humans, plus the tool platforms and specialized device/system suppliers that make production feasible at scale. The overall structure implies that downstream firms (studios, brands, platform operators, and service integrators) depend on this upstream stack for both capability (what can be made) and cost (how efficiently it can be made).
On the left, the “software + hardware” block depicts a developer-platform foundation that combines core computing components and production software. The vertical labels indicate a typical pipeline: display devices such as head-mounted displays and glasses; sensors; chips; facial-expression and motion-capture tools; and modeling and rendering software. Read together, this column is describing the minimum “production and runtime” substrate required for digital humans: capture or author performance (face/body), process it on capable hardware (compute and chips), and then create and render the digital human in a production engine or rendering tool so it can be deployed to target devices. The inclusion of “developer platform” at the bottom suggests that upstream providers increasingly bundle these elements into integrated environments (SDKs, toolchains, and partner ecosystems) rather than selling isolated components.
In the center, the “platform tools” block emphasizes that virtual digital human development is inseparable from modeling and rendering tools, and it explicitly notes that domestic platform toolchains are not yet mature, with foreign tools still dominant. The logos reinforce this point by foregrounding vendors of widely used creation tools and real-time engines: Unity, Autodesk Maya, and Epic Games (the maker of Unreal Engine). Alongside these are Chinese internet and platform players (Tencent, ByteDance, Sogou) presented as AI or platform-tool participants, implying a second category of tooling: not just DCC/engine software, but also AI capability layers and developer platforms (for example, speech, vision, or content generation services) that can be integrated into digital human systems. The message here is that upstream tool dominance shapes standards, workflows, talent training, and interoperability; if the core tools are foreign-controlled, domestic ecosystems may face constraints in cost, customization, and strategic autonomy.
On the right, “devices and systems” focuses on dedicated equipment and integrated systems used in production and operation. The text indicates that beyond chips and general-purpose AI sensors, digital human hardware includes items such as cameras, glasses, and gloves, and that production is inseparable from motion-capture and facial-capture equipment and systems. The logos in this panel illustrate the specialized vendor landscape: Pico (head-mounted display ecosystem), OptiTrack (motion capture), Shadow Creator (capture/production systems), Noitom (motion capture), and other suppliers such as STEPVR, FOHEART, VirDYN, and VRTRIX. The presence of multiple brands signals that upstream capability is not a single product but a system of systems: capture devices, tracking solutions, real-time processing, and integration software that must work together reliably. This also implies a key competitive variable for downstream adopters: the degree to which an upstream stack is modular and interoperable versus vertically integrated and vendor-locked.
Taken as a whole, the diagram is making three strategic points about the upstream foundation. First, virtual digital humans are constrained by a full-stack dependency chain: without sensors/capture, compute, and modeling/rendering tools, “digital human” offerings remain superficial or costly. Second, platform-tool maturity is treated as a bottleneck and a locus of industry power, because it determines production productivity and standardization across developers. Third, devices and systems are highlighted as essential industrial infrastructure rather than optional add-ons, implying that realistic, scalable digital humans require investment in specialized capture and deployment hardware, not just software talent.
The “Midstream (Industry Accelerator)” (中游(产业加速器)) diagram describes the middle layer of the virtual digital human industry chain, positioned between upstream foundational tooling/hardware and downstream application scenarios. The central idea is that midstream firms turn upstream capabilities into deployable, repeatable services by packaging technology into products, workflows, and managed offerings. The diagram splits midstream into two parallel but connected tracks: a technology solutions and service-platform track on the left, and an operations solutions and service-platform track on the right, with arrows indicating a handoff or progression from technology provisioning toward ongoing operation and commercialization.
On the left, the “technology solutions and service platform” side defines the core technical service providers for virtual digital humans. The text groups these providers into two main types: platform-type technology suppliers and full-stack (or single-function) virtual digital human service vendors. Platform-type suppliers are depicted as large technology companies and engine/platform operators that provide reusable capabilities such as AI, cloud, engines, and developer infrastructure; examples shown include Tencent, NetEase Fuxi, Volcano Engine, Baidu, and SenseTime. In practical terms, this category supplies the “common primitives” that many digital human products rely on—identity and account systems, cloud compute, speech and vision modules, real-time rendering pipelines, and standardized APIs—so that downstream builders do not need to recreate foundational AI and infrastructure from scratch.
The second category on the left, full-stack or point-solution service vendors, represents companies that directly deliver digital human systems, either end-to-end or by specializing in a particular segment such as modeling, animation, capture integration, speech-driven interaction, or deployment packaging. The text names examples such as MagicLab Technology, Xiaobing, Zao Technology, STEPVR, and Zhongke Shenzhi. The logos underneath reinforce the breadth of this ecosystem by showing a mix of engine/animation companies, production-system providers, and cloud/AI service brands. The structural point is that midstream technology providers reduce the “integration burden” for adopters by turning fragmented upstream components into a coherent productized stack, with clearer cost structures, delivery timelines, and maintenance responsibilities.
On the right, the “operations solutions and service platform” side describes what happens after a digital human can technically be built: making it economically useful over time. The text lists a set of operational services that are closer to media production and brand/IP management than to core engineering. These services include IP incubation, character/image (persona) design and production, scenario/content creation, post-production content generation, IP agency/brokerage, IP operations, e-commerce services, and digital asset management. In other words, the right side assumes the digital human is a “managed asset” that needs continuous content pipelines, distribution, and business operations to generate returns. The logos in this panel suggest firms that behave like studios, MCNs, marketing agencies, platform operators, or specialized operators that can run a digital human as an ongoing program rather than a one-off build.
The arrows across the top are important: they imply that the midstream is not simply two separate markets, but a chain where technical enablement feeds operational capability. A digital human can be delivered as a technical demo without operations, but it becomes an “industry accelerator” when it is paired with operational systems that sustain content output, brand consistency, channel growth, monetization, and compliance. Read this way, the diagram is emphasizing that the competitive advantage in midstream is often not a single algorithm or model, but the ability to deliver a full lifecycle: building the digital human efficiently, deploying it reliably, and then operating it continuously with repeatable workflows for content, IP, and commercialization.
The “Downstream (Industry Applications)” (下游(产业应用)) diagram presents the downstream layer of the virtual digital human industry chain. It frames downstream as the demand and deployment side: the places where virtual digital humans are used, distributed, monetized, and operationally integrated into real organizations and consumer-facing platforms. The structure divides this layer into three linked components: application domains (left), representative adopters and platforms (right), and capital/investment participants (bottom), with an additional cross-cutting base layer labeled “Industrial Security and Other Services,” implying governance, compliance, safety, and auxiliary services that support the entire downstream ecosystem.
On the left, the diagram maps application scenarios along a continuum from “commercial use” (商用) to “consumer use” (民用), indicated by arrows moving from left to right. The vertical columns list major sectors where virtual digital humans can be deployed, covering traditional enterprise domains and consumer media domains in one unified taxonomy. The commercial-leaning sectors include finance, manufacturing, and media/communications, which typically use digital humans for customer service, marketing, training, and public-facing institutional communication. As the columns move toward consumer-facing contexts, the emphasis shifts toward education, healthcare, film/TV entertainment, culture and tourism, e-commerce, gaming, and a catch-all “other industries,” reflecting the idea that digital humans can act both as functional service interfaces and as content/IP assets. The inclusion of both “education” and “healthcare” alongside “entertainment” and “gaming” signals that the diagram treats digital humans as a general interaction and representation layer rather than a single entertainment niche.
Below the sector columns, the diagram explicitly highlights “application platforms: Bilibili, Douyin, Huya, etc.” (应用平台:B站、抖音、虎牙等), which clarifies how downstream adoption is operationalized in practice: not only through enterprise deployments, but also through distribution channels that already aggregate audiences, creators, and monetization tools. In this framing, platforms are not just marketing outlets; they are the primary runtime environments where digital humans appear, accumulate followers, conduct live sessions, and generate transaction value. This is important because it implies that downstream success depends on platform affordances—recommendation systems, livestream tooling, content formats, moderation rules, and commerce integrations—as much as it depends on the technical quality of the digital human itself.
On the right, the diagram shows a dense panel of logos representing downstream stakeholders, mixing institutions, media organizations, internet platforms, consumer brands, and content or commerce ecosystems. The intent is not that all logos play the same role, but that downstream adoption is broad and cross-industry: banks and financial institutions imply service avatars and customer-facing digital staff; newspapers and media outlets imply anchors, hosts, and branded presenters; major platforms and app ecosystems imply distribution, creator tools, and interactive formats; and consumer retail and beauty/food brands imply marketing spokescharacters, virtual brand ambassadors, and commerce-driven live presentations. Some of the most recognizable platform and ecosystem examples in this panel include Tencent, NetEase, Alibaba Group, Xiaomi, iQIYI, Bilibili, Douyin, Kuaishou, Huya, and L'Oréal. The diagram’s core message is that downstream is already populated by entities with existing traffic, brands, and operational capacity, which makes it the “pull” that shapes what midstream providers build and what upstream technologies prioritize.
Along the bottom, the “capital side” (资本方) strip lists investors and strategic backers, implying that downstream growth is partially driven by funding, acquisitions, and ecosystem investment by large platforms and venture firms. By placing capital as a distinct downstream element rather than an external factor, the diagram suggests that financing is not merely supportive but structurally embedded in how digital human applications scale—funding production pipelines, subsidizing platform experimentation, accelerating go-to-market for service providers, and shaping consolidation around dominant platforms. The inclusion of both major platform-affiliated capital and venture capital signals a mixed funding model: strategic investment to reinforce platform ecosystems alongside financial investment seeking returns from high-growth application categories like livestream commerce, branded content, and enterprise service automation.
Finally, the lowest band labeled “Industrial Security and Other Services” (产业安全及其他服务) functions as a cross-cutting foundation for downstream deployment. In context, this implies that once digital humans are applied in finance, healthcare, media, and commerce, issues such as identity management, content and brand risk, privacy protection, data governance, model safety, deepfake/misuse prevention, compliance, and operational reliability become integral services. The diagram’s placement of this band at the base signals that downstream adoption at scale is conditioned not only on creative or technical capability, but also on the existence of assurance mechanisms and supporting services that reduce institutional risk and enable sustained, regulated operation across sensitive sectors.