A second edition of the 2022 “China Virtual Digital Human Influence Index Report” was released on February 21 by the Media Big Data Research Center of the State Key Laboratory of Media Convergence and Communication at the Communication University of China, together with the Youge Metaverse Lab and the Communication University of China Digital Human Research Institute. The report analyzes the status and challenges of China’s virtual digital human industry in 2022 and presents survey data and insights drawn from 151 leading virtual digital human examples.
It introduces the concept of “digital memes,” extending the definition of virtual digital humans: an individual’s digital “body,” voice, and behavioral traits, together with the personal data, habits, preferences, and other characteristics left behind in the digital world, are treated as a set of “digital memes” with informational, common-origin, stable, unique, and variable properties. The report positions this concept as a theoretical foundation for deeper research and operational practice.
The influence index is revised so that “innovation” is upgraded to “product power,” and overall influence is defined through three core dimensions: product power, communication power, and social power. Within product power, the report breaks down two key drivers—technology and art/design—and proposes an evaluation model and grading approach for each, aiming to standardize product assessment in the field.
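The report does not publish its aggregation formula, but the three-dimension structure suggests a weighted combination. A minimal sketch of how such an overall index could be computed, assuming normalized 0–100 sub-scores and illustrative weights (both the weights and the weighted-sum form are assumptions, not taken from the report):

```python
# Hypothetical aggregation of an overall influence index from the three
# core dimensions named in the report. The weights below are illustrative
# assumptions; the report does not disclose its actual weighting.
WEIGHTS = {"product": 0.4, "communication": 0.3, "social": 0.3}

def overall_index(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-100) into one overall index."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# Example: strong product power, moderate communication and social power.
score = overall_index({"product": 90.0, "communication": 60.0, "social": 30.0})
```

A weighted sum is the simplest model consistent with a “three core dimensions” framing; a real index might instead use nonlinear scaling or rank-based normalization.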
On the industry side, the report describes rapid growth in 2022: data supplied by Tianyancha is cited as indicating roughly 587,000 virtual digital human–related companies in China, with more than 278,000 founded in 2022 (a 41.4% increase over 2021); nearly 80% have registered capital under RMB 5 million, more than 80% are under three years old, and 44.9% were founded within one year.
The report attributes growth to multi-level government policy support, increased participation by companies and investors despite broader headwinds, fast expansion of “virtual digital human +” application scenarios, and new momentum from AIGC and related technologies, while identifying challenges including homogeneous competition, difficulty sustaining long-term operations, and insufficient standardization and general-purpose capability.
Focusing on three widely used categories—virtual idols, virtual streamers, and digital employees—the report argues that the entry of major firms operating virtual IPs provided demonstrative benchmark cases, and that China now shows a relatively complete industry chain. It also notes emerging operational patterns, including a strong skew toward “daughter”-style female personas, cross-domain and multi-role performers, and Matthew-effect concentration.
In an expert views and outlook section, organizations including China Media Group, Tencent, NetEase, Kuaishou, and iFLYTEK share frontier explorations, while academics, investors, and industry specialists provide perspectives on technology, artistic value, key scenarios, and international comparisons. The report forecasts near-term trends including a shift from “digital humans” toward “intelligent digital humans,” generative AI accelerating evolution and commercialization, “virtual digital human +” as a driver of industrial digitalization, the rise of virtual assets and creator ecosystems, and a need for deeper regulation and governance.
The chief scientific advisor, Professor Shen Hao (沈浩), is quoted arguing that generative AI will give virtual digital humans a “most powerful brain,” moving them beyond primarily audiovisual presentation into higher-agency interactive entities that can serve as digital-identity infrastructure in a media-convergence era and materially reshape human–AI interaction in metaverse-like digital worlds. Shen is presented in China’s digital-human discourse as a computational-communication and media-big-data scholar at Communication University of China (中国传媒大学) and its State Key Laboratory of Media Convergence and Communication (媒体融合与传播国家重点实验室), where his work emphasizes data mining, AI methods, complex-network and social-computation approaches, and measurement frameworks for evaluating emerging media phenomena; in the virtual digital human context, he is repeatedly positioned as a chief scientific advisor associated with influence-index style reporting and industry-facing assessment.
This image is a radial “overall index” (总体指数) ranking chart for Chinese virtual digital humans, shown as a circular set of spokes where longer bars indicate higher overall influence and each spoke ends with a small avatar portrait and name. The right half uses darker blue tones and contains the highest-scoring cluster, led by 柳夜熙 at the top with the longest bar, followed by prominent names such as A-SOUL女团, JICHIUAN, AYAYI, 天妤, and 星瞳. The left half shifts to lighter teal bars that are generally shorter, suggesting a lower overall index for that set. The composition emphasizes a top-heavy distribution: a few leading figures occupy the longest wedges, with a long tail of smaller wedges around the circle, consistent with an “influence index” style summary from an industry report.
This image is a Chinese mind-map style framework titled “Virtual Digital Human Art Evaluation Indicators,” explaining how art-direction quality is assessed across three main dimensions: character design recognizability, model accuracy and skeletal binding, and adaptability across different scenes. Under character design recognizability, it breaks evaluation into appearance, personality, and background, with concrete checks such as overall style, hairstyle, clothing, facial expressions during dialogue, signature lines and signature actions, and small gestures that convey temperament, plus narrative elements like the character’s name and story. Under model accuracy and skeletal binding, it emphasizes material and surface realism and coherence, including hair texture, skin realism, clothing and accessory texture/material feel, and the precision and smoothness of body motion and facial and micro-expressions. Under scene adaptability, it focuses on whether the digital human works convincingly in virtual settings and in mixed settings that interact with real scenes, and whether motion and prop interactions look fluent and natural rather than rigid, with an added emphasis on interactive feedback between the digital human and the surrounding scene and objects.
The image is a mind map titled “Virtual Digital Human Technical Evaluation Indicators,” describing a grading framework that assesses technical performance across five dimensions: speech synthesis results, content generation efficiency, rendering capability, character generation efficiency, and interaction capability. The accompanying text notes that the framework covers both production-side technologies (voice, images, rendering) and operation-side technologies (content creation and interaction), and that indicators are further refined by technical complexity, output quality, and intelligence level. The speech synthesis branch emphasizes clarity, recognizability, and naturalness, focusing on whether phonemes/words are distinct, correctly recognized, and sound fluent and emotionally appropriate. The interaction capability branch splits into human–computer interaction and environmental interaction, covering intent understanding, response speed, content accuracy, coherence between speech and expression/motion, and the richness and realism of spatial and prop interactions across virtual and mixed environments. The character generation efficiency branch highlights image quality and generation efficiency, stressing fine detail, realism, high resolution, stable controllability, fast generation, batch production, and adaptability to multiple scenes and styles. The content generation efficiency branch addresses coverage and production speed (including real-time and batch output) alongside content accuracy and cultural suitability. The rendering capability branch breaks down rendering technology by facial features, skeletal motion, skin and hair, and clothing/materials, focusing on refinement, realism, natural motion, and overall consistency between body dynamics, surface appearance, and the rendered result.
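The five dimensions and their sub-indicators form a hierarchy that could be represented as nested data for scoring. A sketch assuming a simple rubric where each leaf indicator is scored 0–5 and a branch score is the mean of its leaves (the structure mirrors the mind map; the 0–5 scale and averaging scheme are assumptions):

```python
# Hypothetical nested rubric mirroring the technical-evaluation mind map.
# Leaf indicators score 0-5; a branch score averages its children, and the
# overall technical grade averages the five branches (scoring scheme assumed).
RUBRIC = {
    "speech synthesis": ["clarity", "recognizability", "naturalness"],
    "content generation efficiency": ["coverage", "production speed",
                                      "accuracy", "cultural suitability"],
    "rendering capability": ["facial features", "skeletal motion",
                             "skin and hair", "clothing/materials"],
    "character generation efficiency": ["image quality", "generation efficiency"],
    "interaction capability": ["human-computer interaction",
                               "environmental interaction"],
}

def branch_score(leaf_scores: dict[str, float]) -> float:
    """Average the leaf-indicator scores for one dimension."""
    return sum(leaf_scores.values()) / len(leaf_scores)

def technical_score(scores: dict[str, dict[str, float]]) -> float:
    """Average the five dimension scores into one technical grade."""
    return sum(branch_score(v) for v in scores.values()) / len(scores)
```

Keeping the hierarchy as data rather than hard-coding it makes it straightforward to refine individual indicators (for example, by technical complexity or intelligence level) without restructuring the scoring code.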