Chen, J., Ma, X., Wang, L., & Cheng, J. (2023). Blendshape-based migratable speech-driven 3D facial animation with overlapping chunking-transformer. In L. Wang, J. Zhou, M. Yang, & Q. Liao (Eds.), Pattern recognition and computer vision: PRCV 2023 (pp. 41–53). Springer.
This study addresses two constraints of existing Transformer-based speech-driven 3D facial animation systems, namely mesh specificity and inability to process long audio inputs, by predicting blendshape coefficients rather than raw vertices and introducing an overlapping chunking strategy that segments lengthy audio while preserving temporal continuity across chunk boundaries. The approach, supplemented by a data calibration step that refines blendshape training data for more natural lip movement, outperformed vertex-prediction baselines and demonstrated successful migration of generated animations across diverse facial meshes. The evaluation remains limited by its reliance on a single dataset and a narrow set of quantitative metrics, leaving open the question of whether the quality gains and mesh-transfer capability generalize to more varied speech conditions, languages, or substantially different facial topologies.
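The overlapping chunking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the chunk length, the overlap size, and the choice to average predictions in the overlap region are all assumptions made for illustration, and each frame is reduced to a single scalar standing in for one blendshape coefficient.

```python
def split_overlapping(n_frames, chunk_len, overlap):
    """Return (start, end) index pairs that cover n_frames with a fixed overlap.

    Hypothetical helper: the paper's actual chunk sizes and boundaries are not
    given in the annotation above.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk_len, n_frames)
        spans.append((start, end))
        if end == n_frames:
            break
        start += step
    return spans


def merge_chunks(spans, chunk_outputs, n_frames):
    """Average per-frame predictions wherever neighbouring chunks overlap.

    Averaging is an assumed blending rule chosen for simplicity; it keeps the
    merged sequence continuous across chunk boundaries.
    """
    total = [0.0] * n_frames
    count = [0] * n_frames
    for (start, _end), values in zip(spans, chunk_outputs):
        for offset, value in enumerate(values):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c for t, c in zip(total, count)]


# Stand-in for a model: an identity "predictor" applied chunk by chunk.
signal = [float(i) for i in range(10)]
spans = split_overlapping(len(signal), chunk_len=4, overlap=2)
outputs = [signal[start:end] for start, end in spans]
merged = merge_chunks(spans, outputs, len(signal))
```

With an identity predictor the merged sequence reproduces the input exactly, which is a quick check that the chunk bookkeeping loses no frames. A real model would produce slightly different values for the same frame in adjacent chunks, and the blending rule then determines how smooth the boundary is.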
Chen, S., Zhang, D., Shi, W., Ding, X., & Chang, L. (2024). Exploring the efficacy of interactive digital humans in cultural communication. In G. Zhai, J. Zhou, H. Yang, L. Ye, & P. An (Eds.), Digital multimedia communications: 20th International Forum on Digital TV and Wireless Multimedia Communications (IFTC 2023), revised selected papers (pp. 220–239). Springer.
This experiment tested whether interactive digital humans improve the effectiveness of historical and cultural communication compared to traditional single-channel dissemination methods, finding that digital humans with enhanced interactivity increased audience absorption of historical and cultural content and fostered greater engagement. The study's principal limitation is that the abstract frames its conclusions in broad, generalized terms about digital humans playing "a positive role" without specifying the scale of the measured effect, the size or composition of the participant sample, or the particular historical content tested, making it difficult to assess how robust or generalizable the reported gains in audience engagement actually are.
Cui, L., & Liu, J. (2023). Virtual human: A comprehensive survey on academic and applications. IEEE Access, 11, 123830–123845.
This survey traces virtual human technology from its origins in film and television through its expansion into media, gaming, finance, and other sectors, classifying virtual humans into three types (identity-based, service-based, and virtual idols) and mapping academic publication trends via Web of Science data, concluding that the field has matured rapidly but remains constrained by unresolved challenges in emotion recognition, privacy and security, and the uncanny valley effect. The survey's principal limitation is its breadth-over-depth approach: by spanning development history, technical architecture, classification, and application scenarios within a single paper, it necessarily treats each area at a general level and does not systematically evaluate or benchmark the specific technologies it catalogs.
Deng, F., & Jiang, X. (2023). Effects of human versus virtual human influencers on the appearance anxiety of social media users. Journal of Retailing and Consumer Services, 71, Article 103233.
This study compared the effects of human influencers and virtual human influencers on social media users' appearance anxiety, finding that idealized virtual influencer portrayals do trigger appearance anxiety through upward social comparison, but to a lesser degree than their human counterparts, with photorealistic virtual influencers producing weaker appearance comparisons than real influencers. The reported effect size was small to medium, and the study's reliance on a single experimental exposure cannot account for the cumulative effects of sustained, repeated exposure to idealized virtual influencer imagery that characterizes actual social media use.
Du, Z. (2023). 技术身体再造视角下虚拟数字人的正面效应与风险研究 [Research on the positive effects and risks of virtual digital humans from the perspective of technological body re-creation]. People’s Tribune, 2023(23), 44–47.
Framing virtual digital humans as technologically reconstructed bodies within a posthuman theoretical context, the study concludes that these entities extend human perception, identity, and even the conceptual boundaries of life by enabling consciousness to persist beyond the biological body, while simultaneously generating serious risks including technological dependence that erodes individual autonomy, emotional and decisional manipulation by AI-driven avatars, and deepening identity fragmentation as users invest subjectivity in virtual selves. The analysis remains entirely theoretical, drawing on posthuman philosophy and Hegelian master-slave dialectics without empirical data, user studies, or case evidence to substantiate either the claimed positive effects or the severity of the risks it identifies.
Gao, W., Jiang, N., & Guo, Q. (2023). How do virtual streamers affect purchase intention in the live streaming context? A presence perspective. Journal of Retailing and Consumer Services, 73, Article 103356.
This study examined how virtual streamers' visual realism affects consumers' purchase intention in live streaming commerce, grounded in presence theory. The study found that virtual streamers exhibiting high formal realism strengthen purchase intention by enhancing viewers' sense of social presence, but this effect holds only when behavioral realism is also high, indicating that appearance alone is insufficient without convincingly human-like movement and interaction. The reliance on PLS-SEM with self-reported survey data limits the study's ability to establish causal relationships or capture actual purchasing behavior as distinct from stated intention.
He, Z. (2023). A brief analysis of the new pattern of China's virtual anchor enabling webcast: Taking Jiaran as an example. Frontiers in Art Research, 5(1), 21–25.
Using the A-Soul member Jiaran as a case study, the analysis attributes the rapid rise of entertainment virtual anchors in China's live-streaming sector to four converging factors: carefully constructed character image design, real-time motion-capture and rendering technology, fan communities rooted in participatory "quadratic element" culture, and crossover collaborations that extend virtual anchors beyond niche audiences into mainstream media. The study's principal limitation is its reliance on a single exceptionally successful case, Jiaran having set a global virtual anchor live-stream revenue record with a birthday broadcast, which makes the identified success factors difficult to generalize to less prominent virtual anchors operating without comparable corporate backing or established fandom infrastructure.
Hong, J., & Ding, Z. (2023). 从离身到具身:虚拟新闻主播应用策略研究——以上海广播电视台“申䒕雅”为例 [From disembodiment to embodiment: Application strategies of virtual news anchors—A case study of Shanghai Media Group’s “Shen Xiaoya”]. China Media Science and Technology, (4).
This case study applies the cognitive-science framework of disembodied and embodied cognition, drawing on in-depth interviews with the Shen Xiaoya production team at Shanghai Media Group, to trace how virtual news anchors originate in disembodied cognitive principles (computational, algorithm-driven presentation) and progressively evolve toward embodied cognition through affective engagement, personality construction, and real-time interactive presence. It concludes that mainstream media should leverage computational efficiency while strengthening emotional connection. The study's reliance on a single station's virtual anchor, with that anchor's own operational staff as the sole interview source, limits the generalizability of its claims about the trajectory of virtual news anchors as a category.
Hu, H.-H., & Ma, F. (2023). Human-like bots are not humans: The weakness of sensory language for virtual streamers in livestream commerce. Journal of Retailing and Consumer Services, 75, Article 103541.
The study found that when virtual streamers in livestream commerce use sensory language describing product experiences, viewers' purchase intention decreases rather than increases, because consumers perceive such language as violating expectations about what a non-human entity can credibly report having experienced. This negative effect reverses when viewers learn the virtual streamer is operated by a human rather than by AI, restoring the perceived legitimacy of sensory claims. The evidence rests entirely on online scenario-based experiments and a focus group rather than on observation of actual purchasing behavior in live commercial broadcasts, leaving open whether the reported effects hold under the attentional and social pressures of real-time viewing.
Huang, T., Li, S. Z., Yuan, X., & Cheng, S. (2023). Roadmap towards meta-being. arXiv.
This preprint proposes a staged technical roadmap for constructing AI-driven digital humans within the metaverse, concluding that a viable pipeline must proceed from 3D modeling and rendering for immersive display, through multimodal dialogue systems integrating audio, facial expressions, and body movements, to deployment in metaverse economic applications with security safeguards. The paper is a high-level architectural overview rather than a report of implemented systems or experimental results, which means none of the proposed pipeline stages are validated against performance benchmarks, user studies, or real-world deployment data, leaving the feasibility of the integrated end-to-end roadmap undemonstrated.
Huang, W., Xia, C., & Tie, Z. (2023). Digital memory and metaverse: The value of digital memory for virtual digital human (数字记忆与元宇宙:数字记忆对虚拟数字人的价值). Library Tribune (图书馆论坛), 43(7), 154–161.
This article, part of a three-part series from the Shanghai Library's digital humanities team, argues that digital memory resources held by libraries and cultural heritage institutions can provide virtual digital humans in the metaverse with historical depth, cultural grounding, and knowledge-based identity that current commercial virtual digital humans lack, concluding that integrating digital memory into virtual digital human design would transform them from superficial visual entities into culturally meaningful agents capable of preserving and transmitting collective memory. The argument is entirely conceptual, proposing value propositions and application scenarios without implementing or testing any digital memory integration with an actual virtual digital human system, which leaves open whether the envisioned enrichment would meaningfully alter user experience or cultural transmission outcomes in practice.
Huang, Y., & Yu, Z. (2023). Understanding the continuance intention for artificial intelligence news anchor: Based on the expectation confirmation theory. Systems, 11(9), 438.
This survey of 598 Chinese online users examined whether viewers intend to continue watching AI news anchors, finding that overall continuance intention is positive but not robust, with satisfaction serving as the strongest direct predictor and the most significant mediator between perceived anthropomorphism, attractiveness, and expectation confirmation on the one hand and continuance intention on the other. The study's stimulus materials were limited to two female AI anchors with relatively low levels of anthropomorphism, deliberately selected to avoid the uncanny valley effect, which means the findings cannot account for how viewers respond to male AI anchors, highly realistic designs, or the full spectrum of anthropomorphism where discomfort effects are most likely to suppress continuance intention.
Jian, S. (2023). 虚拟数字人概念:内涵、前景及技术瓶颈 [The concept of “virtual digital human”: Connotation, prospects, and technical bottlenecks]. Journal of Shanghai Normal University (Philosophy and Social Sciences Edition), 52(4), 45–57.
This conceptual analysis disambiguates the term "virtual digital human" (虚拟数字人) from adjacent concepts such as "virtual human" and "digital human," defines the category as requiring human appearance, expressive behavioral capacity, and interactive intelligence, and concludes that while application prospects span media, commerce, and public services, persistent technical bottlenecks in real-time interaction naturalness, high-fidelity rendering at scale, and intelligent autonomy remain the primary barriers to widespread deployment. The analysis is a theoretical and definitional exercise that draws on industry white papers and existing literature rather than original empirical data, which means its claims about technical bottlenecks reflect reported consensus rather than independent benchmarking or testing of specific systems.
Liberati, N., & Chen, J. J. (2023). Augmented Galatea for physical Pygmalion: A phenomenological approach to intimacy in VTubers in the East Asia region. In V. Geroimenko (Ed.), Augmented reality and artificial intelligence: The fusion of advanced technologies (pp. 61–71). Springer.
This chapter uses phenomenology and postphenomenology to analyze how VTubers in China generate intimate connections with users, concluding that virtual characters, despite not being human, actively reshape viewers' subjectivities and their experience of intimacy by functioning as what the Pygmalion-Galatea metaphor frames as technologically animated objects that users invest with relational significance. The analysis is entirely theoretical, drawing on no empirical data such as interviews, surveys, or observation of actual VTuber-viewer interactions, which means its claims about how Chinese users experience intimacy with VTubers rest on philosophical inference rather than documented audience behavior.
Liu, J. (2023). Virtual presence, real connections: Exploring the role of parasocial relationships in virtual idol fan community participation. Global Media and China. Advance online publication.
This survey study examines why fans of virtual idols in China participate in fan communities, finding that both the perceived interpersonal attractiveness of virtual idols and fans' own loneliness significantly drive fan community participation, with parasocial relationships serving as a mediating mechanism between those predictors and participatory behavior. The study treats virtual idols as a single undifferentiated category, grouping anime-style characters like Luo Tianyi with hyper-realistic figures like Ling despite evidence elsewhere that audiences respond to these visual styles in fundamentally different ways, which means the reported effects may obscure substantial variation across virtual idol subtypes.
Liu, R. (2023). What affect VTuber audience behavior?: Empirical evidence from China (Master’s thesis, Waseda University).
This survey-based study examines the determinants of VTuber audience behavior among Chinese viewers, finding that factors such as perceived attractiveness and interactive capacity of VTubers significantly influence audience attitudes and engagement. The study contributes the concept of "virtual idol streamers" as a hybrid category bridging virtual idols and virtual streamers, though the ICEB 2024 literature has since noted this framing remains conceptually underdeveloped. As a single-country master's thesis relying on self-reported survey data, the study's sample is likely skewed toward dedicated VTuber viewers rather than casual or lapsed audiences, limiting how far the findings can generalize to broader Chinese media consumption patterns or to VTuber ecosystems outside China.
Luo, L., & Kim, W. (2023). How virtual influencers' identities are shaped on Chinese social media: A case study of Ling. Global Media and China. Advance online publication.
This case study examines how Weibo audiences perceive China's first computer-generated virtual influencer, Ling, finding that Ling's persona is deliberately constructed around traditional Chinese cultural identity through Peking Opera references, guofeng aesthetics, and brand partnerships, while audience responses reveal tension between admiration for the cultural positioning and uncanny valley discomfort triggered by Ling's hyper-realistic human-like appearance. The analysis draws on a single virtual influencer active on one platform, making it impossible to determine whether the cultural-identity construction strategy and the audience ambivalence observed are specific to Ling or generalizable to other Chinese virtual influencers operating across different platforms and visual styles.
Qi, Y., & Sun, Y. (2023). Visualization and bibliometric analysis of research evolution on digital human. In Proceedings of the 2023 6th International Conference on Big Data Technologies (ICBDT '23). ACM.
This bibliometric analysis maps the evolution of digital human research using Web of Science literature published before June 2023, finding that publication volume grew substantially over 27 years, with biomedicine and entertainment media serving as the primary catalysts for the field's development. The analysis concludes that the current prominence of digital humans reflects accumulated cross-disciplinary effort and incremental gains in digital productivity rather than a sudden technological breakthrough. The study's reliance on a single database and a cutoff just before the generative-AI surge of late 2023 means it captures none of the large-language-model-driven developments that rapidly reshaped both the technical landscape and the research agenda for digital humans.
Regis, R., Ferreira, J. C., & Tavares, V. P. (2023). VTubers and pandemic in China: A new dimension of technological cultural production. Revista Memorare, 10(2), 3–28.
This narrative review and case study analysis examines the rapid growth of VTubers in China during the COVID-19 pandemic, framing the phenomenon within China's longer history of state-shaped cultural industry policy and the particular dynamics of domestic fan participation networks. The study's conclusion is that Chinese VTubers' pandemic-era expansion was not driven by the pandemic alone but by a convergence of factors specific to China — including the scale and intensity of domestic fan engagement and the indirect involvement of both public and private entities in structuring that participation — and that despite a domestic market of over 300 million fans, Chinese VTubers have not achieved export parity with Japanese or North American counterparts. The principal limitation is that the analysis relies on a narrative literature review and case studies of prominent Chinese VTubers without systematic data collection or defined selection criteria, which means the conclusions about the distinctiveness of China's VTuber ecosystem rest on illustrative examples rather than verifiable evidence.
Shen, S. Y., & Zhang, W. (2023). A method for synthesizing dynamic image of virtual human. IEEE conference proceedings.
This conference paper proposes a text-driven pipeline for synthesizing audio-video synchronized virtual human images, combining text-to-speech conversion, speech-driven lip-shape generation, and thin plate spline transformation to animate a source portrait. The reported conclusion is that the pipeline effectively resolves two specific failure modes of text-driven avatar generation — lip-shape mismatch and audio-video asynchrony — and produces high-quality, high-fidelity, low-latency output. The principal limitation is that the evaluation rests on the authors' own qualitative characterization of output quality with no reported quantitative benchmarking against competing methods or user-study validation of perceptual fidelity, leaving the performance claims unsupported by independent evidence.
Song, W., He, Q., & Chen, G. (2023). Virtual human talking-head generation. In CACML '23: Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning (pp. 1–5). ACM.
This conference paper surveys the talking-head video generation component of virtual human systems, classifying generation models by their depth of approach and reviewing five years of technical development. Its conclusion is that existing deep learning methods remain dependent on labeled data, that standard objective metrics such as PSNR and SSIM poorly capture human perceptual judgment of output quality, and that the field has only begun exploring alternative learning paradigms such as few-shot and knowledge-distillation approaches to address data dependency. The limitation is that the review is a short conference paper of five pages covering a large and fast-moving literature, which constrains the depth of comparative analysis and leaves the classification scheme largely descriptive rather than evaluative.
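The point about objective metrics can be made concrete with the standard PSNR definition, 10·log10(MAX² / MSE). The sketch below is illustrative only: the four-pixel "images" and variable names are invented for the example and do not come from the paper. It shows two differently distributed distortions that receive the same PSNR because the metric depends only on mean squared error.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equal-length pixel sequences."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in decibels for 8-bit pixel values."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf  # identical inputs: no noise to measure
    return 10.0 * math.log10(max_value ** 2 / error)


# Toy data (hypothetical): a flat patch and two distortions of it.
reference = [100, 100, 100, 100]
uniform_shift = [102, 102, 102, 102]  # every pixel off by 2, MSE = 4
single_spike = [100, 100, 100, 104]   # one pixel off by 4, MSE = 4

same_score = psnr(reference, uniform_shift) == psnr(reference, single_spike)
```

Here `same_score` is true even though one distortion is a uniform brightness change and the other is a localized artifact. Whether viewers would judge the two differently is a perceptual question that PSNR alone cannot answer.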
Song, Y., Zhang, W., Chen, Z., & Jiang, Y. (2023). A survey on talking head generation. Journal of Computer-Aided Design and Computer Graphics, 35(10), 1535–1556.
This review covers talking head generation as a deep learning subfield of virtual digital human research, organizing its account around dataset availability, core generative technologies — visual generation, image recognition, speech recognition, and cross-modal analysis — and evaluation strategies. Its principal conclusion is that data quality and scale, model generalization, and evaluation metrics each remain unresolved pressure points constraining the field's progress, and it frames these as the primary directions requiring future attention. The limitation is that the review synthesizes only 80 references across a technically dense and rapidly moving field, a modest corpus for a survey claiming comprehensive coverage, and the framing of conclusions as a "future outlook" suggests the review's analytical contribution is largely taxonomic rather than critical.
Sun, Y., Sun, Z., Wen, Y.-H., Ye, S., Lv, T., Yu, M., Yi, R., Gao, L., & Liu, Y.-J. (2023). Generation of virtual digital human for customer service industry. Computers and Graphics, 115, 359–370.
This technical paper proposed a two-stage deep learning pipeline for generating 2D virtual digital human videos tailored to customer service scenarios, pairing it with what the authors claim is the first dataset to capture human video with both template service actions and emotional talking faces covering basic service situations. The pipeline first synthesizes transition frames between template gesture actions to produce a coherent action video, then generates an emotional lip-sync facial layer to replace the face region, with the two stages combined to yield a virtual agent exhibiting contextually appropriate gestures and emotionally congruent speech. The principal limitation is that evaluation rested on the pipeline's own dataset, which was constructed by the same research group, meaning the reported performance reflects behavior within a controlled, domain-specific corpus rather than generalization to the messier, more varied speech and gesture conditions that real customer service deployment would involve.
Usami, Y., Kitaoka, K., & Shindo, K. (2023). Integrated artificial intelligence for making digital human. In Proceedings of the 2023 15th International Conference on Machine Learning and Computing (ICMLC '23) (pp. 267–273). ACM.
This systems paper proposed an integrated architecture for constructing a digital human by combining multiple existing AI subsystems — image recognition, audio recognition, natural language processing, and facial expression generation — into a single pipeline, arguing that true artificial intelligence requires the convergence of these research streams rather than their separate development. The reported contribution is architectural: the paper demonstrates that such integration is feasible and produces a working digital human agent capable of perceiving and responding to a human interlocutor. The principal limitation is that the 2023 system's conversational component relied on a question-answering module rather than a generative language model, a constraint the authors themselves identified and addressed in a direct sequel, which means the dialogue capability described here was already acknowledged as insufficient at the time of publication.
Wan, A., & Jiang, M. (2023). Can virtual influencers replace human influencers in live-streaming e-commerce? An exploratory study from practitioners' and consumers' perspectives. Journal of Current Issues and Research in Advertising, 44(3), 332–372.
This mixed-methods study combined expert interviews with industry practitioners and a consumer experiment to ask whether virtual influencers can substitute for human influencers in live-streaming e-commerce. Both data sources converged on a negative answer: virtual influencers generated less positive consumer attitudes and lower perceived warmth, trust, usefulness, and sense of dialogue than their human counterparts, with the consumer experiment confirming what practitioners had already reported from professional experience. The principal limitation is that the experimental stimuli were necessarily constructed rather than drawn from real live-streaming contexts, which means the consumer responses measured reflect reactions to designed representations of virtual influencers rather than behavior observed in the commercially and socially complex environment of actual live-streaming rooms.
Wang, C., Xu, J., & Shang, Q. (2023). The mechanism of virtual anchor interactivity on consumer purchase behavior in e-commerce live streaming (电商直播中虚拟主播互动性对消费者购买行为的影响). Economy and Management (经济与管理), 37(2), 84–92.
This mixed-methods study used a questionnaire survey and eye-tracking experiment to examine how virtual anchor interactivity affects consumer purchase intention in e-commerce live streaming. Social presence fully mediated the interactivity-to-purchase-intention relationship, meaning interactivity produced no direct effect and operated entirely through consumers' felt sense of human connection. The eye-tracking component added that this mechanism holds only for hedonic products, where high interactivity redirected visual attention toward the product and anchor and raised purchase intention; for utilitarian products, interactivity level had no significant effect on attention or purchase intention. The study's core claim is therefore limited to hedonic consumption contexts, leaving virtual anchor behavior in the utilitarian categories that dominate much of Chinese e-commerce live streaming empirically unresolved.
Wang, P., & Lu, Z. (2023). Let’s play together through channels: Understanding the practices and experience of danmaku participation game players in China. Proceedings of the ACM on Human-Computer Interaction, 7(CHI PLAY), 1025–1043.
This qualitative study, which combined observation of 50 DPG channels on Bilibili with semi-structured interviews with 13 players, examined danmaku participation games (DPGs), a format that emerged on Bilibili in 2022 in which each viewer controls an independent character via text comment rather than sharing collective control as in Western-style audience participation games. The study found that this individual-identity structure produces meaningfully higher player engagement and a stronger sense of personal accomplishment than crowd-based formats, because players can interact freely with one another within the channel rather than being subordinated to an aggregated command system; the format also lowers participation barriers for players without high-end devices or sustained availability. Against those gains, the study also found that the same structure forces novice players to accept oversight from skilled ones and to absorb the disruptive effects of other players on their experience, a tension the study describes but does not resolve. The principal limitation is sample scale and origin: thirteen interviewees recruited primarily from the very communities organized around specific DPG channels constitute a small and positively self-selected pool, making it difficult to assess how representative their reported experiences are of the broader and more casual DPG player base on Bilibili.
Wang, Q., Long, S., Zeng, Y., Tang, L., & Wang, Y. (2023). The creative behavior of virtual idol fans: A psychological perspective based on MOA theory. Frontiers in Psychology, 14, Article 1290790.
Using MOA theory as its framework, this survey-based study (n = 317, recruited via a Chinese online platform) found that interest, achievement, social, and utility motivations each independently and positively predicted fans' creative output toward virtual idols, while perceived cost exerted a significant negative moderating effect on the relationship between three of those motivations — interest, social, and utility — and creative behavior, and a positive community atmosphere produced a substantial positive moderation across the model. The knowledge-and-skills dimension of ability produced no significant moderating effect, a result the survey design cannot adequately explain. The principal limitation is that the sample was drawn entirely from Chinese-platform users through self-selection, so the motivational structure identified — particularly the weight of utility motivation — may reflect the specific commercial and community dynamics of China's virtual idol ecosystem rather than fan creative behavior in other cultural or platform contexts.
Wang, T., Ye, P., Lv, H., Gong, W., Lu, H., & Wang, F.-Y. (2023). Modeling digital personality: A fuzzy-logic-based Myers–Briggs type indicator for fine-grained analytics of digital human. IEEE Transactions on Computational Social Systems, 11(1), 1096–1107.
This paper addresses the limitation of standard MBTI classification, which reduces personality to one of sixteen mutually exclusive types by treating each of the four dichotomous axes as binary, and proposes a fuzzy-logic-based reformulation that assigns continuous membership values along each axis rather than hard categorical assignments. The result is a fine-grained personality representation intended to enable more granular differentiation among digital humans deployed in cyberspace applications such as recommendation systems and intelligent assistants, where the coarseness of sixteen types obscures meaningful interpersonal variation. The study's fundamental limitation is that fuzzy-logic refinement cannot compensate for the well-documented psychometric weaknesses of MBTI itself: the instrument has poor test-retest reliability and its four dichotomous dimensions lack the construct validity of established alternatives such as the Big Five, meaning that extending a contested typological framework with greater computational granularity does not resolve the question of whether those dimensions adequately capture personality in the first place.
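The contrast between hard types and continuous membership values can be sketched as follows. This is not the paper's formulation: the linear membership function, the assumed score range of [-1, 1] per axis, and the function names are all assumptions chosen to keep the example small.

```python
# The four MBTI axes, each written as (first pole, second pole).
AXES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def fuzzy_profile(scores):
    """Map four axis scores in [-1, 1] to membership degrees for all eight poles.

    Assumed convention: +1 means fully the first pole, -1 fully the second.
    The linear mapping is a placeholder for whatever membership functions the
    authors actually use.
    """
    if len(scores) != len(AXES):
        raise ValueError("expected one score per axis")
    profile = {}
    for (first, second), score in zip(AXES, scores):
        if not -1.0 <= score <= 1.0:
            raise ValueError("scores must lie in [-1, 1]")
        degree = (1.0 + score) / 2.0
        profile[first] = degree
        profile[second] = 1.0 - degree
    return profile


def crisp_type(profile):
    """Collapse a fuzzy profile to one of the sixteen hard types (ties go to the first pole)."""
    return "".join(a if profile[a] >= profile[b] else b for a, b in AXES)


profile = fuzzy_profile([0.5, -0.5, 0.0, 1.0])
label = crisp_type(profile)
```

In this example the hard label is `ENTJ`, but the profile records that the T/F axis is an even split (0.5 each) while J is held at full strength (1.0). That per-axis gradation is the information a sixteen-type assignment discards.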
Wei, L., Wang, Y., & Li, D. (2023). Research on speech-based digital human driving system based on Unreal Engine. In 2023 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). IEEE.
This short conference paper describes the design and implementation of a speech-driven digital human animation system built within Unreal Engine, in which a blendshape-based micro-expression driving algorithm dynamically modulates facial expressions in synchrony with audio input. The paper's central contribution is the demonstration that blendshape manipulation can sustain expressive, dynamically varying facial output within a real-time rendering environment, producing a functional end-to-end pipeline from speech input to rendered character output. The principal limitation is that the paper, as a brief symposium contribution, presents no formal perceptual evaluation, no benchmarking against comparable speech-driven animation systems, and no quantitative assessment of expression accuracy, lip-sync quality, or latency under varied speech conditions, leaving the claimed effectiveness of the micro-expression algorithm supported by implementation demonstration alone rather than independent validation.
Wu, R., Liu, J., Chen, S., & Tong, X. (2023). The effect of e-commerce virtual live streamer socialness on consumers' experiential value: An empirical study based on Chinese e-commerce live streaming studios. Journal of Research in Interactive Marketing, 17(5), 714–733.
This experimental study examines how the degree of socialness displayed by virtual live streamers — the extent to which a streamer exhibits human-like social cues — shapes consumer experiential value in Chinese e-commerce live streaming contexts. Across four laboratory experiments, the study finds that higher streamer socialness produces significantly greater experiential value, encompassing both utilitarian and hedonic dimensions, with social presence mediating both pathways; communication style and situational context each moderate the socialness-to-experiential-value relationship significantly. The study's principal limitation is the reliance on laboratory conditions across all four experiments: controlled stimuli allow causal inference but substantially reduce ecological validity for the actual e-commerce streaming environment, where product category, audience size, real-time comment volume, and platform affordances all interact with streamer characteristics in ways that laboratory designs cannot replicate, leaving open the extent to which the magnitude of the socialness effect holds under naturalistic viewing conditions.
Wu, Y., Zhang, Y., & Xue, Z. (2023). Effect of virtual influencer entertainment value on fashion brand purchase intention (虚拟形象娱乐价值对时尚品牌购买意愿的影响). Journal of Textile Research (纺织学报), 44(12), 153–161.
This empirical study examines how the entertainment value that virtual digital humans deliver shapes fashion brand purchase intention among Chinese consumers. Working from 282 valid questionnaires and a regression-based mediated moderation model, the study finds that virtual influencer entertainment value has a significant positive effect on purchase intention through two distinct pathways: a direct effect and an indirect effect mediated by consumer immersion. Consumer gender moderates the mediated pathway significantly — the moderating effect is strongest when the consumer is male and the virtual character is an anime type — while virtual character type alone produces no significant moderation of the mediated effect overall. The study's principal limitation is its sample size and composition: 282 responses drawn from a single national context provide a narrow empirical base, and because the character-type variable reduces to a binary anime versus non-anime distinction, the finding of non-significant type moderation may reflect categorical underdifferentiation rather than a genuine null effect across the far broader range of virtual influencer aesthetics deployed in fashion marketing.
Yan, Y., Cheng, Y., Chen, Z., Peng, Y., Wu, S., Zhang, W., Li, J., Li, Y., Gao, J., Zhang, W., Zhai, G., & Yang, X. (2023). A survey on generative 3D digital human research based on neural networks: Representation, rendering, and learning (基于神经网络的生成式三维数字人研究综述:表示、渲染与学习). Scientia Sinica Informationis, 53, 1858–1879.
Produced at Shanghai Jiao Tong University by a team led by Yichao Yan and Xiaokang Yang, this Chinese-language survey in Scientia Sinica Informationis reviews the generation of high-fidelity 3D digital humans through deep learning, organising the field around three interdependent pipeline components: model representation, neural rendering, and generative learning. The survey's central finding is that the field has bifurcated along representation type — explicit representations such as polygon meshes remain dominant in industrial applications due to rendering pipeline maturity, while implicit representations such as neural radiance fields offer superior generative flexibility but at greater computational cost — and that GAN-based generative models and controllable deformation field approaches currently constitute the two main learning paradigms for full-body and head-centric generation. The paper identifies high-fidelity controllable generation, efficient real-time rendering, and cross-modal animation as the field's principal open problems. Its primary limitation as a reference is temporal rather than methodological: published in early 2023 and covering literature through 2022, the survey predates the rapid displacement of GAN-based approaches by diffusion-model-based generation pipelines, which significantly affects the relevance of its comparative assessments.
Yang, D., Sun, M., Zhou, J., Lu, Y., Song, Z., Chen, Z., Yang, D., Wu, X., Ge, H., Zhang, Y., Gao, C., Xuan, J., Li, X., Yin, J., Zhu, X., Liu, J., Xin, H., Jiang, W., Wang, N., ... Bai, C. (2023). Expert consensus on the "digital human" of metaverse in medicine. Clinical eHealth, 6, 159–163.
This five-page expert consensus statement, produced by a large multidisciplinary author group that includes the managing editor and editors-in-chief of the publishing journal Clinical eHealth, extends the same group's 2022 metaverse-in-medicine consensus to focus specifically on digital human technology as the embodied interface layer of medical metaverse systems. Framed against China's Healthy China 2030 policy and the associated shift from batch-style to personalised healthcare, the paper defines medical digital humans, maps their anticipated functions across disease prevention, remote consultation, health education, chronic disease management, and rehabilitation, and concludes that digital human deployment in clinical settings is both technically feasible and strategically aligned with national healthcare reform objectives. Its evidential value is as a positional and definitional document from a clinically credentialled author group that helped establish the foundational vocabulary for this application domain in Chinese medical literature; its principal limitation is that the consensus is not grounded in clinical trial data or systematic review, and the editorial relationship between the authorship and the publishing journal — explicitly disclosed but structurally significant — weakens the independence of peer review for a paper making normative claims about an emerging clinical technology.
Yang, R., Li, L., Gan, W., Chen, Z., & Qi, Z. (2023). The human-centric metaverse: A survey. In Companion Proceedings of the ACM Web Conference 2023 (WWW '23 Companion). ACM.
Produced primarily at Jinan University and presented at the WWW '23 Companion track — a short-paper venue running alongside the main ACM Web Conference — this eleven-page survey argues that human experience, rather than technical infrastructure, should be the organising principle of metaverse design, and surveys enabling technologies including extended reality, avatars, digital humans, blockchain, and the Web of Things through that lens. The paper's concrete conclusions are that the metaverse is currently immature and technically fragmented, that a genuinely human-centric realisation requires that diversity and inclusion be built into foundational design rather than retrofitted, and that avatar and digital human representation constitute the primary interface through which users inhabit and relate to metaverse environments. Its evidential value lies in consolidating a cross-disciplinary technology landscape under a coherent normative framing that foregrounds user welfare, though its principal limitation is that the survey appears in a companion proceedings track subject to lighter peer review than the main conference, and its arguments for human-centricity remain largely prescriptive rather than grounded in user research or empirical evaluation of existing metaverse deployments.
Yu, Y., Kwong, S. C., & Bannasilp, A. (2023). Virtual idol marketing: Benefits, risks, and an integrated framework of the emerging marketing field. Heliyon, 9(11), Article e22164.
This is a conceptual rather than empirical paper, produced primarily at Jiaying University in Guangdong and funded by the Guangdong Planning Office of Philosophy and Social Science. It establishes a working definition of virtual idols distinct from adjacent categories such as virtual influencers and avatars, and catalogues the marketing benefits (including scandal immunity, full brand controllability, persistent availability, and strong resonance with Generation Z consumers) alongside the risks, which centre on audience scepticism, high production costs, and the reputational exposure created by human voice actors and operators behind the virtual persona. The paper's central output is a theoretically synthesised framework, drawing on parasocial interaction theory and emotional attachment theory, that maps the mechanism by which virtual idol characteristics generate consumer responses leading to brand attitude and purchase intention. Its principal limitation is the complete absence of primary empirical validation: the framework is assembled from existing literature rather than tested against new data, leaving its proposed causal pathways as hypotheses awaiting confirmation rather than established findings.
Zhang, X. (2023). The research on the characteristics of Chinese game virtual streamers. BCP Education & Psychology, 9, 52–60.
Examining Chinese game virtual streamers — known domestically as VUPs — through qualitative observation and case analysis of active streamers on Chinese platforms, the paper identifies five defining characteristics: strong role-playing capacity, content variety across game genres, humor, versatility in ancillary skills such as music and dance, and responsiveness to trending internet culture. The central conclusion is that these five attributes function interdependently to sustain audience attention and deepen viewer interaction, and that competitive pressure within China's dense virtual streamer market has elevated versatility from a differentiating asset to an entry-level expectation. The paper's value is primarily descriptive and taxonomic, offering a practitioner-oriented framework grounded in specific streamer examples; its principal limitation is that the characteristic taxonomy derives from selective case observation rather than systematic sampling or any quantitative validation, and BCP Education & Psychology functions as a conference proceedings venue rather than a peer-reviewed journal, which limits the work's methodological accountability and the generalisability of its conclusions.
Zhang, X., Shi, Y., Li, T., Guan, Y., & Cui, X. (2023). How do virtual AI streamers influence viewers' livestream shopping behavior? The effects of persuasive factors and the mediating role of arousal. Information Systems Frontiers, 26(5), 1803–1834.
Drawing on 559 livestream viewers surveyed through a scenario-based instrument and analysed via maximum likelihood structural equation modelling cross-validated with Bayesian SEM, this study examines which characteristics of AI-powered virtual streamers drive parasocial interaction intention and impulse buying intention, with arousal as a mediating mechanism. The principal findings are that coolness is the dominant antecedent of parasocial interaction intention, while congruence and mind perception are the stronger predictors of impulsive urge to buy; arousal mediates the path from persuasive streamer attributes to both outcome variables, and growth versus fixed mindset moderates the arousal-to-intention relationship. The study's most consequential limitation is its scenario-based design: participants responded to constructed stimuli rather than live streaming situations they self-selected, which strengthens internal validity at the cost of ecological validity and makes it difficult to confirm that reported impulse buying intention translates to actual purchasing behaviour in real commercial contexts.
Zhen, R., Song, W., He, Q., Cao, J., Shi, L., & Luo, J. (2023). Human-computer interaction system: A survey of talking-head generation. Electronics, 12(1), Article 218.
Produced at the State Key Laboratory of Media Convergence and Communication at Communication University of China, this review proposes an end-to-end human-computer interaction framework integrating speech recognition, text-to-speech synthesis, dialogue management, and virtual human generation, then uses that system architecture as an organising structure for a taxonomy of deep learning-based talking-head video generation models developed over the preceding five years. The central taxonomic conclusion is that generation approaches divide cleanly into 2D-based and 3D-based methods, each with distinct trade-offs in realism, controllability, and computational cost, and that audio-driven lip synchronisation and identity-preserving head motion remain the field's core technical challenges. The review's principal limitation is temporal: covering the period up to late 2022, it predates the emergence of NeRF-based and diffusion-based generation paradigms that have since substantially restructured the field's methodological landscape, which constrains the taxonomy's current applicability.
Zheng, F. (2023). Analysis and thinking on the key technology and standardization of virtual digital human in China (对我国虚拟数字人关键技术、标准化情况的分析与思考). New Generation of Information Technology (新一代信息技术), 6(21), 19–27.
The paper surveys China's virtual digital human technology stack — covering rendering, motion capture, voice synthesis, and natural language interaction — alongside the domestic and international standardisation landscape, then assesses these against prevailing policy conditions and deployment patterns. Its central conclusion is that existing policy frameworks and technical standards in China remain inadequate relative to the pace of development, and it calls specifically for building out a more coherent standards architecture and strengthening the country's technological innovation capacity in this domain. Notably, the paper documents Chinese institutional contributions to ITU-T standards, including CAICT's work on frameworks and evaluation methods for digital human application systems, which gives it concrete value as a record of China's standardisation activity at a specific moment. The principal limitation is that the analysis is non-empirical and text-driven, synthesising policy documents and standards filings without independent verification or comparative benchmarking against other jurisdictions, which constrains the rigour of its prescriptive recommendations.
Zhou, S., & Wang, Y. (2023). Application of digital humans in the e-commerce industry. The Frontiers of Society, Science and Technology, 5(12), 83–87.
Produced at a Chinese vocational institution and running to five pages, this paper maps digital human deployment across e-commerce functions — including virtual live-streaming sales, intelligent customer service, and personalised product recommendation — and concludes that the technology offers measurable advantages in availability, cost reduction, and consumer engagement while facing substantive obstacles in technical maturity, data security, and regulatory ambiguity. The paper's most concrete claim is that data security risks and the absence of unified governance frameworks represent the principal constraints on broader commercial adoption in China. Its evidential value is limited by its non-empirical character: the analysis is descriptive rather than experimental or survey-based, drawing on secondary literature without original data collection, which restricts the generalisability of its conclusions.