Email: baoyuanw AT yahoo.com
Interest: Digital Twin, AI Agent, Personal Assistant
Address: Remote, US
Biography:
I recently joined Zoom after nearly 3.5 years at Xiaobing.ai, where I worked closely with Dr. Harry Shum. At Zoom, my team builds the core technology stack for AI companion systems. If you're passionate about these domains and interested in joining our dynamic team, I'd love to connect with you.
I began my journey with Xiaobing.ai (spun off from, and funded by, Microsoft) in early 2021, transitioning from a fulfilling 12-year tenure at Microsoft. At Xiaobing.ai, I led the AI R&D team, focusing on pioneering multimodal representation learning for conversational AI, advancing visual content generation, and exploring novel interaction technologies for avatar agents.
Prior to this, I served as Senior Principal Researcher and Manager at Microsoft HoloLens and the AI Platform team within Microsoft's AI and Cloud division. My research pursuits included computer vision, learning-based computational photography, and AI-driven content generation. Throughout my career, I've been driven by the ambition to tackle industry-scale challenges, with a keen focus on practical machine learning applications. My collaborations with various product teams, such as Bing Maps, Xbox/Kinect, Microsoft Pix Camera, SwiftKey, Windows, and Cognitive Services, have led to the integration of key technologies into these products. Earlier, I was a lead researcher at Microsoft Research Asia from 2012 to 2015.
I earned my Ph.D. in Computer Science from Zhejiang University in 2012 under the guidance of Professor Yizhou Yu, following my B.S. in Software Engineering from the same institution in 2007. My research journey includes a stint as a research intern at the Internet Graphics Group at Microsoft Research Asia from May 2009 to June 2012, collaborating with Ying-Qing Xu, Xin Tong, and Zhuowen Tu. I also had the opportunity to visit Microsoft Research in San Francisco for three months in 2011, working under the mentorship of Li-Yi Wei and Jaron Lanier. Additionally, I gained early professional experience as a developer intern at Infosys Limited in Bangalore, India, from September 2006 to April 2007.
For a detailed overview of my professional path and contributions, feel free to peruse my latest CV.
Work Experience
2024.6 - 2024.10, Head of AI Companion Core, Zoom Video Communications, USA
2021.2-2024.6 Cofounder and VP of Engineering @xiaobing.ai, Redmond, USA
2015.6-2021.2 Sr. Principal Researcher and Manager at MSR, HoloLens, Azure AI, Redmond, USA
2012.7-2015.6 Lead Researcher at MSR Asia, Beijing, China
2009.4-2012.6 Research Intern at MSR Asia, MSR San Francisco
2006.9-2007.4 Dev Intern at Infosys, Mysore, India.
Education
2007.9-2012.6 Zhejiang University, Ph.D. in Computer Science.
2003.9-2007.7 Zhejiang University, B.S. in Software Engineering
If you are interested in knowing more about ZJU, please check out https://www.usnews.com/education/best-global-universities/zhejiang-university-504773
Major Consumer Products & Business Solutions
AI Companion: https://www.zoom.com/en/blog/zoom-ai-companion/ Features include meeting summarization, next-step prediction, multi-turn Q&A for meetings and docs, AI model customization, virtual agents, etc.
AI Employee: https://business.xiaoice.com/ My team ships advanced closed-domain question answering and open-domain persona chat solutions for the AI being digital brain system, using an in-house LLM and tailored tech stacks.
Digital Twin in X Eva (available in China app stores for both Android and iOS; an international version is coming soon): https://island.xiaoice.com/, technology includes: agent, conversations, video chat, face reenactment, and other AIGC features including image generation, TTS, etc.
Virtual IPs in Douyin (China's TikTok): https://www.douyin.com/user/MS4wLjABAAAA_FX11UDBw7gopcoMWiGn1b8DgdPv5z4Lh_fN5V-WsuQ technology includes: 3D face synthesis, face swap, TTS, singing, etc.
Xiaoice Island: https://island.xiaoice.com/, technology includes: conversations, behavior planning, TTS, singing, etc.
Xbox/Kinect: I shipped early event prediction and human gesture recognition systems to Xbox One. Check out this video: https://www.youtube.com/watch?v=UP9atMP0aNU
HoloLens: I was the tech lead of the human understanding team at HoloLens. My team worked on 3D face reconstruction and tracking, and on face detection, recognition, and alignment.
Microsoft Pix: https://www.microsoft.com/en-us/microsoftpix?SilentAuth=1&wa=wsignin1.0 I shipped AI models for iPhone devices, including best burst photo selection, the exposure control scene classifier, etc. Microsoft Pix was named one of the 50 Best Apps of the Year 2016 by the New York Times.
SwiftKey: One of the widely used keyboards on the Android platform. I shipped the 3D Animoji system (through a 3D face tracking algorithm using an RGB camera only). Check out the report: https://ukstories.microsoft.com/features/panda-cat-dog-owl-or-dinosaur-swiftkey-can-turn-you-into-a-cute-animal-when-you-message-friends/
Microsoft Azure Cognitive Services: My team and I shipped face recognition, detection, and alignment models to Azure Cognitive Services. https://azure.microsoft.com/en-us/pricing/details/cognitive-services/face-api/
Preprints on Vision/Graphics/NLP/Agent
Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung. Sub-object Level Image Tokenization. https://arxiv.org/abs/2402.14327
Nuo Chen, Hongguang Li, Juhua Huang, Baoyuan Wang, Jia Li. Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations. https://arxiv.org/pdf/2402.11975.pdf
Zixiang Zhou, Yu Wan, Baoyuan Wang. A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis. https://zixiangzhou916.github.io/UDE-2/
Zixiang Zhou, Baoyuan Wang. MDSC: Towards Evaluating the Style Consistency Between Music and Dance. https://arxiv.org/abs/2309.01340
Haiming Zhang, Zhihao Yuan, Chaoda Zheng, Xu Yan, Baoyuan Wang, Guanbin Li, Song Wu, Shuguang Cui, Zhen Li. GSmoothFace: Generalized Smooth Talking Face Generation via Fine-Grained 3D Face Guidance. https://arxiv.org/pdf/2312.07385.pdf
Publications (2020 - now):
Conversational AI/NLP
Duomin Wang, Bin Dai, Yu Deng, Baoyuan Wang. AgentAvatar: Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents. https://dorniwang.github.io/AgentAvatar_project/ 2024 European Conference on Computer Vision, Workshop on EEC, ECCV' 2024
Nuo Chen, Hongguang Li, Baoyuan Wang, Jia Li. From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting. https://browse.arxiv.org/abs/2401.05384 to appear in NLRSE 2024.
Chenxu Wang, Bin Dai, Huaiping Liu, Baoyuan Wang. Towards Objectively Benchmarking Social Intelligence for Language Agents at the Action Level. https://arxiv.org/abs/2404.05337 to appear in ACL 2024 (Findings)
Delong Chen, Jianfeng Liu, Wenliang Dai, Baoyuan Wang. Visual Instruction Tuning with Polite Flamingo. Demo: http://clever_flamingo.xiaoice.com/ Paper: https://arxiv.org/abs/2307.01003 Github: https://github.com/ChenDelong1999/polite_flamingo. Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2024), Oral presentation, BC, Canada
Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang. DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models. Accepted as a long paper in the main conference of EMNLP 2023, Oral. https://arxiv.org/pdf/2310.05074.pdf
Nuo Chen, Hongguang Li, Yinan Bao, Baoyuan Wang, Jia Li. Natural Response Generation for Chinese Reading Comprehension. Accepted as a Findings paper of EMNLP 2023.
Nuo Chen, Hongguang Li, Yinan Bao, Junqing He, Xinshi Lin, Qi Yang, Jianfeng Liu, Ruyi Gan, Jiaxing Zhang, Baoyuan Wang, Jia Li. Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension. Accepted as a Findings paper of EMNLP 2023.
Ke Ji, Yixin Lian, Jingsheng Gao, Baoyuan Wang. Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification. The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023 long paper). https://aclanthology.org/2023.acl-long.164.pdf
Dongming Li, Jianfeng Liu, and Baoyuan Wang. Triplet-Free Knowledge-Guided Response Generation. The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023 Findings). paper: https://aclanthology.org/2023.findings-acl.815.pdf code/data/model: https://github.com/dongmingli-Ben/triplet-free
Jingsheng Gao, Yixin Lian, Ziyi Zhou, YuZhuo Fu, Baoyuan Wang. LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming. The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023 long paper). paper: https://aclanthology.org/2023.acl-long.858.pdf
Vision/Graphics
Yu Deng, Duomin Wang, Baoyuan Wang. Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer. https://yudeng.github.io/Portrait4D-v2/ to appear in IEEE ECCV'2024.
Yu Deng, Duomin Wang, Xiaohang Ren, Xinyu Chen, Baoyuan Wang. Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. https://yudeng.github.io/Portrait4D/ to appear in IEEE CVPR 2024.
Shuliang Ning, Duomin Wang, Yipeng Qin, Zirong Jin, Baoyuan Wang, Xiaoguang Han. PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns. https://ningshuliang.github.io/2023/Arxiv/index.html to appear in IEEE CVPR 2024
Xihe Yang, Xingyu Chen, Shaohui Wang, Daiheng Gao, Xiaoguang Han, Baoyuan Wang. HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images. https://seanchenxy.github.io/HaveFunWeb/ to appear in IEEE CVPR 2024
Zixiang Zhou, Yu Wan, Baoyuan Wang. AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond https://zixiangzhou916.github.io/AvatarGPT/ to appear in IEEE CVPR 2024
Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang. ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers. https://arxiv.org/abs/2305.15272. Information Fusion. Volume 103, March 2024. https://authors.elsevier.com/c/1i0Qu5a7-GtUgg
Weiyuan Li, Bin Dai, Ziyi Zhou, Qi Yao, Baoyuan Wang. Controlling Character Motions without Observable Driving Source. https://arxiv.org/abs/2308.06025. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024, Jan 4-8, Waikoloa, Hawaii
Chaoda Zheng, Xu Yan, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Shuguang Cui, Zhen Li. An Effective Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds. https://arxiv.org/abs/2303.12535. Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence 2023.
Xiaohang Ren, Xingyu Chen, Pengfei Yao, Heung-Yeung Shum, Baoyuan Wang. Reinforced Disentanglement for Face Swapping without Skip Connection. https://arxiv.org/pdf/2307.07928.pdf IEEE ICCV 2023, Paris, France, Oct 2023.
Xingyu Chen, Yu Deng, Baoyuan Wang. Mimic3D: Thriving 3D-Aware GANs via 3D-to-2D Imitation. https://seanchenxy.github.io/Mimic3DWeb/ IEEE ICCV 2023, Paris, France, Oct 2023.
Zhentao Yu, Zixin Yin, Deyu Zhou, Duomin Wang, Finn Wong, Baoyuan Wang. Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors. IEEE ICCV 2023, Paris, France, Oct 2023. https://zxyin.github.io/TH-PAD/
Anran Lin, Nanxuan Zhao, Shuliang Ning, Yuda Qiu, Baoyuan Wang, Xiaoguang Han. FashionTex: Controllable Virtual Try-on with Text and Texture. Accepted to SIGGRAPH 2023 (Conference Proceedings).
Yu Deng, Baoyuan Wang, Heung-Yeung Shum. Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image. https://yudeng.github.io/GRAMInverter/ IEEE CVPR 2023.
Xingyu Chen, Baoyuan Wang, Heung-Yeung Shum. Hand Avatar: Free-Pose Hand Animation and Rendering from Monocular Video. https://seanchenxy.github.io/HandAvatarWeb/ IEEE CVPR 2023.
Duomin Wang, Yu Deng, Zixin Yin, Heung-Yeung Shum, Baoyuan Wang. Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis. IEEE CVPR 2023. https://arxiv.org/abs/2212.04248 code & model: https://github.com/Dorniwang/PD-FGC-inference
Zixiang Zhou, Baoyuan Wang. UDE: A Unified Driving Engine for Human Motion Generation. https://openreview.net/pdf?id=LRXiCtA3zO IEEE CVPR 2023.
Wenbin Zhu, Chien-Yi Chen, Kuan-Luan Zeng, Shang-Hong Lai, Baoyuan Wang. Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation. IEEE CVPR 2022. https://arxiv.org/abs/2203.14327
Chaoda Zheng, Yan Xu, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Zhen Li, Shuguang Cui. Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds. IEEE CVPR 2022, Oral. https://arxiv.org/abs/2203.01730
Chenqian Yan, Yuge Zhang, Quanlu Zhang, Yaming Yang, Xinyang Jiang, Yuqing Yang, Baoyuan Wang. Privacy-preserving Online AutoML for Domain-Specific Face Detection. IEEE CVPR 2022. https://arxiv.org/pdf/2203.08399.pdf
Noranart Vesdapunt, Baoyuan Wang. CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement. IEEE CVPR 2021. https://arxiv.org/abs/2103.07017
Noranart Vesdapunt, Mitch Rundle, Muscle Wu, Baoyuan Wang. JNR: Joint-based Neural Rig Representation for Compact 3D Face Modeling. IEEE ECCV'2020, August, UK. https://arxiv.org/abs/2007.06755
Bindita Chaudhuri, Noranart Vesdapunt, Linda Shapiro, Baoyuan Wang. Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting. IEEE ECCV'2020, Spotlight, August, UK. https://homes.cs.washington.edu/~bindita/personalizedfacemodeling.html
Wenbin Zhu, Muscle Wu, Zeyu Chen, Noranart Vesdapunt, Baoyuan Wang. ReDA: Reinforced Differentiable Attributes for 3D Face Reconstruction. IEEE CVPR 2020, Oral, Seattle, WA. https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhu_ReDAReinforced_Differentiable_Attribute_for_3D_Face_Reconstruction_CVPR_2020_paper.pdf
Gaurav Mittal, Baoyuan Wang. Animating Faces with Disentangled Audio Representation Learning. IEEE WACV 2020, Oral. https://openaccess.thecvf.com/content_WACV_2020/papers/Mittal_Animating_Face_using_Disentangled_Audio_Representations_WACV_2020_paper.pdf
Publications (Before 2020):
Baoyuan Wang, Noranart Vesdapunt, Utkarsh Sinha, Lei Zhang. Real-time Burst Photo Selection Using a Light-Head Adversarial Net. https://arxiv.org/abs/1803.07212, 2019.11.20, IEEE Transactions on Image Processing (TIP 2019)
Bindita Chaudhuri, Noranart Vesdapunt, Baoyuan Wang. Joint Face Detection and Facial Motion Retargeting for Multiple Faces. IEEE CVPR 2019. https://arxiv.org/pdf/1902.10744.pdf 2019.2.26.
Huan Yang, Baoyuan Wang, Noranart Vesdapunt, Minyi Guo, Sing Bing Kang. Personalized Exposure Control Using Adaptive Metering and Reinforcement Learning. https://arxiv.org/abs/1803.02269 IEEE Transactions on Visualization and Computer Graphics (Volume: 25, Issue: 10, Oct. 1, 2019)
Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, Stephen Lin. Exposure: A White-Box Photo Post-Processing Framework. https://arxiv.org/pdf/1709.09602.pdf, ACM Transactions on Graphics, Presented at ACM SIGGRAPH 2018
Bin Dai, Baoyuan Wang, Gang Hua. Understanding and Predicting The Attractiveness of Human Action Shot. https://arxiv.org/abs/1711.00677, 2017.11
TaeHyun Oh, Kyungdon Joo, Neel Joshi, Baoyuan Wang, Sing Bing Kang. Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning. IEEE ICCV'2017, Venice, Italy.
Yuanming Hu, Baoyuan Wang, Steve Lin. FC^4: Fully Convolutional Color Constancy with Confidence-weighted Pooling. IEEE CVPR'2017, Oral, Honolulu, Hawaii.
Bo Xin, Yizhou Wang, Wen Gao, Baoyuan Wang, David Wipf. Maximal Sparsity with Deep Networks? NIPS 2016.
Huan Yang, Baoyuan Wang, Steve Lin, David Wipf, Baining Guo. Unsupervised Extraction of Video Highlights via Robust Recurrent Auto-Encoders. IEEE ICCV'15, Santiago, Chile.
Ruobing Wu, Baoyuan Wang, Yizhou Yu. Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification. IEEE ICCV'15, Santiago, Chile.
Jun Zhu, Baoyuan Wang, Zhuowen Tu. "Action Recognition with Actons". IEEE ICCV 2013, Sydney, Australia.
Yuwang Wang, Baoyuan Wang, Yizhou Yu, Zhuowen Tu. "Action Gons: Action Recognition with A Discriminative Dictionary of Structured Elements with Varying Granularity", ACCV'14, Singapore.
Xinggang Wang, Baoyuan Wang, Xiang Bai, Zhuowen Tu. "Max-Margin Multiple-Instance Dictionary Learning". ICML 2013, Atlanta, USA.
Wangjiang Zhu, Baoyuan Wang, Steve Lin. Adaptive Pooling on Multiple Trajectory Attributes for Action Recognition. The 12th IEEE International Conference on Advanced Video and Signal-based Surveillance (IEEE AVSS’15). Oral.
Weijia Zou, Baoyuan Wang, Rui Zhang. "Human Activity Recognition by Mining Discriminative Segment with Novel Skeleton Joint Feature". IEEE PCM'13, Oral.
Min Tan, Baoyuan Wang, Gang Pan. Robust Object Recognition via Weakly Supervised Metric and Template Learning. Accepted by Neurocomputing, May, 2015.
Min Tan, Baoyuan Wang, Zhaohui Wu, Jingdong Wang, Gang Pan. Weakly Supervised Metric Learning for Traffic Sign Recognition in a LIDAR Equipped Vehicle. IEEE Transactions on Intelligent Transportation Systems. 2015.
Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, Yizhou Yu. Automatic Photo Adjustment Using Neural Networks. ACM Transactions on Graphics. Presented at SIGGRAPH 2016.
Baoyuan Wang, Yizhou Yu, Ying-Qing Xu. "Example-based Image Color and Tone Style Enhancement". ACM SIGGRAPH 2011.
Baoyuan Wang, Yizhou Yu, Tien-Tsin Wong, Chun Chen, Ying-Qing Xu. "Data-Driven Image Color Theme Enhancement". ACM SIGGRAPH Asia 2010, Seoul, December 2010 (ACM Transactions on Graphics, Vol. 29, No. 5, 2010).
Baoyuan Wang, Yizhou Yu. "Parallel H-Tree Based Data Cubing on Graphics Processors", International Journal of Software and Informatics (IJSI), 2012.
Baoyuan Wang, Gang Chen, Jiajun Bu, Yizhou Yu. "Multiscale Visualization of Relational Databases Using Layered Zoom trees and Partial Data Cubes". International Conference On Information Visualization Theory and Applications (IEEE IVAPP), Angers, France, May 2010 (Oral).
Baoyuan Wang, Gang Chen, Jiajun Bu, Yizhou Yu. "Zoomtree: Unrestricted Zooming Path in Multi-scale Visual Analysis of Relational Databases". Computer Vision, Imaging and Computer Graphics. Theory and Applications. Pages 299-317. 2011.
US Patent
https://patents.justia.com/inventor/baoyuan-wang