Email:  baoyuanw AT yahoo.com

Interests: Digital Twin, AI Agent, Personal Assistant

Address: Remote, US

Biography:

I recently joined Zoom after nearly 3.5 years at Xiaobing.ai, where I worked closely with Dr. Harry Shum. At Zoom, my team builds the core technology stacks for AI companion systems. If you're passionate about these domains and interested in joining our dynamic team, I'd love to connect with you.

I began my journey with Xiaobing.ai (spun off from Microsoft, which is also an investor) in early 2021, transitioning from a fulfilling 12-year tenure at Microsoft. At Xiaobing.ai, I led the AI R&D team, focusing on pioneering multimodal representation learning for conversational AI, advancing visual content generation, and exploring novel interaction technologies for avatar agents.

Prior to this, I served as Senior Principal Researcher and Manager at Microsoft HoloLens and the AI Platform team within Microsoft's AI and Cloud division. My research pursuits included computer vision, learning-based computational photography, and AI-driven content generation. Throughout my career, I've been driven by the ambition to tackle industry-scale challenges, with a keen focus on practical machine learning applications. My collaborations with various product teams, such as Bing Maps, Xbox/Kinect, Microsoft Pix Camera, SwiftKey, Windows, and Cognitive Services, have led to the integration of key technologies into these products. Earlier, I was a lead researcher at Microsoft Research Asia from 2012 to 2015.

I earned my Ph.D. in Computer Science from Zhejiang University in 2012 under the guidance of Professor Yizhou Yu, following my B.S. in Software Engineering from the same institution in 2007. My research journey includes a stint as a research intern at the Internet Graphics Group at Microsoft Research Asia from May 2009 to June 2012, collaborating with Ying-Qing Xu, Xin Tong, and Zhuowen Tu. I also had the opportunity to visit Microsoft Research in San Francisco for three months in 2011, working under the mentorship of Li-Yi Wei and Jaron Lanier. Additionally, I gained early professional experience as a developer intern at Infosys Limited in Bangalore, India, from September 2006 to April 2007.

For a detailed overview of my professional path and contributions, feel free to peruse my latest CV.

Work Experience

Education

If you are interested in knowing more about ZJU, please check out https://www.usnews.com/education/best-global-universities/zhejiang-university-504773


Major Consumer Products & Business Solutions

AI Companion: https://www.zoom.com/en/blog/zoom-ai-companion/ Meeting summarization, next-step prediction, multi-turn Q&A for meetings and documents, AI model customization, virtual agents, etc.

AI Employee: https://business.xiaoice.com/ My team ships advanced closed-domain question answering and open-domain persona chat solutions for the AI Being digital brain system, using an in-house LLM and tailored tech stacks.

Digital Twin in X Eva (available now in China app stores on both Android and iOS; an international version is coming soon): https://island.xiaoice.com/ Technology includes: agents, conversations, video chat, face reenactment, and other AIGC features such as image generation, TTS, etc.

Virtual IPs on Douyin (China's TikTok): https://www.douyin.com/user/MS4wLjABAAAA_FX11UDBw7gopcoMWiGn1b8DgdPv5z4Lh_fN5V-WsuQ Technology includes: 3D face synthesis, face swap, TTS, singing, etc.

Xiaoice Island: https://island.xiaoice.com/ Technology includes: conversations, behavior planning, TTS, singing, etc.

Xbox/Kinect: I shipped early event prediction and human gesture recognition systems to Xbox One. Check out this video: https://www.youtube.com/watch?v=UP9atMP0aNU

HoloLens: I was the tech lead of the human understanding team at HoloLens. My team worked on 3D face reconstruction and tracking, as well as face detection, recognition, and alignment.

Microsoft Pix: https://www.microsoft.com/en-us/microsoftpix?SilentAuth=1&wa=wsignin1.0 I shipped AI models for iPhone devices, including best burst photo selection, exposure control scene classifier, etc. Microsoft Pix was named one of the 50 Best Apps of the Year 2016 by the New York Times.

SwiftKey: A widely used keyboard on the Android platform. I shipped the 3D Animoji system (built on a 3D face tracking algorithm that uses an RGB camera only). Check out the report: https://ukstories.microsoft.com/features/panda-cat-dog-owl-or-dinosaur-swiftkey-can-turn-you-into-a-cute-animal-when-you-message-friends/

Microsoft Azure Cognitive Services: My team and I shipped face recognition, detection, and alignment models to Azure Cognitive Services. https://azure.microsoft.com/en-us/pricing/details/cognitive-services/face-api/


Preprints on Vision/Graphics/NLP/Agent


Publications (2020-now):

Conversational AI/NLP

Vision/Graphics


Publications (Before 2020):

 

US Patents

https://patents.justia.com/inventor/baoyuan-wang