Title
Real-time motion capture with p5.js - building a VTuber (virtual YouTuber) from scratch
Sketch link
https://editor.p5js.org/Alicelong/sketches/W9kI3FSgS
One sentence description
My sketch captures the user's real-time facial motion with the camera and draws a 2D anime-style virtual avatar on screen accordingly.
Project summary (250-500 words)
My project aims to develop a 2D virtual avatar that tracks and performs the user's facial motion in real time. When the user blinks, the virtual avatar blinks; when the user moves their head, the avatar moves accordingly. To track the user's facial movement, my project employs the MediaPipe FaceMesh model, which can predict 468 3D facial landmarks in real time. Using the map provided by MediaPipe, I can find which landmark point corresponds to which part of the face. For the scope of the project, I mainly developed three facial features of the avatar that respond to the user's real-time motion: the opening and closing of the eyes, the movement of the mouth, and the movement of the eyebrows. The avatar performs steadily regardless of the distance between the user and the camera because I apply a scaling value at every step of the calculation. The whole avatar head can move and tilt in the X and Y axes, but it cannot turn in depth along the Z axis because the avatar is drawn from 2D images. Accomplishing turning would likely require implementing a 3D model.
Inspiration
I have long been interested in VTubers, especially how real-time capture works and how it is turned into a smooth, vivid animated character. I know there are many sophisticated pieces of software and complete pipelines for developing real-time motion-capture virtual avatars from scratch. For example, to develop a 2D virtual avatar, one can first draw the 2D character in Photoshop, animate it in Live2D, and then perform real-time capture in FaceRig.
Live2D is a very popular tool for animating 2D character illustrations. The output is a JSON model that can later be used for real-time motion capture. www.live2d.com/en/
FaceRig, developed by Holotech Studios, uses image-based face tracking to embody the user's real-time facial movement through an imported model. The output can be recorded as a movie or streamed in real time to Skype, Twitch, Hangouts, or any service that traditionally uses a webcam.
However, both Live2D and FaceRig require a subscription fee, as does most face-tracking and animation software. Moreover, the development process is complicated and spread across multiple programs. Users need to spend a lot of time learning various software in order to develop their own character, which is unfriendly to people new to the VTuber field. Therefore, I wanted to try developing a virtual avatar in the p5.js web editor, which is free and open source, and to combine the animation and the real-time motion capture in p5 so that users don't have to go through multiple platforms. Because of the scope of the project, I did not have time to develop a user interface that lets users import their own character image, but such an interface is my future vision for this project.
Process
Initially, I planned to train multiple regression models with ml5.neuralNetwork and FaceMesh keypoints, each model controlling one facial feature (eye opening/closing, mouth opening/closing, eyebrow movement). For example, for the model controlling eye opening/closing, I would first collect data of myself closing my eyes gradually while moving a slider according to how far they were closed. Then I would train the model and import the finished model into the main sketch, as sketched below.
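A minimal sketch of that abandoned plan, assuming ml5's neuralNetwork regression API; the variable names, keypoint choices, and slider labeling are illustrative, not code from the finished project:

  // Hypothetical sketch: label eye keypoints with a 0-1 slider value,
  // then train an ml5 regression model on those samples.
  let eyeModel, openSlider;

  function setupEyeModel() {
    eyeModel = ml5.neuralNetwork({ inputs: 4, outputs: 1, task: 'regression' });
    openSlider = createSlider(0, 1, 1, 0.01); // 1 = eyes fully open
  }

  // called once per FaceMesh prediction while collecting data
  function collectEyeSample(face) {
    const inputs = [
      ...face.scaledMesh[159].slice(0, 2), // upper left eyelid (x, y)
      ...face.scaledMesh[145].slice(0, 2), // lower left eyelid (x, y)
    ];
    eyeModel.addData(inputs, [openSlider.value()]);
  }

  function trainEyeModel() {
    eyeModel.normalizeData();
    eyeModel.train({ epochs: 50 }, () => console.log('eye model trained'));
  }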
But Dan suggested that it would be much faster to represent motion by calculating the relative location change of certain facial keypoints. For instance, eye closing can be detected by calculating the change in distance between keypoints on the upper and lower eyelid. So I went in this direction instead of training multiple models.
I built my project upon the p5 sketch provided by Jeff Thompson, a demonstration of MediaPipe FaceMesh. Link to his work: editor.p5js.org/jeffThompson/sketches/FNocqjOib
His sketch provided two very useful pieces for my implementation. The first is the scalePoint method, which takes the raw array of predicted keypoint locations (x, y, z) and converts the points from video coordinates to canvas coordinates.
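A hedged sketch of what such a conversion can look like (not Thompson's exact code), assuming a global video capture created with createCapture(VIDEO):

  // map a raw keypoint [x, y, z] from video resolution to canvas resolution
  function scalePoint(pt) {
    const x = map(pt[0], 0, video.width, 0, width);
    const y = map(pt[1], 0, video.height, 0, height);
    return createVector(x, y);
  }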
The second is scaling. The distance between the bounding box's topLeft.x and bottomRight.x is measured as the scaling unit. This scaling unit is applied to every drawing of a facial feature, so the relative distances between facial features stay the same regardless of how close to or far from the screen the user is.
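One possible way to compute that scaling unit, assuming the prediction's boundingBox exposes topLeft and bottomRight as [x, y] arrays (the exact shape may differ between FaceMesh versions):

  function faceScaleUnit(face) {
    const leftX = face.boundingBox.topLeft[0];
    const rightX = face.boundingBox.bottomRight[0];
    return abs(rightX - leftX); // face width in pixels, used to scale every feature
  }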
To measure the openness of the eyes, I calculate the distance between the highest and lowest points of the left and right eye rims (face.scaledMesh[159], face.scaledMesh[145], face.scaledMesh[386], face.scaledMesh[374]) with the dist() function. After smoothing the distance with lerp() and adjusting it by the scale unit, I map the result to the size of the avatar's eyeball. The other part of the eye is the eyelash, right above the eyeball. As the eyes close and the eyeball gets smaller, the eyelash moves down to match the shrinking eyeball, so I also tie the eyelash's y location to the same changing ratio.
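A hedged sketch of this calculation for the left eye; here the raw distance is divided by the scale unit to normalize it, and the map() range and eyeball size are assumed calibration values, so the real sketch may combine these steps differently:

  const eyeballMaxHeight = 40; // assumed avatar eyeball height in pixels
  let smoothedOpenL = 0;       // kept between frames for lerp smoothing

  function leftEyeOpenness(face, scaleUnit) {
    const top = face.scaledMesh[159];    // highest point of the left eye rim
    const bottom = face.scaledMesh[145]; // lowest point of the left eye rim
    const d = dist(top[0], top[1], bottom[0], bottom[1]) / scaleUnit;
    smoothedOpenL = lerp(smoothedOpenL, d, 0.3); // smooth out jitter
    // map the normalized distance to the avatar eyeball height (clamped)
    return map(smoothedOpenL, 0.01, 0.06, 0, eyeballMaxHeight, true);
  }

The eyelash image's y position can then be offset by the same mapped value so it follows the shrinking eyeball.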
The eyebrow drawing follows a similar approach. Initially I picked the eyebrow keypoints and calculated the distance between the eyebrow and the eye iris, but I soon found that when I tilted my head, the eyebrow keypoints could not effectively locate my eyebrow. So I picked mid-forehead keypoints (face.scaledMesh[69], face.scaledMesh[299]) that sit right above the eyes. The forehead keypoints change more distinctly when I raise my eyebrows, and the amount of change stays the same when I tilt my head. I then calculate the distances from the forehead keypoints to the eyes, scale them, and map them to the distance between the avatar's eyebrows and eyes. When I raise my eyebrows, the distance between forehead and eye increases, so the distance between the avatar's eyebrow and eye increases accordingly.
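A sketch of the same idea for the left eyebrow; the upper-eyelid reference point (159), the map() bounds, and the gap constants are my assumptions for illustration:

  const minBrowGap = 10; // assumed smallest gap between avatar eyebrow and eye
  const maxBrowGap = 30; // assumed largest gap when the eyebrow is raised

  function eyebrowOffset(face, scaleUnit) {
    const forehead = face.scaledMesh[69]; // mid-forehead, above the left eye
    const eye = face.scaledMesh[159];     // upper left eyelid
    const d = dist(forehead[0], forehead[1], eye[0], eye[1]) / scaleUnit;
    // a larger normalized distance means the eyebrow is raised, so push the
    // avatar's eyebrow further above its eye
    return map(d, 0.10, 0.18, minBrowGap, maxBrowGap, true);
  }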
Drawing the avatar's mouth was difficult because lip movement is very irregular. In Thompson's sketch, he uses beginShape(), endShape(CLOSE), and the silhouette keypoint group to draw the outline of the face and mouth. Besides the scaledMesh points, each of which is a single landmark, the FaceMesh results also include an annotations category in which scaledMesh points are grouped by the facial feature they represent. I applied a similar method, using the lipsUpperInner and lipsLowerInner groups to draw the shape of the avatar's mouth.
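A hedged sketch of the mouth drawing, reusing the scalePoint() idea from above and the two annotation groups named in the text; the fill color is arbitrary:

  function drawMouth(face) {
    const upper = face.annotations.lipsUpperInner;
    const lower = face.annotations.lipsLowerInner;
    fill(180, 60, 80);
    noStroke();
    beginShape();
    for (const pt of upper) {                // trace the upper lip left to right
      const p = scalePoint(pt);
      vertex(p.x, p.y);
    }
    for (const pt of [...lower].reverse()) { // then the lower lip right to left
      const p = scalePoint(pt);
      vertex(p.x, p.y);
    }
    endShape(CLOSE);
  }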
During user testing, I found a significant problem: the avatar's facial features would not rotate even when users tilted their heads. This was very troubling.
I discussed this problem with Jung and Dan. The final solution was to calculate how much the head is tilted and rotate the facial features by the same angle. I created two vectors: the x and y location of the bottom of the face (face.scaledMesh[152]) and the x and y location of the top of the face (face.scaledMesh[10]). Then I used p5.Vector.sub to subtract them and took the heading of the resulting vector as the tilt angle. The angle was then applied to the rotation of all facial features.
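A hedged sketch of that calculation; the HALF_PI offset, so that an upright face reads as zero, is my assumption about how the angle is normalized:

  function headTilt(face) {
    const chin = face.scaledMesh[152]; // bottom of the face
    const top = face.scaledMesh[10];   // top of the face
    const dir = p5.Vector.sub(
      createVector(chin[0], chin[1]),
      createVector(top[0], top[1])
    );
    // heading() is HALF_PI when the chin sits straight below the forehead,
    // so subtract HALF_PI to make an upright head give an angle of 0
    return dir.heading() - HALF_PI;
  }

Each facial feature is then drawn inside push() / translate() / rotate(headTilt(face)) ... pop() so it turns with the head.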
The final step was to add hair, which is drawn at the center of the face (face.scaledMesh[5]). The scale unit and rotation angle were applied to the hair as well.
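A sketch of that hair placement under the same assumptions; hairImg is a hypothetical image asset loaded in preload(), and the size multipliers are assumed proportions:

  let hairImg; // e.g. hairImg = loadImage('hair.png') in preload()

  function drawHair(face, scaleUnit, tilt) {
    const center = scalePoint(face.scaledMesh[5]); // center of the face
    push();
    translate(center.x, center.y);
    rotate(tilt);
    imageMode(CENTER);
    image(hairImg, 0, 0, scaleUnit * 2.2, scaleUnit * 2.4); // assumed proportions
    pop();
  }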
Audience
My project is for people who are new to the VTuber field, who don't want to learn avatar-production and real-time-capture software or pay for its services, and who just want to try having a virtual avatar.
Source Code
See https://editor.p5js.org/Alicelong/sketches/W9kI3FSgS
Code Reference
https://editor.p5js.org/jeffThompson/sketches/FNocqjOib Jeff Thompson's example code for FaceMesh provided a solid template for loading the model, getting the results, and scaling the result points. His idea of drawing the shape of the mouth with beginShape() and endShape(CLOSE) was very inspiring too.
Next Steps
All of the avatar's pieces are 2D images. The drawing of the avatar works well when the user faces straight toward the screen, and tilting works well too after I added the rotation. But when users turn their heads, the program fails: when the face turns to the side, up, or down, the shape of every facial feature changes significantly, and it is very hard for a 2D avatar to follow such changes.

So my future vision for the program is that it would stop updating the drawing when the user's face turns past a certain degree (right, left, up, or down). Recognizing turning could also be done with distance calculations, but that would be very limited since the user can turn in any direction. So my plan is to train a model on data of the user facing straight and of the user turning in different directions at different angles. (Maybe classification model training in p5 would work: facing straight is the first category, and every other situation is the second.) The drawing would freeze at its last state when the model recognizes that the user has turned away, and restart when the user turns back. The main concern for this part is whether loading two models together would work.
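A hedged sketch of that future plan, assuming ml5's neuralNetwork classification API; the chosen keypoints, labels, and the shouldDraw flag are illustrative and not part of the current sketch:

  let turnClassifier;
  let shouldDraw = true;

  function setupTurnClassifier() {
    // inputs and outputs are inferred from the data added below
    turnClassifier = ml5.neuralNetwork({ task: 'classification' });
  }

  function turnInputs(face) {
    return [
      ...face.scaledMesh[10].slice(0, 2),  // top of face
      ...face.scaledMesh[152].slice(0, 2), // bottom of face
      ...face.scaledMesh[5].slice(0, 2),   // center of face
    ];
  }

  // while collecting data, label each frame 'straight' or 'turned'
  function addTurnSample(face, label) {
    turnClassifier.addData(turnInputs(face), [label]);
  }

  // at runtime, pause the avatar drawing whenever the face is turned away
  function checkTurn(face) {
    turnClassifier.classify(turnInputs(face), (err, results) => {
      if (!err) shouldDraw = results[0].label === 'straight';
    });
  }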