About Me - Experiences
Simon Alexanderson is a researcher at KTH and Co-founder and CTO of Motorica AI. His interests are in machine learning and animation, where his aim is to create novel ways of animating and directing expressive 3D characters. He enjoys collaborating with artists, actors, and directors, and has been innovating in the fields of theatre and film production. In his latest work, he has trained AI models to learn to dance to music and gesticulate to speech. His work has frequently been presented at top conferences such as SIGGRAPH and Eurographics.
As Co-founder and CTO of Motorica AI, Alexanderson centers much of his work on the company, and this talk does the same.
I had a few connectivity issues and arrived a few minutes late to the presentation. When I joined, he was discussing a recent addition to AI image generation called ControlNet.
With ControlNet, you provide Stable Diffusion with a 'map'. This still allows a variety of different outcomes, but they are all constrained to a certain pose or layout.
The map can come from an image that is analyzed with computer vision to determine its layout, or from a rough sketch. This looks like a really interesting addition that I'll want to check out more after this session.
In one specific example, he showed how you can provide an image of a person, have the pose automatically determined from it, and then have the generator reproduce that exact pose in whatever it creates.
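To make this concrete for myself, here is a rough sketch of how pose-guided generation like this can be run with the open-source diffusers and controlnet_aux libraries. The model checkpoints named below are the publicly available OpenPose ControlNet weights, not anything shown in the talk, and the file names are placeholders.

```python
# Minimal sketch of pose-guided image generation with ControlNet.
# Checkpoint names are the public OpenPose ControlNet release; file names are placeholders.
import torch
from PIL import Image
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Extract a pose "map" from a reference photo of a person.
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = pose_detector(Image.open("reference_person.jpg"))

# 2. Load Stable Diffusion together with the ControlNet that reads OpenPose maps.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# 3. Any prompt now produces images constrained to the extracted pose.
image = pipe("an astronaut dancing on the moon", image=pose_map).images[0]
image.save("astronaut_in_same_pose.png")
```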
So how does this relate to animation and motion?
This becomes difficult for a variety of reasons. One factor is the amount of data: the generative image models were trained on huge datasets, but we don't have anywhere near the same amount of data for training models of movement.
So, they have worked on collecting their own data using dancers and mocap (motion capture).
After training, the models can be driven by different pieces of music to synthesize new dance motion. The style of the dance can be made to match the music, or deliberately not.
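To get a feel for what "driving a trained model with music" could mean in code, here is a toy sketch: audio features are extracted from a track and fed to a motion model as conditioning. The choice of mel-spectrogram features and the MotionGenerator class are my own illustrative assumptions, not Motorica's actual pipeline.

```python
# Illustrative sketch: condition a motion model on audio features from a song.
# Feature choice and model class are assumptions for illustration only.
import librosa
import numpy as np
import torch
import torch.nn as nn


class MotionGenerator(nn.Module):
    """Toy stand-in for a music-conditioned motion model."""

    def __init__(self, n_audio_feats: int, n_joints: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_audio_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_joints * 3)  # e.g. per-joint rotation values

    def forward(self, audio_feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(audio_feats)   # (batch, frames, hidden)
        return self.out(h)             # (batch, frames, n_joints * 3)


# Extract audio features from a track with librosa.
waveform, sr = librosa.load("song.wav", sr=22050)
mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=64)
feats = torch.from_numpy(np.log1p(mel).T).float().unsqueeze(0)  # (1, frames, 64)

model = MotionGenerator(n_audio_feats=64, n_joints=24)
with torch.no_grad():
    motion = model(feats)  # untrained here; in practice trained on dance mocap
print(motion.shape)
```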
The same idea was also applied to gestures, based on a dataset from Ubisoft, the ZeroEGGS dataset (Ghorbani et al., 2022). With this, they explored how different emotions affect the generated gestures.
The concept was also applied to different styles of movement, including walking and running. From there, they also tried to modify how that movement looks, based on personalities, expressions, and even non-human creatures; they have worked with 160 different locomotion styles.
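As I understand it, supporting 160 named styles implies the model is conditioned on a style label. Here is a minimal sketch of that idea, assuming a learned embedding per style; the class name, feature sizes, and style IDs are my own placeholders, not details from the talk.

```python
# Sketch of style conditioning: a learned embedding per locomotion style is
# concatenated onto the per-frame input. Names and sizes are illustrative.
import torch
import torch.nn as nn

NUM_STYLES = 160      # e.g. "zombie walk", "proud run", "creature shuffle", ...
STYLE_DIM = 64
FRAME_FEATS = 135     # assumed per-frame pose/trajectory feature size


class StyleConditionedDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.style_embedding = nn.Embedding(NUM_STYLES, STYLE_DIM)
        self.net = nn.Sequential(
            nn.Linear(FRAME_FEATS + STYLE_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, FRAME_FEATS),
        )

    def forward(self, frames: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, FRAME_FEATS), style_id: (batch,)
        style = self.style_embedding(style_id)                      # (batch, STYLE_DIM)
        style = style.unsqueeze(1).expand(-1, frames.shape[1], -1)  # broadcast over time
        return self.net(torch.cat([frames, style], dim=-1))


decoder = StyleConditionedDecoder()
poses = torch.randn(2, 120, FRAME_FEATS)   # two clips of 120 frames (dummy data)
styles = torch.tensor([3, 42])             # two different style labels
print(decoder(poses, styles).shape)        # torch.Size([2, 120, 135])
```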
The training data for this also comes from mocap, and since it is captured directly from performers, the quality of your actors matters.
From there, he did a demo of their online tool, Motorica AI. It is currently in open beta and is free to sign up for; he did not specify whether the free access is limited to the beta or whether a pricing model will come later. He also mentioned that the results can be used with Maya. Right now, creative control for the animator is limited, but that is something they want to improve in the future.
Some of the footage from this presentation also appears in a teaser they released at the end of 2022. The teaser includes some other examples that he alluded to but did not show in the talk.
A Q&A session filled out the remaining time.
Ethically speaking, what materials were used for the learning models?
The models are trained on 20-40 hours of mocap data that they paid for. Internet scraping is not part of their approach; their focus is on recording their own data, or collaborating with others and sharing any related profits.
How does this incorporate the concept of noise?
Noise data is fed to the figure's joints.
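My reading of this answer is that it describes a diffusion-style setup: noise is progressively added to the joint data during training, and the model learns to remove it again. Below is a minimal sketch of that forward noising step, with an assumed linear noise schedule and dummy pose tensors; none of this is Motorica's actual code.

```python
# Illustrative forward-diffusion step on pose data: noise the joint values,
# then a network is trained to undo it. Schedule and shapes are assumptions.
import torch

T = 1000                                  # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)     # simple linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


def add_noise(x0: torch.Tensor, t: int):
    """Return a noised pose sequence x_t and the noise that was added."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise


clean_poses = torch.randn(1, 120, 72)     # (batch, frames, joint values), dummy data
noisy_poses, noise = add_noise(clean_poses, t=500)
# A denoising network would be trained to predict `noise` from `noisy_poses`
# (optionally conditioned on music or speech), then run in reverse to generate motion.
```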
If this is the infancy of this technology, what does it look like when it's more developed?
He thinks of this not as an autonomous agent or something like ChatGPT, but rather as something more like a puppet. The technology is like a cyborg: a mix of human and computer working together, not the computer replacing the animator.
What sort of clients are you working with mostly?
We are mostly seeing interest from VFX studios and game companies, largely smaller indie groups who don't have the budgets, manpower, or skills to do this type of work themselves.
How does this type of tech affect the job market?
Alexanderson hopes this technology helps democratize high-end asset creation and empowers indie studios and small teams. He does not see it as a threat to key-frame animation: the availability of hand-made animation is limited, so this shouldn't really threaten it. Mocap is also very high-end and expensive, so he doesn't foresee a threat there either, since those interested in the product don't have the resources for mocap.
Can this be utilized for 2D rig animation?
No.
Can mocap performers utilize their own data with this system?
Not yet, but they do intend to explore that as an option where studios could input their own data to have movement in the specific style they want.
Has this been done with animals?
No, but they are in discussions with partners that may allow them to create mocap data with dogs, horses, and other animals.
What does the long term look like in terms of pricing?
They are looking at differing price points, similar to Midjourney. They've also received a large grant from Unreal/Epic Games to create a plugin to do this directly in Unreal Engine, and they are interested in creating plugins for other software as well.
There was also a demo video released on their YouTube channel last year; it covers some of the same content but shows the tool in more detail. Being older, aspects of the UI and/or features may be different now. The demo also shows some integration with Unity. Since it's Unity and not Unreal, I'd guess it was made before they received the grant from Unreal/Epic Games.