February 22, 2022

The Avatar Dilemma

Our CEO, Victor Erukhimov, has written a post on using realistic and cartoonish avatars in the metaverse, and we are republishing it here.

A digital world connects us all. Every day, people talk to each other via email, chat and video instead of meetings and phone calls. Kids are on social networks like Roblox and in multiplayer games. The pandemic accelerated a trend that was already obvious. More and more of these digital interactions are in 3D. People are used to existing in a three-dimensional space, so we are getting tired of staring at 2D cameras when meeting with people over Zoom. Once there are better VR headsets that are lighter and have higher-resolution screens, more and more meetings will be in VR. This will allow users to see each other as in real life, use whiteboards, and share video screens and documents.

Meeting people in a 3D experience (or the “metaverse”, which has become a very trendy term these days) requires a digital representation of a person. You can’t get away with a flat video (a few companies have tried 3D models with superimposed video streams for personalization), so a lot of companies are looking for a solution that involves personal avatars.

Playing a game of Fortnite, I may not feel like being recognized. If I am in VRChat or a similar experience, I may choose to be anonymous on that platform, so VRChat avatars offer the flexibility to choose what kind of character I want to be. Perhaps I am a lawyer and I want to be a cat in VRChat: I can be. However, if I am in a business meeting with customers, I need to show my face in order to be more trustworthy. This means the closer to reality my avatar is, the better. Ten years from now, hardware may be powerful enough to render hyper-realistic avatars in real time, but today it is not available to consumers. So what we need are avatars realistic enough that our friends and acquaintances can recognize us. This establishes credibility in this new and uncertain reality.

So why do we see so many cartoonish avatars instead of realistic ones? There are a few challenges in creating realistic avatars. It is hard to create a likeness of a person from one or a few photos taken with a mobile phone (Hollywood uses expensive rigs, worth a few hundred thousand dollars). It is not easy to animate such a model so that it resembles the person. But mostly, a lot of creators are cautious about enabling realistic avatars because of the uncanny valley effect, which refers to the mental uneasiness that occurs when an artificial figure tries but fails to mimic a human.

The uncanny valley is a well-known effect: an imperfect 3D model of a person creates an eerie feeling in viewers. Anyone interested in the details can start with the Wikipedia article https://en.wikipedia.org/wiki/Uncanny_valley. The movie and game industries have exploited the uncanny valley for ages to create scary characters. Humans can have a strong, visceral aversion to such characters. This is why The Polar Express was so controversial when it was released.

There is an ongoing discussion suggesting that the uncanny valley is less applicable to many use cases of realistic avatars. Eduard Zell et al. [1] show that the uncanny valley effect may be triggered not by making an avatar more realistic, but by introducing inconsistencies in the stylization or level of detail. Henriette C. Van Vugt et al. [2] demonstrated that a recognizable (and not hyper-realistic) avatar does not necessarily fall into the uncanny valley. Katja Zibrek et al. [3] discovered that life experience plays a role in experiencing the uncanny valley effect: people with computer gaming experience rated avatars as less eerie than the average person did. Perhaps we will find that the overall uncanny valley effect gets weaker over time, as more people play video games. On top of all that, the uncanny valley effect is often confused with the fact that a lot of people just don’t like how they look in photos. This doesn’t just mean that they don’t want a recognizable avatar: they want to look different in a digital metaverse than in their real, analog life. This is one of the really attractive aspects of virtual reality: the ability to look different, or better, for a few hours.

So each creator of a metaverse has a choice: use realistic or cartoonish avatars. Realistic avatars will be recognizable, which creates an emotional connection with the avatar for the person using it and for other people interacting with it. However, these emotions can also be negative because of the uncanny valley. Cartoonish avatars are much easier to design and develop. However, it is quite hard to make universally recognizable cartoonish avatars. And once the avatar is unrecognizable, so that there has to be a name tag over a 3D model’s head, the emotional connection is gone. “Is this my avatar? Meh, whatever.”

My point, drawn from many interactions with customers, is that not once have I seen the “whatever” attitude towards recognizable avatars. People either love them (“yeah, that’s me, this is crazy!”) or hate them (“horrible”, “cringy”, “embarrassing”, etc.). The closer you get to a recognizable avatar, the more emotion it evokes in people. Positive emotions improve engagement, while negative ones drive people away from the platform. Give someone a realistic avatar of another person, and we are back to the “meh, whatever” attitude.

So the real dilemma for every creator of a metaverse or an avatar app is not realistic versus cartoonish avatars; it is recognizable versus unrecognizable avatars. We see two types of 3D experiences that use “whatever” avatars: those where players want to be unrecognized (for instance, some computer games) and those where developers have a “just don’t fuck up” mindset (avoid negative emotion; absence of positive emotion is fine). And there are other types of experiences that absolutely need a recognizable avatar maker: all kinds of work-related meetings, parties and certain types of computer games (for instance, sports games).

Building a “whatever” cartoonish avatar is largely a solved problem: many computer games have created beautiful avatars with a very detailed editor that can adjust every tiny detail of the body and face. Those unfamiliar with this subject can take a look at The Sims and Cyberpunk 2077. But how can one build a realistic avatar?

VFX companies need a photogrammetry rig to create an avatar of an actor, which is then cleaned up by a staff of 3D artists and added to a film. This is in no way scalable to the billion or so consumers who will be coming into the metaverse space in the next decade. Avatar SDK uses neural networks to create an avatar from a photo of a person and then allows editing the result, as most people love custom avatars. This produces lower-fidelity models, as there is a limited amount of information a character creator can get from a single selfie.

But this is not that big of a problem, as the GPUs available in consumer hardware today can’t render a high-fidelity model in real time anyway. Given that GPU power efficiency (FLOPS per watt) doubles every 3-4 years, it will be some time before mobile devices can render MetaHuman-level characters. What we need at this point is an avatar creator that produces recognizable, not hyper-realistic, models.
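To make "some time" a bit more concrete, here is a back-of-the-envelope sketch. The doubling period of 3-4 years comes from the paragraph above; the size of the efficiency gap between today's mobile GPUs and what a MetaHuman-level character needs is not something I have measured, so the 16x figure below is purely a hypothetical placeholder, and the function name is just for illustration.

```python
import math


def years_to_close_gap(gap_factor: float, doubling_period_years: float) -> float:
    """Years until efficiency has grown by gap_factor, assuming it doubles
    every doubling_period_years (simple exponential growth)."""
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    if gap_factor <= 1:
        # No gap to close: the hardware is already efficient enough.
        return 0.0
    return doubling_period_years * math.log2(gap_factor)


# Hypothetical placeholder, NOT a measured figure: assume mobile GPUs need
# to become 16x more power-efficient to render such characters.
HYPOTHETICAL_GAP = 16.0

for period in (3.0, 4.0):
    years = years_to_close_gap(HYPOTHETICAL_GAP, period)
    print(f"doubling every {period:.0f} years -> about {years:.0f} years")
```

Under that assumed 16x gap, the wait comes out to roughly 12 to 16 years. Plug in your own estimate of the gap; the point is only that each doubling costs another 3-4 years.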

So, how do we move forward from where we are? We can start adding more and more realistic avatars to 3D experiences without falling into the uncanny valley. It is as if we are navigating a multidimensional space instead of moving along a single axis: there are paths around the uncanny valley that we can take. Finding and navigating these paths is what we at Avatar SDK are excited about. Once we saw our customers falling in love with their realistic avatars, we never looked back.

References