Monetizing and protecting an AI-powered virtual identity in today’s world



This article was contributed by Taesu Kim, CEO of Neosapience.

There is an AI revolution underway in content creation. Speech technologies in particular have made huge strides in recent years. While this could lead to a host of new content experiences, not to mention drastically reduced costs associated with content development and localization, there are plenty of concerns about what the future holds.

Imagine being known for your distinctive voice and relying on it for your livelihood — actors like James Earl Jones, Christopher Walken, Samuel L. Jackson, Fran Drescher, and Kathleen Turner, or musicians like Adele, Billie Eilish, Snoop Dogg, or Harry Styles. If a machine were trained to replicate them, would they lose all artistic control? Would they suddenly provide voice-overs for a YouTube channel in Russia? And would they effectively forfeit potential royalties? What about the person looking for a break, or maybe just a way to earn some extra cash by digitally licensing their voice or likeness?

A voice is more than a compilation of sounds

There’s something very exciting that happens when you can type a string of words, click a button and hear your favorite superstar read them aloud, sounding like a real human being with natural rises and falls in their speech, changes in pitch and intonation. This isn’t the robotic delivery we’ve grown accustomed to from AI-generated characters. Instead, the character you build comes to life with all its layered dimensions.

This depth is what was previously lacking in virtual actors and virtual identities; the experience was, frankly, disappointing. But modern AI-based speech technology makes it possible to construct an identity whose intricate features are revealed through the sound of a voice. The same could be true for AI-based video actors whose movements, gestures and facial expressions are indistinguishable from those of humans, offering the nuances inherent in a real person, without which characters fall flat.

As technology advances to the point that it can truly learn every feature of a person’s outward identity – their appearance, voice, mannerisms, tics, and anything else that determines what others see and hear of them, with the exception of their thoughts and feelings – that identity becomes an actor that can be deployed not only by big studios in big-budget movies or album releases. Anyone can select that virtual actor using a service like Typecast and put it to work. The key here is that it’s an actor, and even novice actors get paid.

Understandably, there is some fear about how such likenesses could be co-opted and used without a license, permission, or payment. I would compare this to the problems we have seen as each new medium came on the scene. Digital music and video content, for example, were once thought to deprive artists and studios of revenue; they have since become thriving businesses and revenue streams indispensable to today’s bottom line. Solutions were developed as the technology advanced, and the same will be true again.

Preserving your digital and virtual identity

Every human voice – and every face, too – has its own unique fingerprint, made up of tens of thousands of characteristics. This makes it very, very difficult to replicate. In a world of deepfakes, misrepresentation, and identity theft, a number of technologies can be deployed to prevent the misuse of AI speech synthesis or video synthesis.

Voice identity, or speaker identification, is one example. Researchers and data scientists can identify and break down the characteristics of a specific speaker’s voice. In doing so, they can determine whose unique voice was used in a video or audio clip, or whether it was a combination of many voices mixed together and converted using text-to-speech technology. Ultimately, such identifiers could be applied in a Shazam-like app. With this technology, AI-powered voice and video companies can detect when their text-to-speech technology has been abused. Content can then be flagged and removed. Think of it as a new type of copyright control system. Companies like YouTube and Facebook are already developing such technologies for music and video clips, and it won’t be long before they become the norm.
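
To make this concrete, here is a minimal sketch of how such a speaker-verification check might look, assuming the open-source SpeechBrain toolkit and its pretrained ECAPA-TDNN speaker model; the audio file names are placeholders, and this is an illustration rather than the detection pipeline of any particular vendor.

```python
# A minimal sketch of speaker verification, assuming the SpeechBrain toolkit and its
# pretrained ECAPA-TDNN model; the file names are placeholders, and API details may
# differ between SpeechBrain versions.
from speechbrain.pretrained import SpeakerRecognition

verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Compare a clip of licensed, consented audio against a clip found online.
score, same_speaker = verifier.verify_files("licensed_reference.wav", "suspect_clip.wav")
print(f"similarity score: {score.item():.3f}, same speaker: {bool(same_speaker)}")
```

The score is a similarity between speaker embeddings; a platform could scan newly uploaded clips against a registry of licensed voices and flag high-scoring matches for review, much as content-ID systems do for music today.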

Deepfake detection is another area where important research is underway. Technology is being developed to distinguish whether a face in a video is a real person or a face that has been digitally manipulated. For example, one research group has created a system based on a convolutional neural network (CNN) to extract features at the frame level. It can then compare them and train a recurrent neural network (RNN) to classify videos that have been digitally manipulated, and it can do so quickly and at scale.
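
As a rough picture of that CNN-plus-RNN pattern, the sketch below pairs a frame-level CNN feature extractor with an LSTM that classifies a clip as real or manipulated. The ResNet-18 backbone, layer sizes, and two-class head are illustrative assumptions, not a reproduction of any published system.

```python
# A minimal sketch of frame-level CNN features fed to an RNN classifier, in PyTorch.
# The ResNet-18 backbone, LSTM size, and binary head are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class DeepfakeDetector(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        cnn = models.resnet18(weights=None)          # pretrained weights would be loaded in practice
        self.feature_dim = cnn.fc.in_features        # 512 for ResNet-18
        cnn.fc = nn.Identity()                       # drop the classification head, keep features
        self.cnn = cnn
        self.rnn = nn.LSTM(self.feature_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, 2)  # logits for real (0) vs. manipulated (1)

    def forward(self, frames):                       # frames: (batch, time, 3, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w))   # per-frame features
        feats = feats.view(b, t, self.feature_dim)      # restore the time dimension
        _, (hidden, _) = self.rnn(feats)                # last hidden state summarizes the clip
        return self.classifier(hidden[-1])

# Example: score two 16-frame clips at 224x224 resolution.
model = DeepfakeDetector()
clips = torch.randn(2, 16, 3, 224, 224)
print(model(clips).softmax(dim=-1))                  # per-clip probabilities of real vs. fake
```

Per-frame features let the model catch artifacts within individual frames, while the recurrent layer picks up temporal inconsistencies, such as flickering or unnatural motion, that single-image detectors miss.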

Some people may feel uncomfortable with these solutions because many are still in the works, but let’s put those fears to rest. Detection technologies are created proactively, with an eye to future needs. In the meantime, we need to consider where we are now: synthesized audio and video has to be very advanced before it can convincingly clone and deceive.

An AI system designed to produce speech and/or video can only learn from a clean data set. Today, that data can pretty much only come from filming or recording in a studio. It is remarkably difficult to have data recorded in a professional studio without the consent of the data subject; studios are not willing to risk a lawsuit. In contrast, data crawled from YouTube or other sites produces such a noisy data set that it can only yield low-quality audio or video, making illegitimate content easy to spot and remove. This automatically filters out those most likely to manipulate and abuse digital and virtual identities. While it will eventually be possible to create high-quality audio and video from noisy data sets, detection technologies will be ready well in advance and will provide adequate protection.
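
To make the clean-versus-noisy distinction concrete, here is a crude sketch that estimates a signal-to-noise ratio for a candidate training clip; the mono-WAV assumption, percentile heuristic, and 40 dB threshold are arbitrary illustrative choices, not a real data-screening pipeline.

```python
# A crude signal-to-noise estimate for a candidate training clip, assuming a mono WAV file.
# The percentile heuristic and 40 dB threshold are arbitrary illustrative choices.
import numpy as np
from scipy.io import wavfile

rate, audio = wavfile.read("candidate_clip.wav")
audio = audio.astype(np.float64)

frame = rate // 100                                    # 10 ms frames
frames = audio[: len(audio) - len(audio) % frame].reshape(-1, frame)
energy = np.sqrt((frames ** 2).mean(axis=1) + 1e-12)   # RMS energy per frame

noise = np.percentile(energy, 10)                      # quietest frames approximate the noise floor
signal = np.percentile(energy, 90)                     # loudest frames approximate the speech level
snr_db = 20 * np.log10(signal / noise)

print(f"estimated SNR: {snr_db:.1f} dB")
print("studio-grade" if snr_db > 40 else "likely too noisy for clean training data")
```

Studio recordings typically sit far above any such noise floor, while crawled web audio rarely does, which is what makes abusive cloning both harder to do well and easier to catch.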

Virtual AI actors are still part of a nascent space, but one that is rapidly accelerating. New revenue streams and content development opportunities will continue to drive demand for virtual characters. This, in turn, will provide enough motivation to adopt advanced detection and a new breed of digital rights management tools to control the use of AI-powered virtual identities.

Taesu Kim is the CEO of Neosapience.

