Microsoft’s Vall-E could help you create audio-based deep fakes

HIGHLIGHTS

Microsoft has revealed its latest AI tool, Vall-E, a text-to-speech (TTS) system that turns the text you input into audio

Vall-E is classified as a neural codec language model and has been trained on over 60,000 hours of English speech

It is not free to use or play around with at the moment

There’s no doubt that we’re all headed towards a world run by AI at an alarmingly rapid pace. ChatGPT has already opened our eyes to what machine learning can do (its maker described its capabilities as an early glimpse of AI), while Dall-E showed us we don’t really need humans to make art anymore. Now, Microsoft has revealed its own AI tool, Vall-E, which can mimic the sound of your voice after hearing just a 3-second clip of you talking.


What is Microsoft’s Vall-E?

Vall-E is essentially a text-to-speech (TTS) system that lets you input a script that it then turns into audio. In the past, such software has generated audio that either sounds incredibly robotic or costs an arm and a leg for “human voices”. Vall-E, a neural codec language model, has been trained on 60,000 hours of English speech and produces results that are as close to a human talking as possible. Microsoft has claimed that its AI tool can “significantly outperform” other TTS tools in the market.

What actually makes it stand out isn’t its ability to sound like you. It’s the ability to capture emotion in speech, which is what makes it sound like someone is actually talking. 


Using Microsoft’s Vall-E

At this point, Microsoft has not created a free-to-use version the way OpenAI did with ChatGPT. They have, however, posted a bunch of samples on their website, showing the range of results you can get with their tool.

Of course, while the tool can be used to help people who cannot speak, it can also be used to create really convincing deep fakes and audio clips of well-known personalities. Between this, ChatGPT, and Dall-E, we’ll soon be living in a world where we won’t be able to distinguish between content created by humans and content created by machines.

Kajoli Anand Puri


Kajoli is a tech enthusiast with a soft spot for smart kitchen and home appliances. She loves exploring gadgets and gizmos that are designed to make life simpler, but also secretly fears a world run by AI. Oh wait, we’re already there.

Digit.in