Meta’s new AI models can recognize and produce speech for more than 1,000 languages

Meta has built AI models that can recognize and produce speech for more than 1,000 languages, a tenfold increase on what's currently available. It's a significant step toward preserving languages that are at risk of disappearing, the company says.

Meta is releasing its models to the public via the code hosting service GitHub. It claims that making them open source will help developers working in different languages to build new speech applications, such as messaging services that understand everyone or virtual-reality systems that can be used in any language.

There are around 7,000 languages in the world, but existing speech recognition models cover only about 100 of them comprehensively. This is because these kinds of models tend to require huge amounts of labeled training data, which is available for only a small number of languages, including English, Spanish, and Chinese.

Meta researchers got around this problem by retraining an existing AI model, developed by the company in 2020, that is able to learn speech patterns from audio without requiring large amounts of labeled data such as transcripts.

They trained it on two new data sets: one that contains audio recordings of the New Testament of the Bible and its corresponding text, taken from the internet, in 1,107 languages, and another containing unlabeled New Testament audio recordings in 3,809 languages. The team processed the speech audio and the text data to improve their quality before running an algorithm designed to align audio recordings with the accompanying text. They then repeated this process with a second algorithm trained on the newly aligned data. With this method, the researchers were able to teach the algorithm to learn a new language more easily, even without the accompanying text.
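The article does not name the alignment algorithm. One standard technique for lining up audio with a known transcript is CTC forced alignment: an acoustic model scores every audio frame against every token, and a Viterbi search finds the single most likely monotonic path that spells out the transcript. The sketch below is a minimal, pure-Python illustration under that assumption. It presumes per-frame log-probabilities, with a blank token at index 0, are already available; the function names, vocabulary, and toy probabilities are hypothetical and not taken from Meta's code.

```python
import math

NEG_INF = -math.inf


def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi alignment of a known transcript to per-frame CTC log-probabilities.

    log_probs: T rows (one per audio frame), each a list of log-probabilities
               over the token vocabulary, with `blank` as the CTC blank id.
    targets:   token ids of the known transcript (must not contain `blank`).
    Returns one entry per frame: the index into `targets` that the frame is
    assigned to, or None if the frame is assigned to a blank.
    """
    T = len(log_probs)
    # Interleave blanks around every target token: [blank, y1, blank, ..., yL, blank].
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    S = len(ext)

    dp = [[NEG_INF] * S for _ in range(T)]  # best log-score ending in state s at frame t
    back = [[0] * S for _ in range(T)]      # predecessor state, for backtracking
    dp[0][0] = log_probs[0][ext[0]]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay in s, advance from s-1, or skip the blank
            # between two *different* tokens (jump from s-2).
            cands = [(dp[t - 1][s], s)]
            if s >= 1:
                cands.append((dp[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((dp[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            dp[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev

    # A valid path ends on the final blank or on the last token.
    s = S - 1
    if S > 1 and dp[T - 1][S - 2] > dp[T - 1][S - 1]:
        s = S - 2
    if dp[T - 1][s] == NEG_INF:
        raise ValueError("too few frames to align this transcript")

    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    # Odd positions of `ext` hold transcript tokens; even positions are blanks.
    return [(s - 1) // 2 if s % 2 == 1 else None for s in path]


def token_spans(frame_labels):
    """Map each transcript index to its (first_frame, last_frame) span."""
    spans = {}
    for t, idx in enumerate(frame_labels):
        if idx is not None:
            start = spans.get(idx, (t, t))[0]
            spans[idx] = (start, t)
    return spans


# Toy example: vocabulary 0 = blank, 1 = "a", 2 = "b"; transcript "a b".
probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.7, 0.1, 0.2],
]
log_probs = [[math.log(p) for p in row] for row in probs]
labels = ctc_forced_align(log_probs, [1, 2])
spans = token_spans(labels)
print(labels)  # [0, 0, None, 1, None]
print(spans)   # {0: (0, 1), 1: (3, 3)}
```

Frame spans like these are the kind of output an aligner hands on to later training steps; how Meta used its aligned data beyond what the article describes is not covered here. Real aligners run batched on GPUs over much larger vocabularies, and this version trades speed for readability.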

“We can use what that model learned to then quickly build speech systems with very, very little data,” says Michael Auli, a research scientist at Meta who worked on the project.

“For English, we have lots and lots of good data sets, and we have that for a few more languages, but we just don’t have that for languages that are spoken by, say, 1,000 people.”

The researchers say their models can converse in over 1,000 languages however recognize more than 4,000.

They compared the models with those from rival companies, including OpenAI's Whisper, and claim theirs had half the error rate, despite covering 11 times more languages.
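The article does not say which error metric was compared. Speech recognition is commonly scored by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference, divided by the number of reference words. Assuming WER is the metric meant here, the sketch below shows what "half the error rate" looks like on a toy sentence; the function name and the example sentences are illustrative, not drawn from Meta's evaluation.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "the cat sat on the mat"
system_a = "the cat sit on mat"  # 1 substitution + 1 deletion -> 2 errors
system_b = "the cat sat on mat"  # 1 deletion -> 1 error
wer_a = word_error_rate(reference, system_a)  # 2/6
wer_b = word_error_rate(reference, system_b)  # 1/6
print(round(wer_a, 3), round(wer_b, 3))  # 0.333 0.167
```

Here system B has half the WER of system A. Keeping only two rows of the edit-distance table holds memory proportional to the hypothesis length; note that WER can exceed 100% when a system inserts many extra words.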

However, the team warns that the model is still at risk of mistranscribing certain words or phrases, which could result in inaccurate or potentially offensive labels. They also acknowledge that their speech recognition models yielded more biased words than other models, albeit only 0.7% more.

While the scope of the research is impressive, the use of religious texts to train AI models can be controversial, says Chris Emezue, a researcher at Masakhane, an organization working on natural-language processing for African languages, who was not involved in the project.

“The Bible has a lot of bias and misrepresentations,” he says.

Copyright for syndicated content belongs to the linked source: Technology Review, https://www.technologyreview.com/2023/05/22/1073471/metas-new-ai-models-can-recognize-and-produce-speech-for-more-than-1000-languages/