Meta might not be making many headlines in mainstream media for its AI programs, but it is quietly releasing some very impressive AI research — and open-sourcing it to boot.
Meta has released the Massively Multilingual Speech (MMS) project, which can turn speech to text, and vice versa, for over 1,100 languages. “Our results show that the Massively Multilingual Speech models outperform existing models and cover 10 times as many languages,” Meta says. Meta has also open-sourced the project and the dataset. “
Today, we are publicly sharing our models and code so that others in the research community can build upon our work. Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world,” it says.
Meta says it was a challenge to find audio datasets for thousands of languages, many of which are now spoken by only a handful of people around the world. “We turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research,” Meta explains. “These translations have publicly available audio recordings of people reading these texts in different languages. As part of this project, we created a dataset of readings of the New Testament in over 1,100 languages, which provided on average 32 hours of data per language,” the paper adds.
Meta said that the MMS model outperforms current models like OpenAI’s Whisper. “In a like-for-like comparison with OpenAI’s Whisper, we found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages. This demonstrates that our model can perform very well compared with the best current speech models,” Meta said.
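For readers unfamiliar with the metric behind that comparison: word error rate (WER) is the standard way speech-recognition systems are scored. It is the word-level edit distance (insertions, deletions, substitutions) between the model's transcript and a reference transcript, divided by the number of reference words. The sketch below is an illustrative implementation of this standard metric, not Meta's evaluation code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words, computed with a single rolling row.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]          # distance(ref[:i-1], hyp[:j-1])
        row[0] = i                  # deleting all i reference words
        for j, h in enumerate(hyp, 1):
            cur = row[j]
            row[j] = min(
                row[j] + 1,              # deletion
                row[j - 1] + 1,          # insertion
                prev_diag + (r != h),    # substitution (free if words match)
            )
            prev_diag = cur
    return row[len(hyp)] / len(ref)

# A perfect transcript scores 0; one dropped word out of six scores 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

"Half the word error rate" therefore means that, on the same test utterances, the MMS-trained models produced roughly half as many word-level mistakes per reference word as Whisper.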
MMS is only the latest in a series of AI releases from Meta. Meta had previously released and open-sourced the Segment Anything Model (SAM), which lets users pick out different objects in an image. It had then released DINOv2, a computer vision model that also works with videos. Meta had also come up with the LLaMA model, whose weights were later leaked, and which led to an explosion in the creation of new models such as Stanford’s Alpaca, Vicuna, and others. OpenAI and Google might be garnering all the attention for now for their AI moves, but Meta appears to be decisively throwing its hat into the ring.