Language Bank

Did you know a language dies every two weeks?

Image of earth and languages using AI

About Language Bank

In our interconnected digital age, Language Bank, a winning Microsoft Hackathon team, has emerged as a leader in a vital mission: preserving the world’s languages. The team, which won the 2022 Hack for Society challenge, saw that of the 7,000 languages that enrich our planet, a staggering 40% are on the brink of vanishing. Enter Microsoft’s Language Bank, a pioneering platform that uses artificial intelligence to protect these precious linguistic assets, with a special focus on underrepresented languages.

“Languages are critical for preserving the traditions, values and identity of a community,” said Qinying Liao, Microsoft Program Manager and lead on the Language Bank team.

The team, made up of 10 Microsoft employees in Beijing and Suzhou, China, was no stranger to language and translation software. In their day-to-day work, team members build the world’s leading text-to-speech (TTS) technology for first- and third-party customers all over the world, reaching tens of millions of people through Azure AI Speech Services.

The Language Bank team set out to accomplish two goals: reduce the time and cost of extending text-to-speech software to new languages, and preserve languages.

“So, on one hand supporting text to speech is my daily job, but we do see some challenges. It costs time and money to expand the service to a new language especially to those less spoken — meaning there is little data that we could collect to train a model,” said Liao. 

The mechanism works as follows: speakers contribute voice recordings in their own language, and those recordings are used to train an AI voice that speaks it. With low-resource speech synthesis technology, Language Bank can generate an AI voice in any new language. The team is currently creating TTS voices for under-resourced languages such as Inuktitut, Kurmanji, Hakka, and Minnan Chinese. They have collected speech data from various sources, including government agencies, social impact companies, and customers, and have built and evaluated TTS voices for some of these languages. They plan to release the Inuktitut voice in Azure AI Speech products and deploy it to customer solutions soon, while continuing to collect data and refine the voices for the other languages.

“The team has innovated the TTS technology to support low-resourced languages, so we were able to largely reduce the training data required to extend TTS to a new language,” said Liao. 

Language Bank is more than a mere archive. It’s a vibrant, growing community where contributors around the globe enhance the platform’s TTS features. By donating linguistic data, both written and spoken, users help breathe life into AI voices for their languages, such as Inuktitut and Inuinnaqtun.

It was a historic moment for the Inuit people of Canada when Microsoft and the Government of Nunavut announced the addition of Inuktitut and Inuinnaqtun to Microsoft Translator in January 2021. These languages, spoken by over 40,000 people in the northern regions of Canada, became accessible to millions of users around the world through machine translation. Microsoft was one of the first companies to support these languages for end users, showing its commitment to preserving and promoting linguistic diversity.

And the story does not end there. Microsoft wanted to go further and enable text-to-speech capability for these languages, so that the Inuit could hear their words spoken by a synthetic voice. This would open new possibilities for education, communication, and entertainment. To achieve this, Microsoft needed high-quality audio data from native speakers of Inuktitut and Inuinnaqtun. Luckily, the Government of Nunavut had a deep connection with organizations in Canada that had been working on a language bank project, collecting and digitizing recordings of the Inuit languages. They agreed to share their data with Microsoft and license it for the next step: building a text-to-speech model for the Azure AI Speech service.

The collaboration between Microsoft, the Government of Nunavut, and the Inuit organizations was a success. After months of hard work, the text-to-speech model was ready to be launched. The Inuit people could now hear their languages spoken by a computer, with natural and expressive intonation. They could also use the text-to-speech feature to learn new words, listen to stories, and create their own content. The voice of the Inuit was now heard loud and clear, thanks to the power of technology and the spirit of partnership.

The significance of safeguarding lesser-known languages cannot be overstated, especially for the evolution of AI. A linguistically diverse database reduces bias and fosters inclusivity, enabling AI to serve a broader spectrum of humanity. This aligns with Microsoft’s mission of empowering every person and every organization on the planet to achieve more.

Looking ahead, Language Bank’s journey continues with the refinement of prototypes, the validation of new language models using customer data, and the scaling of operations to bring more languages onto the platform. The team also aims to cultivate a robust ecosystem that not only conserves languages but also inspires a culture of technological empowerment and innovation.

Language Bank stands as a symbol of commitment to both technological progress and cultural stewardship. It’s an invitation to join a global movement that values every voice and ensures every language finds its rightful place in our digital realm. 

“During my guidance and participation in this project, I always felt a passion for using technology to create more value for society. I believe this is the charm of the Microsoft Global Hackathon.” – Yi Qiu, Director, Microsoft Garage Redmond. Qiu acted as a key support leader after the team’s sponsorship coaching concluded and still supports the team today. 

Language Bank and many other projects with global impact have roots in the Microsoft Global Hackathon, produced by The Garage. The Hackathon is more than just an event; it’s a movement. It encourages employees worldwide to collaborate and innovate, breaking down silos and building a stronger, more creative Microsoft. The Hackathon reflects Microsoft’s belief in the power of collective ingenuity, where diverse perspectives come together to solve complex challenges. 

The Garage at Microsoft drives a culture of innovation for all roles and skill levels. It offers workshops, talks, hackathons, and project coaching and drives AI readiness across the company. The Garage has global locations including the Bay Area, Vancouver, New England, New York City, Dublin, Hyderabad, and more.

Learn more about how Microsoft employees in The Garage are changing our world every day by bookmarking our site and following The Garage on X and Instagram. 

Journey

The Microsoft Language Bank, an initiative born from the 2022 Microsoft Hackathon, addresses the urgent need to preserve the world’s languages, 40% of which are endangered. Spearheaded by a team of ten in Beijing and Suzhou, China, the Language Bank uses AI to develop text-to-speech (TTS) technology for underrepresented languages such as Inuktitut, Kurmanji, Hakka, and Minnan Chinese. By collecting voice data and innovating on TTS technology, the team reduces the time and cost of supporting new languages, enabling AI to serve a more diverse population.

Team

Image of Language Bank Team

Qinying Liao, Gang Wang, Garfield He, Junwei Gan, Lihui Wang, Jinzhu Liu, Yanqing Lu, Tianhua Zhao, Binggong Ding, Sheng Zhao