Solving Humanity’s Challenges with AI
Flood forecasting
In 2017 we began our flood forecasting initiative to help combat the catastrophic damage from floods each year by equipping those in harm’s way with accurate and detailed alerts. This work is part of Google’s broader Crisis Response program, which provides people with access to trusted information and resources in critical moments. For over a decade, our Crisis Response team has partnered with frontline and emergency workers to develop technology and programs that help keep people safe, informed, and out of harm’s way.
Floods claim tens of thousands of lives and cause billions of dollars in damage around the world every year. Their impact is particularly severe for people in low-resource regions, which often lack effective early warning systems.
Last year, we sent 115 million flood notifications to 23 million people in India and Bangladesh, directing them to flood alerts on Google Search and Maps.
Reliable early warning systems have been shown to significantly reduce fatalities and to prevent up to 30-50% of economic damage.
Previously, these flood alerts were live in India and Bangladesh.
We are announcing our expansion into 18 new countries:
[3 new in Latin America/Asia] Brazil, Colombia, Sri Lanka
[15 new in Africa] Burkina Faso, Cameroon, Chad, Democratic Republic of the Congo, Ivory Coast, Ghana, Guinea, Malawi, Nigeria, Sierra Leone, Angola, South Sudan, Namibia, Liberia, South Africa
We’re also announcing the global launch of Google FloodHub, a new platform that displays when and where floods may occur. We’ll also bring this information to Google Search and Maps in the future to help more people reach safety in flooding situations.
FloodHub shows flood forecasts up to 7 days ahead, and anyone can register for notifications for a specific location, such as their home, work, or field.
Wildfire tracking
Our machine learning models, trained on satellite imagery, allow us to identify and track wildfires in real time and to predict how they will evolve, enabling us to support firefighters and other first responders, as well as keep our users informed about where these events are occurring.
We are announcing that this model is now live in the US, Canada and Mexico, and rolling out in parts of Australia.
AI for Language Inclusivity
We are announcing the 1,000 Languages Initiative, an ambitious research project to build an AI model that will support the 1,000 most spoken languages of the world.
If we want to provide AI-based language technology for the world, we need to make sure our models are trained on content that is representative of the world.
This is a long-term effort that will focus on three key factors:
Multimodality: As the way people access information shifts to video, speech, and more, we’ll use the multimodal capabilities of our advanced models to meet them where they are.
As a first step, we've developed a Universal Speech Model (USM), trained on over 400 languages, the largest language coverage of any speech model to date. The goal of the USM project is to build a universal speech understanding system that achieves superior performance on a wide range of downstream speech applications, such as automatic speech recognition (ASR) and automatic speech translation (AST). (A minimal sketch of this idea appears after this list.)
This universal speech understanding model will be a stepping stone towards an eventual universal multimodal model that combines understanding of many modalities across the top 1,000 languages of the world.
Engaging the broader community:
We will work directly with communities across the world to source language data.
For example, in South Asia we are actively working with local partners to eventually collect representative audio samples across all the region’s dialects and languages, and we are spinning up similar efforts globally.
We plan to publish a research paper in the coming months so the research community can coalesce around this project.
Improving our products: As we make progress, we plan to improve our products like YouTube, Gboard, Translate, and more, making it easier for people to use technology in their native language and to find relevant content.
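To make the "universal speech understanding" goal concrete, below is a minimal sketch of one common way such a system can be structured: a shared speech encoder whose output is decoded under different task and language tags for ASR or AST. The architecture, function names, and tags here are illustrative assumptions, not USM's published design.

```python
import numpy as np

# Illustration only: a shared encoder serving multiple downstream speech
# tasks, selected by task/language tags. This is an assumed design for
# exposition, not USM's actual architecture.

def encode_speech(waveform: np.ndarray) -> np.ndarray:
    """Stand-in shared encoder: waveform -> sequence of feature frames."""
    usable = waveform[: len(waveform) // 160 * 160]
    return usable.reshape(-1, 160).mean(axis=1, keepdims=True)

def decode(features: np.ndarray, task: str, language: str) -> str:
    """Stand-in decoder conditioned on a task tag: the same encoded speech
    can be decoded as a transcript (ASR) or a translation (AST)."""
    return f"<{task}|{language}> ...decoded text from {len(features)} frames..."

audio = np.zeros(16000)                              # 1 second of audio
feats = encode_speech(audio)
print(decode(feats, task="asr", language="bn"))      # transcribe in Bengali
print(decode(feats, task="ast", language="en"))      # translate speech to English
```

The point of the sketch is the sharing: one encoder amortizes learning across hundreds of languages, while lightweight task conditioning covers the range of downstream applications.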
Getting Creative with AI [Generative AI research]
Text-to-Video Models
Earlier this summer, we introduced Imagen and Parti, two different models that both have an impressive ability to generate images from text prompts.
Now, Google has two complementary research approaches that tackle text-to-video generation.
Imagen Video uses diffusion to generate high-quality individual frames, yielding strikingly realistic short videos.
Phenaki uses a sequence-learning technique that generates video as a series of tokens over time, which makes it well suited to long-form videos.
When you combine the two models, you get the best of both worlds: super-resolution at the frame level and coherence over time.
The Imagen Video and Phenaki teams have been working together to produce AI-generated super-resolution videos, which we are sharing for the first time today. [VISUAL ASSETS]
Note: The assets linked above are all multi-prompt stories created with Phenaki and super-resolved from 128x128 to 512x512 by Imagen Video.
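To illustrate how the two approaches compose, below is a minimal sketch of the pipeline: a token-based stage produces a temporally coherent low-resolution video, and a diffusion-based stage upsamples each frame. The function names, shapes, and the trivial nearest-neighbour upsampler are stand-ins for illustration, not the real Phenaki or Imagen Video interfaces.

```python
import numpy as np

# Illustrative pipeline: Phenaki-style low-res generation followed by
# Imagen-Video-style per-frame super-resolution. All components below are
# dummy stand-ins, not the actual models.

def generate_low_res_video(prompt: str, num_frames: int = 16) -> np.ndarray:
    """Stand-in for a token-based video model: returns a temporally coherent
    low-resolution clip with shape (frames, 128, 128, 3)."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((num_frames, 128, 128, 3))

def super_resolve_frame(frame: np.ndarray, scale: int = 4) -> np.ndarray:
    """Stand-in for a diffusion upsampler: nearest-neighbour upsampling here,
    just to show where the 128x128 -> 512x512 step happens."""
    return frame.repeat(scale, axis=0).repeat(scale, axis=1)

def text_to_video(prompt: str) -> np.ndarray:
    low_res = generate_low_res_video(prompt)                     # coherence in time
    return np.stack([super_resolve_frame(f) for f in low_res])  # detail per frame

video = text_to_video("a teddy bear washing dishes")
print(video.shape)  # (16, 512, 512, 3)
```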
AI Test Kitchen Season 2
We want to give people a place to learn about, experience, and give feedback on Google’s emerging generative AI technology.
While the launch of AI Test Kitchen earlier this year showcased the capabilities of LaMDA, we're announcing that Season 2 of AI Test Kitchen will show off text-to-image generation through two demos.
The first demo is called “City Dreamer,” and it uses a combination of LaMDA and our image models to let you quickly build themed cities.
The second demo is called “Wobble,” and it uses LaMDA, our image models, and our latest 2D-to-3D casual animation techniques to create friendly monsters who can dance.
These experiences will be coming to AI Test Kitchen soon, but you can get the app from the Play Store or App Store today and start using LaMDA. Note: AI Test Kitchen is only available in the US, UK, Kenya, Australia, New Zealand, and Canada, and only in English.
Wordcraft Writers Workshop
Last year, Google introduced LaMDA, a dialog engine able to engage users in conversation on any number of topics.
Wordcraft is a project built on LaMDA that explores text generation for writing.
We teamed up with professional writers who used the Wordcraft editor to create a volume of short stories. These stories are available to read now as part of the Wordcraft Writers Workshop.
AudioLM
AudioLM is a new framework for audio generation that learns to generate realistic speech and music based on only a short audio sample.
This is a pure audio model that is trained without any text or symbolic representation of music.
The result is high-quality audio that’s incredibly realistic and expressive. You can check out a number of samples here.
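To make the "pure audio" framing concrete, below is a minimal sketch of the general recipe AudioLM builds on: discretize audio into tokens, model the token sequence autoregressively, and decode the generated tokens back to audio. The tokenizer and the toy continuation rule are stand-ins for exposition; in AudioLM these roles are played by learned components (semantic tokens from w2v-BERT and acoustic tokens from the SoundStream codec).

```python
import numpy as np

# Illustration of "audio generation as language modeling": everything here
# is a toy stand-in for AudioLM's learned tokenizers and Transformer.

def tokenize_audio(waveform: np.ndarray, hop: int = 320) -> list:
    """Stand-in codec: map each hop-sized chunk of audio to a discrete token."""
    return [int(abs(waveform[i:i + hop].sum()) * 1000) % 1024
            for i in range(0, len(waveform) - hop + 1, hop)]

def continue_tokens(prompt_tokens: list, n_new: int = 50) -> list:
    """Stand-in autoregressive model: in AudioLM, a Transformer samples each
    next token conditioned on the sequence so far; here, a trivial rule."""
    tokens = list(prompt_tokens)
    rng = np.random.default_rng(0)
    for _ in range(n_new):
        tokens.append(int(tokens[-1] + rng.integers(1024)) % 1024)
    return tokens

prompt = np.sin(np.linspace(0, 440 * 2 * np.pi, 16000))  # 1 s audio prompt
tokens = tokenize_audio(prompt)
full = continue_tokens(tokens)  # prompt tokens followed by a generated continuation
# A neural codec decoder would then map `full` back to a waveform.
print(len(tokens), "prompt tokens ->", len(full), "total tokens")
```

Because no transcripts or scores are involved at any stage, the same recipe covers speech and music alike.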
DreamFusion
DreamFusion uses a pre-trained 2D text-to-image diffusion model to perform text-to-3D synthesis.
The result is a genuine 3D model that can be pulled into 3D software and animated.
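Below is a minimal sketch of the core idea, score distillation: optimize the parameters of a 3D scene so that its 2D renders score well under a frozen text-to-image diffusion model. The renderer and noise predictor here are trivial stand-ins; in the DreamFusion paper the scene is a NeRF and the diffusion model is Imagen.

```python
import numpy as np

# Toy score-distillation loop: every component is a stand-in, kept only to
# show the shape of the optimization, not a faithful implementation.

rng = np.random.default_rng(0)
theta = rng.random(64)  # stand-in 3D scene parameters (a NeRF in the paper)

def render(params: np.ndarray) -> np.ndarray:
    """Stand-in differentiable renderer: 3D parameters -> flattened 2D image."""
    return np.tanh(params)

def predict_noise(noisy_image: np.ndarray, t: float, prompt: str) -> np.ndarray:
    """Stand-in for the frozen diffusion model's noise prediction."""
    return noisy_image * 0.1  # placeholder; a real model conditions on `prompt`

for step in range(100):
    image = render(theta)
    t = rng.uniform(0.02, 0.98)                # random diffusion timestep
    eps = rng.standard_normal(image.shape)     # sampled noise
    noisy = np.sqrt(1 - t) * image + np.sqrt(t) * eps
    # Score-distillation gradient: (predicted noise - sampled noise), pushed
    # back through the renderer via the chain rule (d tanh / d theta below).
    grad_image = predict_noise(noisy, t, "a DSLR photo of a corgi") - eps
    theta -= 0.01 * grad_image * (1 - np.tanh(theta) ** 2)

print("optimized scene parameters (first four):", theta[:4])
```

After optimization, the 3D representation itself (not any single image) is the output, which is what makes the resulting asset exportable to standard 3D tools.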