Key Takeaways:
At Google I/O 2024, Google’s annual developer conference, Demis Hassabis, the head of Google DeepMind, showed off an early version of the tech giant’s formidable multimodal AI: Project Astra.
Project Astra Can See and Understand the World Around You
The AI assistant is Google’s answer to OpenAI’s GPT-4o, which was unveiled on Monday. The company claims that its multimodal large language model (LLM) can see the world, know what things are and where you left them, and answer a wide range of questions.
During the conference, Hassabis showed the crowd a demo video in which an Astra user at Google’s London headquarters asked it to identify the part of a speaker it was looking at and explain how it works, find their missing spectacles, and review code. Astra handled each task in real time while holding a continuous conversation with the user.
Most impressively, the model even managed to identify which part of London the person was in simply by looking at the view outside the window.
Google’s new multimodal AI agent can pull information both from the web and from the physical world around the user, seen through the lens of their smartphone camera.
There are other interesting use cases for Astra, too, such as finding your car in a large parking lot. All you need to do is tell or show Astra where the car is, and the system will guide you back to the exact spot.
It can also look at a snippet of code and tell you what it does, or compose a poem to suit your mood at that moment.
For the time being, Project Astra is still in the early stages of testing, and no specific release dates have been announced. Google did hint at I/O 2024 that some of the model’s capabilities will be integrated into existing or upcoming products launching later this year.
Google I/O 2024 Was All About Gemini
Astra is just one of many Gemini announcements that Google made at its developer conference this year.
There is a new model called Gemini 1.5 Flash, designed to be faster at common tasks like text summarization and captioning. A generative AI video model called Veo can produce “high quality” 1080p videos over a minute long, in a wide range of visual and cinematic styles, from a text prompt. Google also highlighted Gemini Nano, a model that runs locally on smaller devices like smartphones, and an upgraded Gemini 1.5 Pro that can take in up to 2 million tokens of context when answering a query.
A lot of the products announced at Google I/O were about offering users easier and faster ways to leverage Gemini. Google released a new product called Gemini Live, a voice assistant that lets you have easy back-and-forth conversations with the AI, interrupting it if it gets long-winded or steering it back to earlier parts of the conversation.
Gemini-powered features are also coming to Google Lens, which now lets users search the web by recording a video and narrating a question about it.
Hassabis says this is all possible because of Gemini’s large context window, which gives it access to a huge amount of information at once. The DeepMind CEO says this ability is crucial to making interactions with an AI feel normal and natural.
Google Glass is Making an AI-powered Comeback
Google is also working with Samsung and Qualcomm to develop camera-enabled AI glasses. The Silicon Valley giant recognizes that AI is missing a link that could bridge it to extended reality (XR) technologies like augmented reality (AR), virtual reality (VR), and mixed reality (MR).
It’s been 10 years since Google released Google Glass, a device that was far ahead of its time. Could it be making a big AI-powered comeback?