On Monday, Google DeepMind, the Google division working toward artificial general intelligence (AGI), announced ‘Genie’, a new generative AI model that can create playable virtual environments from just a single image, drawing, or text prompt.
Google Genie Can Generate Playable 2D Platformer Games From Text And Images
Genie, short for Generative Interactive Environments, was developed by Google DeepMind in collaboration with the University of British Columbia. From a text prompt, a sketch, or an image, Genie can generate side-scrolling 2D platformer games in the vein of Super Mario Bros. and Contra that users can interact with and play.
During the announcement, Google DeepMind acknowledged the emergence of generative AI models capable of producing “novel and creative content via language, images, and even videos”, before adding that it is now introducing a “new paradigm” for generative AI in the form of Genie.
Google says Genie was trained on 200,000 hours of unlabeled, publicly available internet gaming videos, most of which show 2D platformers rather than immersive 3D games.
Genie Can Recognize The Main Character In A Game On Its Own
According to Google’s researchers, Genie is built from three components: a video tokenizer that converts raw video frames into discrete tokens, a latent action model that infers the actions taken between consecutive frames, and a dynamics model that predicts what the next frame will be.
A notable feature of Genie’s foundation model is that it learns to recognize the main character in a game without ever being trained on action or text annotations. This is what lets users control that character in the virtual worlds the model generates.
Researchers Say Genie Is A Positive Step Towards General World Models For AGI
Google DeepMind researcher Tim Rocktäschel shared the team’s research paper on X, noting that Genie can turn any image into a playable 2D world and can be prompted to generate a wide range of action-controllable virtual worlds from different kinds of input.
Rocktäschel said that while Genie excels at creating 2D worlds from text or images, the model could serve other purposes, such as teaching other AI models, or “agents”, about 3D worlds.
He also said the team trained a Genie model on robotics data that contained no action labels, showing that it can learn an action-controllable simulator in that domain as well. Rocktäschel believes this is a promising step toward general world models on the path to AGI.
Artificial general intelligence refers to an AI system that can understand, learn, and apply knowledge across a wide range of tasks, much like a human being.
DeepMind said the dataset used to train Genie was built by filtering publicly available internet videos, keeping those whose titles included keywords such as “speedrun” or “playthrough” and excluding those containing words such as “movie” or “unboxing”.
The researchers said that when selecting keywords, the team manually spot-checked the results to make sure 2D platformer gameplay videos were not “outnumbered” by other videos that happened to share the same keywords.
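To make the title-based filtering more concrete, here is a minimal Python sketch. It assumes simple case-insensitive substring matching on video titles; the keyword lists, the hypothetical `keep_video` function, and the sample titles are illustrative only and are not DeepMind’s actual pipeline.

```python
# Hypothetical sketch of keyword-based title filtering, loosely modeled on the
# filtering DeepMind describes. Keyword lists and names are illustrative only.

INCLUDE_KEYWORDS = ("speedrun", "playthrough")
EXCLUDE_KEYWORDS = ("movie", "unboxing")


def keep_video(title: str) -> bool:
    """Keep a title if it mentions an include keyword and no exclude keyword."""
    lowered = title.lower()  # assumption: matching is case-insensitive
    has_include = any(word in lowered for word in INCLUDE_KEYWORDS)
    has_exclude = any(word in lowered for word in EXCLUDE_KEYWORDS)
    return has_include and not has_exclude


# Made-up sample titles to show which ones pass the filter.
titles = [
    "Super Platformer Any% Speedrun",
    "Full Playthrough - World 1",
    "Platformer: The Movie Playthrough Reaction",
    "Console Unboxing",
]

kept = [t for t in titles if keep_video(t)]
print(kept)
```

Running it keeps the speedrun and playthrough titles and drops the ones mentioning “Movie” or “Unboxing”. A filter this crude is also why the manual spot-checks matter: unrelated videos can easily share the same keywords.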
Google Making Up For Gemini’s Flaws With Genie Announcement
With the launch of Genie, Google is trying to make up for the problems with Gemini, the generative AI chatbot it released earlier. Since Google CEO Sundar Pichai introduced Gemini back in December, users have uncovered several flaws in the underlying large language model (LLM).
Just last week, the company had to temporarily suspend Gemini’s ability to generate images of people after users complained that the model depicted historical figures and events with racially diverse imagery that was historically inaccurate.