Research
We continue to develop adaptive AI agents, enable 3D scene creation, and innovate LLM training for a smarter, safer future
Next week, AI researchers worldwide will gather for the 38th Annual Neural Information Processing Systems Conference (NeurIPS), taking place December 10-15 in Vancouver.
Two papers led by Google DeepMind researchers have won Test of Time awards for their “undeniable influence” on the field. Ilya Sutskever will give a talk on “Sequence to Sequence Learning with Neural Networks,” a paper he co-authored with Google DeepMind’s VP of Drastic Research, Oriol Vinyals, and Distinguished Scientist Quoc V. Le. Google DeepMind scientists Ian Goodfellow and David Warde-Farley will also give a talk on “Generative Adversarial Nets.”
We also show how we translate our fundamental research into real-world applications, with live demonstrations such as Gemma Scope, AI for music generation, weather forecasting and more.
Google DeepMind teams will present more than 100 new contributions on topics ranging from AI agents and generative media to innovative approaches to learning.
Building adaptive, intelligent and safe AI agents
LLM-based AI agents are showing promise in executing digital tasks via natural language commands. However, their success depends on accurately interacting with complex user interfaces, which requires extensive training data. With AndroidControl, we're sharing the most diverse control dataset to date, with over 15,000 human-collected demonstrations spanning more than 800 apps. AI agents trained on this dataset showed significant performance gains, which we hope will advance research into more general AI agents.
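To make the shape of such a dataset concrete, here is a minimal sketch of what a single UI-control episode might look like: a natural-language goal paired with a sequence of low-level interface actions. The field names and schema are illustrative assumptions, not the actual AndroidControl format.

```python
from dataclasses import dataclass, field

@dataclass
class UIAction:
    kind: str        # e.g. "tap", "type", "scroll" (illustrative action types)
    target: str      # accessibility label of the targeted UI element
    text: str = ""   # text to enter, for "type" actions only

@dataclass
class Episode:
    app: str
    instruction: str                    # the natural-language task
    actions: list[UIAction] = field(default_factory=list)

# One hypothetical demonstration: setting an alarm in a clock app.
episode = Episode(
    app="clock",
    instruction="Set an alarm for 7:30 am",
    actions=[
        UIAction("tap", "Alarm tab"),
        UIAction("tap", "Add alarm"),
        UIAction("type", "Time field", "7:30"),
        UIAction("tap", "Save"),
    ],
)
print(len(episode.actions))  # → 4 demonstrated steps
```

An agent is trained to map the instruction and current screen state to the next action in the sequence.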
For AI agents to generalize across tasks, they must learn from every experience they have. We present a contextual abstraction learning method that helps agents capture important task patterns and relationships from incomplete demos and natural language feedback, improving their performance and adaptability.
Developing agentic AI that aims to meet users' goals can make the technology more useful. However, when developing AI that acts on our behalf, alignment is critical. To that end, we propose a theoretical method for measuring an AI system's goal-directedness, and also show how a model's perception of its user can influence its safety filters. Together, these findings underscore the need for robust safeguards to prevent unintended or unsafe behavior, so that AI agents' actions remain directed toward safe, intended uses.
Advancing 3D scene creation and simulation
As demand for high-quality 3D content grows in industries such as gaming and visual effects, creating lifelike 3D scenes remains costly and time-consuming. Our recent work introduces novel 3D generation, simulation and control approaches that streamline content creation for faster and more flexible workflows.
Creating high-quality, realistic 3D assets and scenes often requires capturing and modeling thousands of 2D photos. We introduce CAT3D, a system that can create 3D content from any number of images – even a single image, or just a text prompt – in as little as a minute. CAT3D achieves this with a multi-view diffusion model that generates additional consistent 2D images from many different viewpoints, and uses these generated images as input to traditional 3D modeling techniques. The results surpass previous methods in both speed and quality.
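The two-stage idea can be sketched as follows. This is not CAT3D's implementation – both stages are stubbed with placeholder functions – but it shows the control flow: a multi-view generation step expands a few conditioning images into many consistent views, which then feed a standard 3D reconstructor.

```python
def generate_novel_views(images, n_views):
    """Stand-in for a multi-view diffusion model: produces n_views
    synthetic viewpoints conditioned on the input images."""
    return [f"synthetic_view_{i}" for i in range(n_views)]

def reconstruct_3d(views):
    """Stand-in for a classical 3D pipeline (e.g. NeRF-style fitting)
    that consumes a set of posed 2D views."""
    return {"n_input_views": len(views)}

inputs = ["photo_0.png"]                         # even a single image works
views = inputs + generate_novel_views(inputs, n_views=7)
scene = reconstruct_3d(views)
print(scene["n_input_views"])  # → 8
```

The key design point is that the diffusion model only has to solve a 2D problem (consistent novel views), letting well-understood 3D reconstruction handle the rest.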
The simulation of scenes with many rigid objects, such as a cluttered tabletop or falling Lego bricks, also remains computationally intensive. To overcome this obstacle, we introduce a new technique called SDF-Sim that represents object shapes in a scalable manner, accelerating collision detection and enabling efficient simulation of large, complex scenes.
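A minimal sketch of why signed distance fields (SDFs) make collision checks cheap: a shape is represented as a function returning the distance from any point to its surface, negative inside. The two-sphere example below is a toy illustration of the general principle, not SDF-Sim's learned representation.

```python
import math

def sphere_sdf(center, radius):
    """Build the signed distance function of a sphere:
    negative inside, zero on the surface, positive outside."""
    def sdf(point):
        return math.dist(point, center) - radius
    return sdf

def colliding(sdf, points):
    """Points sampled from one object collide with another object
    when any sample lands inside that object's SDF."""
    return any(sdf(p) < 0.0 for p in points)

ball = sphere_sdf(center=(0.0, 0.0, 0.0), radius=1.0)
brick_samples = [(0.0, 0.9, 0.0), (0.0, 2.0, 0.0)]
print(colliding(ball, brick_samples))  # → True: the first sample is inside
```

Because each query is a single function evaluation, the cost of a collision test does not grow with the mesh complexity of the shape being tested against.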
AI image generators based on diffusion models struggle to control the 3D position and orientation of multiple objects. Our Neural Assets solution introduces object-specific representations that capture both appearance and 3D pose, learned through training on dynamic video data. Neural Assets allows users to move, rotate, or swap objects across scenes – a useful tool for animation, gaming, and virtual reality.
Improving the way LLMs learn and respond
We are also improving how LLMs train, learn, and respond to users, boosting performance and efficiency on several fronts.
With larger context windows, LLMs can now learn from potentially thousands of examples at once – known as many-shot in-context learning (ICL). This boosts model performance on tasks such as mathematics, translation, and reasoning, but it often requires high-quality, human-generated data. To make training more cost-effective, we explore methods for adapting many-shot ICL that reduce reliance on manually curated data.

There is so much data available for training language models that the biggest constraint for the teams building them is compute. We address an important question: given a fixed compute budget, how do you choose the right model size to achieve the best results?
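Mechanically, many-shot ICL is simple: with a long enough context window, the "training" step is just packing many solved examples into a single prompt. A minimal sketch, with made-up arithmetic examples standing in for real task data:

```python
def build_many_shot_prompt(examples, query):
    """Pack (question, answer) pairs into one prompt, then append
    the unanswered query for the model to complete."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {query}\nA:"

# With a long context window, `examples` could hold thousands of pairs.
examples = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]
prompt = build_many_shot_prompt(examples, "6/2")
print(prompt.count("Q:"))  # → 4: one per shot, plus the final query
```

The research question is then where those shots come from – for instance, whether model-generated examples can substitute for expensive human-written ones.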
Another innovative approach, which we call Time-Reversed Language Models (TRLM), explores pre-training and fine-tuning an LLM to operate in reverse. Given a traditional LLM's answer as input, a TRLM generates queries that could have produced that answer. Combined with a traditional LLM, this method not only helps answers better follow user instructions, but also improves citation generation for summarized text and strengthens safety filters against harmful content.
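The reversed direction can be used for reranking: instead of scoring answer-given-question, score question-given-answer. In the sketch below a toy word-overlap scorer stands in for a reversed model's likelihood – an assumption for illustration only, not the TRLM scoring function.

```python
def reverse_score(answer, query):
    """Toy stand-in for a reversed LM: how plausible is this query,
    given the answer? Here, crude word overlap plays that role."""
    a_words = set(answer.lower().split())
    q_words = set(query.lower().split())
    return len(a_words & q_words) / max(1, len(q_words))

answer = "The Eiffel Tower is in Paris"
candidates = ["Where is the Eiffel Tower", "How tall is Mount Fuji"]

# Pick the candidate query the answer most plausibly responds to.
best = max(candidates, key=lambda q: reverse_score(answer, q))
print(best)  # → "Where is the Eiffel Tower"
```

The same backward scoring can attribute a summary sentence to the source passage most likely to have produced it, which is what drives the citation-generation use case.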
Curating high-quality data is critical for training large AI models, but manual curation is difficult at scale. To address this, our Joint Example Selection (JEST) algorithm optimizes training by identifying the most learnable data within larger batches. This enables up to 13× fewer training iterations and 10× less computation, surpassing state-of-the-art multimodal pre-training baselines.
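One common way to operationalize "learnable" is to score each candidate by how much harder it is for the current learner than for a strong reference model, then keep only the top fraction of the batch. The sketch below illustrates that selection step with synthetic loss values; it is a simplification of JEST, which scores jointly over sub-batches rather than per example.

```python
def select_learnable(batch, learner_loss, ref_loss, keep=0.5):
    """Keep the `keep` fraction of examples with the largest
    learnability score: learner loss minus reference loss."""
    scored = sorted(batch,
                    key=lambda x: learner_loss[x] - ref_loss[x],
                    reverse=True)
    return scored[: max(1, int(len(scored) * keep))]

batch = ["a", "b", "c", "d"]
learner_loss = {"a": 2.0, "b": 0.5, "c": 3.0, "d": 1.0}  # synthetic
ref_loss     = {"a": 0.4, "b": 0.4, "c": 2.9, "d": 0.2}  # synthetic
print(select_learnable(batch, learner_loss, ref_loss))  # → ['a', 'd']
```

Example "c" is hard for both models (likely noise), and "b" is easy for both (already learned); the score keeps only examples the learner can still improve on.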
Planning tasks present another challenge for AI, particularly in stochastic environments where outcomes are influenced by randomness or uncertainty. Researchers use different types of inference for planning, but there is no one-size-fits-all approach. We show that planning itself can be viewed as a special type of probabilistic inference and propose a framework for ranking different inference techniques based on their planning effectiveness.
Bringing together the global AI community
We are proud to be a Diamond Sponsor of the conference and support Women in Machine Learning, LatinX in AI and Black in AI in building communities around the world working in AI, machine learning and data science.
If you're at NeurIPS this year, stop by the Google DeepMind and Google Research booths to explore the latest research through demos, workshops, and more throughout the conference.