Exploring AGI, the challenges of scaling, and the future of multimodal generative AI
Next week, the artificial intelligence (AI) community will gather for the 2024 International Conference on Machine Learning (ICML). The conference will take place from July 21 to 27 in Vienna, Austria, and is an international platform to showcase the latest advances, exchange ideas and shape the future of AI research.
This year, Google DeepMind teams will present more than 80 research papers. At our booth, we will also showcase Gemini Nano, our multimodal on-device model, and LearnLM, our new family of AI models for education, and we will demonstrate TacticAI, an AI assistant that can help with football tactics.
Here are some of our talks, spotlights and poster presentations:
Defining the path to AGI
What is artificial general intelligence (AGI)? The term describes an AI system that is at least as capable as a human at most tasks. As AI models evolve, it becomes increasingly important to define what AGI might look like in practice.
We present a framework for classifying the capabilities and behaviors of AGI models. Based on their performance, generality and autonomy, our paper categorizes systems ranging from non-AI computers to new AI models and other novel technologies.
We will also show that open-endedness is critical to building general AI that goes beyond human capabilities. While many recent AI advances have relied on existing internet-scale data, open-ended systems can produce new discoveries that advance human knowledge.
Scaling AI systems efficiently and responsibly
Developing larger, more powerful AI models requires more efficient training methods, a closer alignment with human preferences, and better data protection measures.
We show how using classification techniques instead of regression techniques makes it easier to scale deep reinforcement learning systems and achieve state-of-the-art performance in various domains. Furthermore, we propose a novel approach that predicts the distribution of consequences of a reinforcement learning agent's actions, helping to quickly evaluate new scenarios.
Our researchers present an approach to maintaining alignment that reduces the need for human oversight, and a new approach to fine-tuning large language models (LLMs) based on game theory that better tailors the output of an LLM to human preferences.
We critique the practice of training models on public data and applying “differentially private” training only during fine-tuning, arguing that this approach may not provide the privacy or utility often claimed.
New approaches in generative AI and multimodality
Generative AI technologies and multimodal capabilities expand the creative possibilities of digital media.
We introduce VideoPoet, which uses an LLM to generate state-of-the-art video and audio from multimodal inputs such as images, text, audio, and other videos.
We also share Genie (generative interactive environments), which can generate a range of playable environments for training AI agents from text prompts, images, photos or sketches.
Finally, we present MagicLens, a novel image retrieval system that uses text instructions to retrieve images with richer relationships beyond visual similarity.
Supporting the AI community
We are proud to sponsor ICML and foster a diverse AI and machine learning community by supporting initiatives led by Disability in AI, Queer in AI, LatinX in AI and Women in Machine Learning.
If you're attending the conference, visit the Google DeepMind and Google Research booths to meet our teams, watch live demos, and learn more about our research.