Today we are releasing two updated, production-ready Gemini models, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, along with:
- >50% reduced price on 1.5 Pro (both input and output for prompts)
- 2x higher rate limits on 1.5 Flash and ~3x higher on 1.5 Pro
- 2x faster output and 3x lower latency
- Updated default filter settings
These new models build on our recent experimental model releases and include significant improvements over the Gemini 1.5 models released at Google I/O in May. Developers can access our latest models for free via Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, the models are also available on Vertex AI.
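As a concrete illustration of what calling the new models involves, the sketch below builds a request for the Gemini API's public v1beta REST `generateContent` endpoint. The payload shape follows the documented `contents`/`parts` structure; the prompt text and helper function are hypothetical, and you would send the resulting JSON with your own HTTP client and API key.

```python
import json

# Model name from this announcement; endpoint shape assumes the
# public v1beta REST API (verify against current Gemini API docs).
MODEL = "gemini-1.5-pro-002"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

def build_request(prompt: str) -> str:
    """Serialize a minimal generateContent request body."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return json.dumps(body)

# Hypothetical prompt, just to show the shape of the request.
payload = build_request("Summarize the attached document in three bullet points.")
print(ENDPOINT)
print(payload)
```

In practice you would POST `payload` to `ENDPOINT` with an `x-goog-api-key` header (or the official SDKs, which wrap this for you).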
Improved overall quality with greater gains in math, long context and vision
The Gemini 1.5 series is designed for strong general performance across a wide range of text, code, and multimodal tasks. For example, Gemini models can synthesize information from 1,000-page PDFs, answer questions about repositories containing more than 10,000 lines of code, generate useful content from hours of video, and much more.
With the latest updates, 1.5 Pro and Flash are now better, faster, and more cost-effective to build with in production. We see a ~7% increase on MMLU-Pro, a more challenging version of the popular MMLU benchmark. On MATH and HiddenMath (an internal holdout set of competition math problems), both models improved by approximately 20%. For vision and code use cases, both models also score better on evaluations measuring visual understanding and Python code generation (in the range of approximately 2-7%).
We've also improved the overall helpfulness of model responses while continuing to uphold our content safety policies and standards. This means fewer refusals and more helpful answers across many topics.
In response to developer feedback, both models now have a more concise default style designed to make them easier to use and to reduce costs. For use cases such as summarization, question answering, and extraction, the default output length of the updated models is approximately 5-20% shorter than that of previous models. For chat-based products where users may prefer longer responses by default, our prompting strategies guide explains how to make the models more verbose and conversational.
For more information about migrating to the latest versions of Gemini 1.5 Pro and 1.5 Flash, see the Gemini API models page.
Gemini 1.5 Pro
We continue to be blown away by the creative and useful applications of Gemini 1.5 Pro's 2-million-token context window and multimodal capabilities, from video understanding to processing 1,000-page PDFs, and there are still so many new use cases left to build. Today we are announcing a 64% price reduction on input tokens, a 52% price reduction on output tokens, and a 64% price reduction on incremental cached tokens for our most powerful 1.5 series model, Gemini 1.5 Pro, effective October 1, 2024, for prompts under 128,000 tokens. Combined with context caching, this further reduces the cost of building with Gemini.
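To see what the announced percentage cuts mean in practice, the arithmetic below applies them to a per-million-token price. The base prices here are hypothetical placeholders, not published Gemini pricing; only the 64% input and 52% output reductions come from this announcement.

```python
# Announced reductions for prompts under 128K tokens.
INPUT_CUT, OUTPUT_CUT = 0.64, 0.52

def discounted(price_per_million: float, cut: float) -> float:
    """Apply a fractional price reduction to a per-1M-token price."""
    return round(price_per_million * (1 - cut), 4)

old_input_price = 3.50    # hypothetical $/1M input tokens
old_output_price = 10.50  # hypothetical $/1M output tokens

new_input_price = discounted(old_input_price, INPUT_CUT)
new_output_price = discounted(old_output_price, OUTPUT_CUT)
print(new_input_price, new_output_price)  # 1.26 5.04
```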
Increased rate limits
To make it even easier for developers to build with Gemini, we're increasing the rate limits of the paid tiers to 2,000 RPM for 1.5 Flash and 1,000 RPM for 1.5 Pro (from 1,000 and 360, respectively). We expect to continue increasing the rate limits for the Gemini API in the coming weeks to allow developers to build more with Gemini.
2x faster output and 3x lower latency
In addition to key improvements to our latest models, in recent weeks we have reduced latency with 1.5 Flash and significantly increased output tokens per second to enable new use cases with our most powerful models.
Updated filter settings
Since Gemini's initial launch in December 2023, we have focused on building a safe and reliable model. With the latest versions of Gemini (the -002 models), we've improved the model's ability to follow user instructions while balancing safety. We will continue to offer a suite of safety filters that developers may apply to Google's models. For the models released today, the filters are not applied by default, so developers can determine the configuration best suited to their use case.
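Opting back into filters is a per-request configuration. The fragment below sketches it in the REST API's `safetySettings` shape; the category and threshold names follow the published enums as best we recall, so treat the exact values as assumptions to verify against the current Gemini API documentation.

```python
# Assumed enum names from the Gemini API safety settings docs.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
]

request_body = {
    "contents": [{"parts": [{"text": "Hello"}]}],
    # Omit safetySettings entirely to keep filters off (the new default).
    "safetySettings": safety_settings,
}
```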
Gemini 1.5 Flash-8B Experimental Updates
We are releasing a further improved version of the Gemini 1.5 Flash-8B model announced in August, called "Gemini-1.5-Flash-8B-Exp-0924". This improved version delivers significant performance gains on both text and multimodal use cases. It is available now via Google AI Studio and the Gemini API.
The overwhelmingly positive feedback developers shared on 1.5 Flash-8B was incredible to see, and we will continue to shape our pipeline from experimental to production releases based on developer feedback.
We're excited about these updates and can't wait to see what you build with the new Gemini models! And for Gemini Advanced users, you will soon be able to access a chat-optimized version of Gemini 1.5 Pro-002.