What to know
- Google has launched Gemini 2.5 Flash, a new AI model available in preview for developers.
- Developers can set a 'thinking budget'—a cap on the number of tokens the model uses for reasoning, adjustable up to 24,576 tokens.
- The model balances speed, cost, and reasoning quality, automatically adjusting its thinking depth based on task complexity if no budget is set.
- Gemini 2.5 Flash is accessible via the Gemini API, Google AI Studio, and Vertex AI, and is designed for low-latency, high-volume tasks.
Google has introduced Gemini 2.5 Flash, the latest addition to its Gemini AI family, now available in preview for developers. Its distinguishing feature is the 'thinking budget,' which lets you cap the number of tokens the model can use for internal reasoning.
You can set this budget anywhere from 0 to 24,576 tokens, giving you fine-grained control over the trade-offs between response quality, speed, and cost.
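As a minimal sketch of how such a budget might be passed along, the Python below clamps a requested value to the 0 to 24,576 range and builds a JSON request body. The helper names (clamp_thinking_budget, build_request_body) are hypothetical, and the generationConfig.thinkingConfig.thinkingBudget field layout is an assumption about the Gemini REST API rather than something stated in this article, so check the official API reference before relying on it.

```python
import json
from typing import Optional

MAX_THINKING_BUDGET = 24_576  # upper limit quoted in the article


def clamp_thinking_budget(requested: int) -> int:
    """Clamp a requested budget to the 0..24,576 token range."""
    return max(0, min(requested, MAX_THINKING_BUDGET))


def build_request_body(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a request body; pass no budget to leave reasoning depth to the model."""
    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Assumed field names; verify against the API reference.
        body["generationConfig"] = {
            "thinkingConfig": {
                "thinkingBudget": clamp_thinking_budget(thinking_budget)
            }
        }
    return body


if __name__ == "__main__":
    # A budget of 0 asks for the fastest, cheapest behaviour.
    print(json.dumps(build_request_body("Translate 'hello' to French.", 0), indent=2))
```

Leaving thinking_budget as None omits the field entirely, which in this sketch stands in for the "no budget set" case where the model chooses its own reasoning depth.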
If you set the thinking budget to zero, Gemini 2.5 Flash prioritizes speed and cost, behaving like a faster version of its predecessor, 2.0 Flash. When you increase the budget, the model spends more time reasoning, which can improve the quality of answers for complex, multi-step tasks.
The budget is a ceiling, not a target: the model uses only as much of it as the complexity of your prompt requires. For simple queries, it responds quickly without unnecessary processing.
This flexibility is especially useful for developers who need to optimize for different use cases. For example, straightforward questions or translations require minimal reasoning, while more challenging problems, such as advanced math or programming tasks, benefit from a higher thinking budget.
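One way to act on this is to pick a budget per task category. The sketch below is illustrative only: the category names and the specific budget values are assumptions chosen for the example, not recommendations from Google. Only the 24,576-token ceiling comes from the article.

```python
from typing import Optional

MAX_THINKING_BUDGET = 24_576  # ceiling quoted in the article

# Hypothetical categories and budgets, chosen only for illustration.
BUDGET_BY_TASK = {
    "translation": 0,        # minimal reasoning needed
    "simple_qa": 0,
    "summarization": 1_024,
    "coding": 8_192,
    "advanced_math": MAX_THINKING_BUDGET,
}


def pick_thinking_budget(task: str) -> Optional[int]:
    """Return a budget for a known task, or None to let the model decide."""
    return BUDGET_BY_TASK.get(task)
```

Returning None for unknown task types leaves the decision to the model's automatic behaviour, which is a reasonable default when the difficulty of a request cannot be predicted up front.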
If you do not specify a budget, the model assesses the complexity of each request and adjusts its reasoning depth automatically.
Gemini 2.5 Flash is accessible through the Gemini API, Google AI Studio, and Vertex AI.
It supports text, image, video, and audio prompts, and features a one-million-token context window. Google highlights that this model delivers strong performance on difficult prompts, ranking just behind the more advanced 2.5 Pro model.
The thinking budget lets developers tune Gemini 2.5 Flash per request for cost, speed, or reasoning quality, which suits it to the low-latency, high-volume workloads it is designed for.
Via: 9to5Google