Gemini 1.5 Professional Now Obtainable in 180+ Nations; With Native Audio Understanding, System Directions, JSON Mode and Extra



Posted by Jaclyn Konzelmann and Megan Li – Google Labs

Seize an API key in Google AI Studio, and get began with the Gemini API Cookbook

Lower than two months in the past, we made our next-generation Gemini 1.5 Professional mannequin out there in Google AI Studio for builders to check out. We’ve been amazed by what the group has been in a position to debug, create and be taught utilizing our groundbreaking 1 million context window.

In the present day, we’re making Gemini 1.5 Professional out there in 180+ nations through the Gemini API in public preview, with a first-ever native audio (speech) understanding functionality and a brand new File API to make it straightforward to deal with information. We’re additionally launching new options like system directions and JSON mode to provide builders extra management over the mannequin’s output. Lastly, we’re releasing our subsequent technology textual content embedding mannequin that outperforms comparable fashions. Go to Google AI Studio to create or entry your API key, and begin constructing.

Unlock new use circumstances with audio and video modalities

We’re increasing the enter modalities for Gemini 1.5 Professional to incorporate audio (speech) understanding in each the Gemini API and Google AI Studio. Moreover, Gemini 1.5 Professional is now in a position to purpose throughout each picture (frames) and audio (speech) for movies uploaded in Google AI Studio, and we look ahead to including API help for this quickly.

screen grab of a clooege professor using Gemini 1.5 Pro to create a quiz based on their latest lecture video in Google AI Studio
You may add a recording of a lecture, like this 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Professional can flip it right into a quiz with a solution key. [Video sped up for demo purposes]

Gemini API Enhancements

In the present day, we’re addressing a lot of high developer requests:

1. System directions: Information the mannequin’s responses with system directions, now out there in Google AI Studio and the Gemini API. Outline roles, codecs, objectives, and guidelines to steer the mannequin’s habits on your particular use case.

2. JSON mode: Instruct the mannequin to solely output JSON objects. This mode permits structured information extraction from textual content or pictures. You may get began with cURL, and Python SDK help is coming quickly.

3. Enhancements to operate calling: Now you can choose modes to restrict the mannequin’s outputs, enhancing reliability. Select textual content, operate name, or simply the operate itself.

A brand new embedding mannequin with improved efficiency

Beginning at present, builders will have the ability to entry our subsequent technology textual content embedding mannequin through the Gemini API. The brand new mannequin, text-embedding-004, (text-embedding-preview-0409 in Vertex AI), achieves a stronger retrieval efficiency and outperforms present fashions with comparable dimensions, on the MTEB benchmarks.

table showing Gecko: Versativel Text Embeddings Distilled from Large Language Models
‘Textual content-embedding-004’ (aka Gecko) utilizing 256 dims output outperforms all bigger 768 dim output fashions on MTEB benchmarks

These are simply the primary of many enhancements coming to the Gemini API and Google AI Studio within the subsequent few weeks. We’re persevering with to work on making Google AI Studio and the Gemini API the best solution to construct with Gemini. Get began at present in Google AI Studio with Gemini 1.5 Professional, discover code examples and quickstarts in our new Gemini API Cookbook, and be part of our group channel on Discord.



Recent Articles

Related Stories

Leave A Reply

Please enter your comment!
Please enter your name here

Stay on op - Ge the daily news in your inbox