First impressions of Gemini Live: It’s an improvement over Siri, but still falls short of expectations.

Google introduced Gemini Live at its Made by Google event on Tuesday, a feature that enables semi-natural spoken conversations with an AI chatbot powered by Google’s latest large language model. TechCrunch had the opportunity to try it out firsthand.

Gemini Live is Google’s response to OpenAI’s Advanced Voice Mode, a similar feature in ChatGPT currently in limited alpha testing. Although OpenAI showcased the feature first, Google was the first to release the finalized version.

In my experience, these low-latency, voice-based features feel much more natural than texting with ChatGPT or even speaking with Siri or Alexa. Gemini Live responded to questions in under two seconds and adapted relatively quickly when interrupted. While not perfect, it’s the best hands-free way to use your phone that I’ve encountered.

How Gemini Live works:
Before interacting with Gemini Live, users can select from 10 different voices, compared to just three options from OpenAI. Google collaborated with voice actors to create these voices, and I appreciated the variety, finding each voice to be highly lifelike.

In one demonstration, a Google product manager asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and nearby playgrounds for kids. This is a more complex request than I’d typically make to Siri or even Google Search, but Gemini successfully recommended Cooper-Garrod Vineyards in Saratoga, which appeared to meet all the criteria.

However, Gemini Live has its flaws. For instance, it incorrectly identified a nearby playground called Henry Elementary School Playground as being “10 minutes away” from the vineyard. The nearest school by that name is actually over two hours away. There is a Henry Ford Elementary School in Redwood City, but it’s 30 minutes away.

Google highlighted how users can interrupt Gemini Live mid-sentence, allowing the AI to quickly adjust. In practice, this feature isn’t flawless. Sometimes, Google’s product managers and Gemini Live spoke over each other, and the AI didn’t always catch what was said.

Interestingly, Gemini Live is restricted from singing or mimicking voices beyond the 10 provided, likely to avoid copyright issues. Additionally, Google isn’t prioritizing detection of emotional intonation in the user’s voice — something OpenAI emphasized in its demo of Advanced Voice Mode.

Overall, Gemini Live appears to be a more natural way to explore topics than using a simple Google Search. Google describes Gemini Live as a step towards Project Astra, the fully multimodal AI model introduced at Google I/O. For now, Gemini Live is limited to voice conversations, but in the future, Google aims to incorporate real-time video understanding.
