Many of us use our phones to chat, call, and doomscroll through social media, and not much else. That same pattern often shows up in how we approach AI tools: we learn just enough features to get by while ignoring everything else they can do.

However, ignoring those extra features doesn’t make them any less useful. I realized this when I discovered that Gemini could analyze my audio files and answer almost every question I had about them. Since then, I’ve decided that I’m no longer going to use any of my devices or tools at a surface level.

Google Gemini AI app icon.
OS: Android
Developer: Google
Pricing model: Subscription

Google Gemini is an AI assistant that can understand and generate text, images, code, and more. It’s designed to help people find information, solve problems, and create things more easily.

How I found out Gemini can hear things

I threw music and voice recordings at Gemini just to see if it would fail

Some time ago, I had Für Elise playing in the background when I became curious about the patterns in the music. I opened Gemini Live and, half-expecting nothing, asked if it could explain what was happening with the pattern changes, especially after the first minute. It could. It walked me through the main theme and contrasting sections quite clearly.

That response pushed me to go further. I uploaded an MP3 file of the piece and asked Gemini to listen to it and identify the instruments used in each section. It processed the file and responded that the piece was performed on a single instrument, then broke the composition into sections: the A section from the beginning to the one-minute mark, the B section from one minute to 1:53, and the C section from 1:54 to the end. It explained that the A section sits in the mid-to-high register of the piano; the B section introduces staccato and rapid scales in the higher keys, alongside a light rhythmic figure in the lower register; and the C section shifts into deep, repeated bass notes beneath a more tense melody in the right hand.

I still wanted to be sure it was actually listening rather than repeating widely available information, so I tried something else. I uploaded an MP4 file of a voice recording from an interview I conducted with a young girl as part of a study on her school’s reading club. When I asked Gemini to summarize the conversation, it did so accurately.

At that point, I realized that Gemini’s audio analysis capabilities are not entirely new. The feature is just under-discussed. Tools like Google AI Studio and NotebookLM have supported audio interactions for a while, and even on YouTube, the Ask button beneath some videos is essentially Gemini offering to engage with what you’re watching in real time. You can also upload music files and ask it to extract lyrics, identify genres, or describe mood and tone.

Real-life use cases of Gemini’s audio analysis

From YouTube summaries to workouts and recipes

For learning, Gemini's audio (and even video) analysis is really useful. You can point it to a long YouTube video by pasting the URL into the prompt box and get a concise summary, instead of sitting through an hour of content to find the five minutes that are relevant to you. When I wanted to create a pattern for making baby booties, I took the URLs of two highly recommended videos and asked Gemini to outline the requirements, processes, and tips. I ended up trying both methods and used Gemini's outlines to decide which one I preferred.

It becomes even more structured when you use NotebookLM to build a dedicated notebook around a set of videos and then ask questions directly. In that setup, responses tend to stay closer to the source material, reducing the likelihood of hallucinated answers. You can also generate quizzes from the same content, which makes it a practical study tool for everything from online courses to conference talks.

Google NotebookLM logo.
OS: Android, iOS, Web-based app
Developer: Google
Pricing model: Free

NotebookLM is Google’s AI-powered research notebook that reads what you upload and helps you transform it into structured summaries, explanations, and visuals.

If you’re on a fitness journey or developing any other skill, you can upload your own recordings and get feedback on your form, whether that’s your weightlifting technique or basketball drills. Of course, it doesn’t replace a coach, but it gives you something immediate and accessible, even late at night when your coach isn’t available.

For professional work, the use cases are enormous. Competitive research becomes more efficient when you can extract strategies and key talking points directly from a competitor’s video content. If you create content yourself, you can also turn video scripts into summaries or infographics without manually transcribing everything first.

Even in the kitchen, Gemini is pretty useful. It can extract a full recipe from a cooking video, even when there’s no voiceover or captions (just someone moving through the steps). With Gemini Live’s real-time video processing, you can also point your camera at something unfamiliar or broken and ask how it works or how to fix it.

Don’t leave this on the table

Gemini’s audio analysis may not be its most talked-about feature, but it’s one you can return to repeatedly because of how useful it is. You can apply it to schoolwork, extract recipes, review a pitch, or analyze a workout or any other kind of video without having to go through everything manually.

It has become a core part of my workflow, especially when I'm following YouTube tutorials and need to move quickly without missing important details. If you’ve not tried it yet, it’s worth exploring, if only to save yourself time and make better use of the tools you already have.