Hacker News

It’s crucial for use while driving or walking.

One problem is that the ChatGPT and Claude apps don’t really do this well. They use weak and/or non-reasoning models for voice interaction, and the UX is not optimized for hands-free use.

I wrote an iOS chatbot app mainly for this purpose, for myself and family/friends. It allows starting and sending voice prompts with the Action button, so I never have to look at the screen. It supports any model at any reasoning level, so conversations are not dumbed down. I added a video transcription tool so any model can “read” YouTube/TikTok videos and chat about them. It’s great for discussing lectures on tech topics.

It takes slightly longer to use a reasoning model for voice interaction, but I prefer the intelligence. The latency can be minimized in a few ways; bidirectional streaming helps. It’s TTS-agnostic: I’ve got a few selectable providers, and the output can be styled by prompt, e.g. “use a chill tone that’s not too eager”.
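One common way to cut perceived latency when streaming, sketched here in Python: instead of waiting for the full model reply, buffer the streamed tokens and hand each finished sentence to the TTS engine as soon as it appears. This is only an illustration of the general idea, not how the app above works. The function name `sentences_from_tokens` and the sentence-boundary rule (`.`, `!` or `?` followed by whitespace) are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ., ! or ? followed by whitespace ends a sentence.
# This will mis-split abbreviations such as "e.g. foo"; fine for a sketch.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Hypothetical token stream standing in for a model's streamed output.
    stream = ["Hel", "lo there. ", "How are", " you? I", " am fine"]
    for sentence in sentences_from_tokens(stream):
        print(sentence)  # each sentence could be sent to TTS here
```

With this shape, speech can start after the first sentence is generated, while the model is still producing the rest of the reply.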

jorvi 4 hours ago [ - ]

I mean, even applied voice 'models' suck for this.

For some godawful reason, Apple Maps voice directions assume that you also understand what it omits. So if it says "turn right in 500 meters" "250 meters" and then you stop at an intersection after 150 meters and it says "turn right", it expects you to understand that it doesn't mean the immediate right at the intersection, but the next one [because you still haven't driven the full 250m]. It is nuts and I have no clue how that has ever gotten past testing.

What it should do is say nothing until I have to turn, or say "turn right in 100 meters" "turn right".
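The behaviour asked for above can be sketched as a small rule: always speak the remaining distance, and only say the bare "turn right" once the turn is actually imminent. This is a hypothetical illustration, not how Apple Maps or any other navigation app is implemented. The function name `turn_prompt`, the 30 m "imminent" threshold and the 50 m rounding step are all assumptions.

```python
def turn_prompt(remaining_m: float, direction: str = "right",
                imminent_m: float = 30, step_m: int = 50) -> str:
    """Return the phrase to speak for a turn that is remaining_m meters ahead.

    imminent_m and step_m are assumed values chosen for the example.
    """
    if remaining_m <= imminent_m:
        # Close enough that the next opportunity is the intended turn.
        return f"turn {direction}"
    # Otherwise always include the distance, rounded to a speakable step,
    # so an earlier intersection is not mistaken for the turn.
    rounded = max(step_m, round(remaining_m / step_m) * step_m)
    return f"turn {direction} in {rounded} meters"


if __name__ == "__main__":
    # Stopped at an intersection 150 m after the "250 meters" prompt.
    print(turn_prompt(100))
    # Now actually at the turn.
    print(turn_prompt(20))
```

In the scenario described, stopping at the earlier intersection with 100 m still to go would produce "turn right in 100 meters" rather than an ambiguous "turn right".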

Melatonic 3 hours ago [ - ]

This is one thing I think Waze does better than the competition. And they have a ton of different voices.

They also clearly show which voices can do street names (which is hugely helpful). For some reason the Australian- and British-accented voices feel more polite than the American ones.

gbalduzzi 5 hours ago [ - ]

What are the use cases for an LLM while walking or driving that also require high reasoning?

shostack an hour ago [ - ]

With a sufficiently sophisticated harness you can actually do quite a lot by just talking to your AI. For example, I regularly build things by dictating to my phone while walking to lunch.

WhitneyLand 3 hours ago [ - ]

Most of the problem is that for voice chat, you usually get no reasoning at all and no tool use at all to research or ground assumptions.

For example, for voice, ChatGPT still uses a quantized, non-reasoning GPT-4o model that hallucinates pretty frequently. It also doesn’t do much automatic search for updated information and fact checking.

I usually don’t find I need high reasoning; DeepSeek v4 with medium reasoning is usually sufficient.

However, if it’s an important chat, like brainstorming on complex topics, I sometimes bump it up.

OpenAI has a new voice API that supports adjustable reasoning, but ChatGPT is not using it currently.

WarmWash 6 hours ago [ - ]

Gemini 3.1 Flash Live is a native audio-to-audio model with reasoning. But it's still not a SOTA-level model.