Is there a "good enough" endgame for LLMs and AI where benchmarks stop mattering because end users don't notice or care? In such a scenario brand would matter more than the best tech, and OpenAI is way out in front in brand recognition.

For average consumers, I think very much yes, and this is where OpenAI's brand recognition shines.

But for anyone using LLMs to help speed up academic literature reviews where every detail matters, or coding where every detail matters, or anything technical where every detail matters -- the differences very much matter. And benchmarks serve just to confirm your personal experience anyway, as the differences between models become extremely apparent when you're working in a niche sub-subfield and one model is showing glaring informational or logical errors and another mostly gets it right.

And then there's a strong possibility that as experts start to say "I always trust <LLM name> more", that halo effect spreads to ordinary consumers who can't tell the difference themselves but want to make sure they use "the best" -- at least for their homework. (For their AI boyfriends and girlfriends, other metrics are probably at play...)

I haven't seen any LLM tech shine "where every detail matters".

In fact, so far they consistently fail in exactly these scenarios, glossing over random important details whenever you double-check results in depth.

You might have found models, prompts or workflows that work for you, though. I'm interested.

> OpenAI's brand recognition shines.

We've seen this movie before. Snapchat was the darling. In fact, it invented the entire category and was dominating the format for years. Then it ran out of time.

Now very few people use Snapchat, and it has been reduced to a footnote in history.

If you think I'm exaggerating, that just proves my point.

Not a great example: Snapchat made it through the slump, successfully captured the next generation of teenagers, and now has around 500M DAUs.

You might not remember, but Snapchat was once supposed to take on Facebook. The founder was so cocky that he declined to be bought by Facebook because he thought Snapchat could be bigger.

I never said Snapchat is dead. It still lives on, but it is a shell of its past self. They had no moat, and the competitors caught up (Instagram, WhatsApp and even LinkedIn copied Snapchat with stories, and the rest is history).

Google's biggest advantage over time will be costs. They have their own hardware, which they can and will optimise for their LLMs. And Google has experience of gaining market share over time by offering better results, performance or space, e.g. Gmail vs Hotmail/Yahoo, Chrome vs IE/Firefox. So don't discount them: if the quality is better, they will get ahead over time.

It already is costs. Their Pro plan has much more generous limits compared to both OpenAI and especially Anthropic. You get 20 Deep Research queries with Pro per day, for example.

That might be true for a narrow definition of chatbots, but they aren't going to survive on name recognition if their models are inferior in the medium term. Right now, "agents" are only really useful for coding, but when they start to be adopted for more mainstream tasks, people will migrate to the tools that actually work first.

This. I don't know any non-tech people who use anything other than ChatGPT. On a similar note, I've wondered why Amazon doesn't make a ChatGPT-like app with their latest Alexa+ makeover; it seems like a missed opportunity. The Alexa app has a feature to talk to the LLM in chat mode, but the overall app is geared towards managing devices.

Google has great distribution to be able to just put Gemini in front of people who are already using their many other popular services. ChatGPT definitely came out of the gate with a big lead on name recognition, but I have been surprised to hear various non-techy friends talking about using Gemini recently, I think for many of them just because they have access at work through their Workspace accounts.

Most of Europe is full of Gemini ads. My parents use Gemini because it is free and it popped up in a YouTube ad before the video.

Just step outside the bubble and look at somewhat older people.

Yeah my parents never really cared enough to explore ChatGPT despite hearing about it 10 times a day in news/media for the last few years. But recently my mom started using Google's AI Search mode after first trying it while doing research for house hunting and my dad uses the Gemini app for occasional questions/identifying parts and stuff (he has always loved Google Lens so those sort of interactive multimedia features are the main pull vs plain text chatbot conversations).

They are both Android/Google Search users so all it really took was "sure I guess I'll try that" in response to a nudge from Google. For me personally I have subscriptions to Claude/ChatGPT/Gemini for coding but use Gemini for 90% of chatbot questions. Eventually I'll cancel some of them but will probably keep Gemini regardless because I like having the extra storage with my Google One plan bundle. Google having a pre-existing platform/ecosystem is a huge advantage imo.

I doubt anyone I know who is using LLMs outside of work knows that there are benchmark tests for these models.

This is why both Google and Microsoft are pushing Gemini and Copilot in everyone's face.