I don't understand how businesses could trust cloud LLMs going forward with this ongoing "safety" paranoia. Building dependence on them doesn't feel like a sane strategic decision for users.

Looking better and better for people to go after local solutions.

Tell that to the GPU market

I think it heard. A 128GB Strix Halo was $1400 at launch. Now it’s $3299.

That's 7 months of Claude -> 16.5 months of Claude.
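For anyone checking that arithmetic: the figures only work out if you assume a subscription of about $200/month, which the comment implies ($1400 / 7) but never states. A quick sketch under that assumption; the constant and the function name are made up for illustration:

```python
# Assumption: a $200/month plan, inferred from "$1400 = 7 months" above.
MONTHLY_PLAN_USD = 200


def months_of_subscription(hardware_price_usd: float,
                           monthly_plan_usd: float = MONTHLY_PLAN_USD) -> float:
    """How many months of the subscription the hardware price would pay for."""
    return hardware_price_usd / monthly_plan_usd


launch = months_of_subscription(1400)   # launch price quoted in the thread
current = months_of_subscription(3299)  # current price quoted in the thread

print(f"{launch:.1f} months -> {current:.1f} months")
```

At a different plan price the break-even point shifts proportionally, so the comparison is only as good as that assumed monthly cost.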

idk I just bought a 7900 XTX for $750 on eBay and it runs Gemma and Qwen pretty well

It isn’t about trust or no trust, it’s about having a capability to do stuff vs not having it. If Fable is the only model doing the right thing in your use case, your only choice is to use it or not. If the efficiency gain is 2x, it’s a hit you can probably take. If it’s 100x you pay up and shut up.

Of course you can trust them.

Just run benchmarks yourself on the new model and decide if it is valuable for your use case, even with the supposed nerfing.

Benchmarks are benchmarks. And you can ignore the data at your own risk.
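To make "do benchmarks yourself" concrete, here is a minimal sketch of a fixed prompt set you rerun against each model version and compare to a stored baseline. Everything in it is hypothetical: `stub_model` stands in for whatever client you actually call, the cases are toy examples, and substring matching is a crude scoring rule you would replace with something suited to your use case.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (prompt, substring expected in the reply)


def pass_rate(model: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases whose expected substring appears in the model's reply."""
    if not cases:
        raise ValueError("need at least one benchmark case")
    hits = sum(1 for prompt, expected in cases if expected in model(prompt))
    return hits / len(cases)


def regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True if the current pass rate fell below the baseline by more than tolerance."""
    return current < baseline - tolerance


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; refuses anything it doesn't know."""
    answers = {"2+2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "Sorry, I can't help with that")


CASES: List[Case] = [("2+2", "4"), ("capital of France", "Paris"), ("7*6", "42")]

rate = pass_rate(stub_model, CASES)
print(f"pass rate: {rate:.2f}, regressed vs 1.00 baseline: {regressed(1.0, rate)}")
```

Note that a refusal simply counts as a miss here, so a model that starts declining prompts shows up as a lower pass rate on the next run.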

The problem is that the corruption is silent, and the service can be degraded at any moment or, well, randomly.

Because this effectively hinders 0% of people. I understand why people don't like it, but day to day this is nothing. If you're using it for coding, it won't stop you. The pearl clutching and overreacting here is predictable and sad. If you are working for a large organization and going through the vendor procurement process, questions like "Can this produce pornography?" and "Can this tell my employees how to break the law?" are normal, and anyone with half a brain knows that this is the case. Before people jump on that, I understand people have access to the internet. Your question "how businesses could trust cloud LLMs going forward" is absurd and you know it. There is an extremely small set of edge cases that affect 0% of people day to day. You can trust them just fine.

My very first prompt to Fable, which was a completely benign math problem, hit one of their visible triggers. Many tokens into the problem, frustratingly. The user experience (read peer comments) is that you run into these issues with high frequency.

I guess, given that, a pro tip would be to err toward sequential work rather than giving monster prompts. That constraint has got to degrade quality though.

This is software development, not sales. We rely on our tooling.

If I’m using a calculator to verify my math, I don’t want to use a second calculator to verify the first one.

I am sorry to be the one to tell you, but it was already the case that you cannot trust LLMs to solve all your problems 100% of the time.

It was always random. This is no different than any other randomness that already exists in LLMs.

If you are concerned, just run benchmarks and see if it is valuable for your use case regardless.

It's not paranoia. Cyber attacks have gone up massively in the past few months even with the weaker models we had so far. And Claude Mythos 5 scores even higher than the unreleased Mythos Preview on ExploitBench. If you made this capability publicly available you would see another acceleration of cyber attacks.

This isn't even about cyber attacks. This is just LLM development which is increasingly just called software development. And at least for cyber it says "Sorry I can't help with that"!