"Our models are more honest." Honey, the quarterly marketing spin on an ML term is here. Forget "task alignment", now we're going for "truth index". I suppose this is the only way to generate hype when you're selling/releasing the same product over and over again.
When I was doing some electrical work, Opus 4.7 essentially told me to wiggle a wire with my bare hand to see if it was hot or not.
I called it out.
It then gave me one of the most heartfelt, honest, and sincere apologies I have ever received.
Glad the safety team was there for me and able to make such an honest model or I would have been very upset about it.
Opus is so bad at electrical work, it's really disappointing. And when it tries to draw schematics as SVGs it's a complete disaster. They should either focus on training their LLMs on this task specifically, or have them refuse.
Hmm, what kind of electrical work? I had it "watch over my shoulder" as I swapped out the pressure switch on our home well and it was a big help. And in the run-up to that, when I explained that opening the 220 box and checking it was "above my paygrade", it limited our investigation to just the less sparky parts.
I mean introductory circuit stuff. Not electrician-lite work.
I honestly cannot tell if you are being sarcastic or not
It did try and lead me to touch a live hot wire once. Thanking the safety team for the honest and sincere apology it gave after was sarcasm.
It tried to get you to touch a live wire, then you called it honest and thanked the safety team. It really comes off as sarcastic.
Credit where it is due, Claude is fantastic at pointing out potential flaws in how I understand the problem based on my question. I asked for this in the system instructions, but it is the first model I've tried that does it regularly. It is also so tactful, I feel like I'm learning social skills from a language model. Half of the time it is a false positive due to insufficient context, but I still appreciate the additional check.
Gave me wrong information on my very first question. Wasn’t even complicated, and I wasn’t trying to trick it.