My bad; I should have been more precise: "ai" in this case is "LLMs for coding".
If all one uses is the free thinking model, their conclusion about its capability is perfectly valid, because nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model. Even the model numbers are the same. And given that the highest-capability LLMs are closed source and locked behind paywalls, there is no way to arrive at a contrary, verifiable conclusion. They are a scientist, after all.
And that's a real problem. Why pay when you think you're getting the same thing for free? No one wants yet another subscription. This unclear marking is going to lead to so many things going wrong over time; what will the cumulative impact be?
> nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model
Nowhere is it clearly specified that the free model IS as capable as the paid one, either. So if you're uncertain whether it IS or IS NOT as capable, what sort of scientist assumes the answer is IS?
> Nowhere is it clearly specified that the free model IS as capable as the paid one, either. So if you're uncertain whether it IS or IS NOT as capable, what sort of scientist assumes the answer is IS?
Putting the same model name/number on both the free and paid versions is itself a specification that performance will be the same. If a scientist has to bring their scientific training to bear just to interpret and evaluate product markings, then society has a problem. Any reasonable person expects products with the same label to perform similarly.
Perhaps this is why Divisions/Bureaus of Weights and Measures are widespread at the state and county levels. I wonder whether a person who brought a complaint to one of these agencies, or to a consumer protection agency, to fix this situation wouldn't be doing society a huge service.
They don't have the same labels, though. On the free ChatGPT you can't select thinking mode.
> On the free ChatGPT you can't select thinking mode.
This is true, but thinking mode kicks in automatically based on the question asked and other unknown criteria. In the cases I cited, the responses were in thinking mode.