The article specifically talks about this: DeepSeek spends significant test time and still gets worse results than klm.