That doesn't seem like a very useful test for a model that's optimized for programming?