yeah but it also made some tests pass by changing the tests. i’m not super familiar so i’ll dig more over the weekend, but it seems sus pending more review. i’ve had ai do similar things that i caught in manual review. cheating the test is bad.
It is well known that agents can cheat or go off on tangents and not recover. One just recently deleted a bunch of code files without my asking. The code wasn't even used anywhere.
That's why they've merged it into canary so they can continue working on it.