I tried it with one of my older projects to test it out. The problem statement was to upgrade the Java Spring Boot version from 2.7 to the latest. I have done this in the past, so I knew exactly what had to be done. All the requirement docs, design specs, and tasks looked verbose but OK at a high level. Interestingly, I had some idea where the language in those docs was coming from.
1. It created 40+ tasks (I have my own curated list of 5 tasks based on past experience).
2. At every task it tried to compile the code but failed with dependency errors.
3. It still marked the task as complete and pushed responsibility for the failures onto downstream tasks.
4. It kept moving through the tasks: the original errors were still not fixed, but the tasks were being marked as done.
5. After a while I got tired to the point that I stopped reading the exact commands being executed. The fatigue of overseeing work you are not actually involved in is real.
6. I made the naive assumption that I could sandbox it by granting permissions to the project folder only. It executed some Java CLI commands that looked simple enough at first.
7. It turned out my environment variables got messed up, and other simple things related to git and Gradle stopped working.
I ended the experiment, reverted the code changes, and fixed my environment.
Key takeaways:
1. It gives a sense of work being done, but the quality and concreteness of that work are hard to judge unless you have already done it yourself. It was creating classes and tests that were not needed instead of focusing on the actual use case.
2. Sandboxes are a MUST. There is a real risk of corrupting your setup: environment commands are not just simple file changes that can be easily reverted.
Interesting. How large is the project?
It was a relatively small project, about 5 controllers and the same number of service classes. The experiment lasted about 2.5 hours; I was actively engaged for the first 45 minutes and after that just kept pressing the buttons to move to the next step in passive mode.