TA here. Biggest changes are in the second assignment (distributed) where we added a bunch of memory, profiling and distributed tasks, as well as in the fifth assignment (alignment), where most of the RL tasks are fresh this year. Assignment 3 (scaling laws) was also completely updated, but in a way that might be difficult to run without substantial resources. I'm working on a way for external students to be able to run simulated experiments for free!
Assignment 1 (basics) has the most hours of preparation invested in it, and only minor modernization/bug fixes were necessary this year.
How are you grading the student submissions? Also, do you catch students who rely entirely on AI and don't follow the honor code? If so, how?
We have autograding for code through tests written by hand, and additionally do manual code audits if we see suspicious behavior. We also do grading the old-fashioned way for writeups.
We do indeed catch students who don't follow the honor code. It's very obvious from how the code looks, as well as from the rate of progress. Since we use Modal for class submissions, we have a code delta for every time they run something on B200s. The suspicious diffs often show something like 300 new lines within 5 minutes, in which case we review and report based on how egregious/provable it looks.
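For anyone curious what a rate-of-progress check like that could look like, here is a rough sketch. It is not the course's actual tooling: `Snapshot`, `added_lines`, and `flag_bursts` are hypothetical names, the snapshots are assumed to be the source text captured at each run, and the 300-lines / 5-minutes thresholds are just the numbers mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import difflib


@dataclass
class Snapshot:
    """Hypothetical record: the submitted source as it looked at one run."""
    taken_at: datetime
    source: str


def added_lines(old: str, new: str) -> int:
    """Count lines in `new` that are inserted or rewritten relative to `old`."""
    matcher = difflib.SequenceMatcher(
        None, old.splitlines(), new.splitlines(), autojunk=False
    )
    total = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            total += j2 - j1
    return total


def flag_bursts(snapshots, max_lines=300, window=timedelta(minutes=5)):
    """Return (timestamp, added line count) for each suspiciously fast delta.

    Assumed thresholds: at least `max_lines` new lines appearing within
    `window` of the previous run. A flag is only a prompt for manual review.
    """
    flagged = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        elapsed = cur.taken_at - prev.taken_at
        count = added_lines(prev.source, cur.source)
        if elapsed <= window and count >= max_lines:
            flagged.append((cur.taken_at, count))
    return flagged
```

A flag from something like this would only queue the diff for a human to look at; pasting in starter code or a large refactor can trip the same threshold.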