I haven’t really looked into performance comparisons with tensor-based frameworks. This was mainly a learning project, so my goal wasn’t to make it fast.

My guess is that a scalar engine might have an advantage in a few cases, such as tiny optimization problems with only a handful of variables, where the overhead of tensor frameworks dominates the runtime.
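
For illustration, a tiny problem of that kind might look like the sketch below. The `Value` class and the `minimize` function are hypothetical stand-ins written for this example, not the engine discussed here, and the code makes no speed measurement; it only shows what "a handful of variables" means in practice, using plain gradient descent on a two-variable quadratic.

```python
class Value:
    """Hypothetical minimal scalar reverse-mode autodiff node (illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def __neg__(self):
        return self * -1.0

    def __sub__(self, other):
        return self + (-other)

    def backward(self):
        # Build a topological order so each node's gradient is complete
        # before it is pushed to that node's parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def minimize(steps=200, lr=0.1):
    """Gradient descent on f(x, y) = (x - 3)^2 + (y + 1)^2, a two-variable toy problem."""
    x, y = 0.0, 0.0
    for _ in range(steps):
        vx, vy = Value(x), Value(y)      # fresh graph each step
        dx, dy = vx - 3.0, vy + 1.0
        loss = dx * dx + dy * dy
        loss.backward()
        x -= lr * vx.grad
        y -= lr * vy.grad
    return x, y


if __name__ == "__main__":
    print(minimize())
```

The whole graph here has fewer than ten nodes per step, which is the regime the guess above refers to. Whether a scalar engine actually comes out ahead would need to be timed against a tensor framework on the same problem.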