I was going to suggest implementing RoPE to fix the context limit, but realized that would make it anatomically incorrect.
I intentionally removed all optimizations to keep it vanilla.
I intentionally removed all optimizations to keep it vanilla.