I mean, can’t they just train on some huge codebases? There are lots of 100 KLOC codebases out there, and each one would probably come close to 1M tokens.
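
Back-of-envelope version of that, as a rough sketch: the 10 tokens per line figure is my assumption (it is what the 100 KLOC ≈ 1M tokens guess implies), not a measured number, and `estimate_tokens` is just a made-up helper name. Real ratios depend on the language and the tokenizer.

```python
# Rough estimate only: tokens_per_line is an assumed average, not a measurement.
def estimate_tokens(lines_of_code: int, tokens_per_line: float = 10.0) -> int:
    """Estimate the token count of a codebase from its line count."""
    return round(lines_of_code * tokens_per_line)


if __name__ == "__main__":
    # A 100 KLOC codebase under the assumed 10 tokens per line.
    print(estimate_tokens(100_000))
```

If the real average is closer to 7 or 12 tokens per line, the same codebase lands somewhere around 700K to 1.2M tokens, so the ballpark still holds.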