These competent open models you want to use were trained on data from people like you and me.

I wonder if there are competent models trained purely on code under permissive open-source licenses like MIT or Apache 2.0.

MIT and Apache 2.0 both require attribution, so it's not like limiting training data to those would help with license compliance.