Mostly model size and input size. Models that use self-attention scale as O(N^2) in the sequence length N, since every token attends to every other token.
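A minimal NumPy sketch (illustrative only, not from the original answer) of where the quadratic cost comes from: the attention score matrix has one entry per pair of tokens, so doubling the sequence length quadruples its size.

```python
import numpy as np

def attention_scores(x):
    """Naive self-attention scores for N token embeddings (shape N x d).
    The N x N score matrix is the source of the O(N^2) cost."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # N x N pairwise dot products
    # row-wise softmax, numerically stabilized
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

a = attention_scores(np.random.rand(8, 4))   # 8x8  = 64 entries
b = attention_scores(np.random.rand(16, 4))  # 16x16 = 256 entries
```

Here 2x the tokens means 4x the score entries, which is exactly the O(N^2) behavior.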