Yeah, it feels like these early LLMs are pretty decent at coming up with a plan and executing it.

Probably the main deficiency is confusion as the context grows (and therefore as task complexity grows).