Sounds like you picked some obscure tasks to test it on, ones that would obviously have low representation in the training data? That's not to say it can't be helpful with less-represented frameworks/tools, just that you'll need to equip it with better context (MCPs/docs/instruction files).
A key skill in using an LLM agentic tool is being discerning about which tasks to delegate to it and which to take on yourself. Try to develop that skill and maybe you will have better luck.