Agree; posts like this frustrate me.
TL;DR: you're doing it wrong, but I will not show you how to do it right. I also did not run the benchmark using my approach, but it definitely “vibes better” to me, and I reject your actual research paper.
Come on, show us some actual skills.
That one you use all the time looks a hell of a lot like “I want a deterministic shell script for something, so I wrote a skill saying ‘run the shell script’”.
Is that what you do? How much time do you spend on them? How do you stop the agent from making a bunch of very similar skills? How do you deal with the explosion in the total number of skills impacting your token use? Do you use skills from GitHub, or is that bad practice? Why?
So many unanswered questions; so little content. :/