Interesting...

Two of the other responses mention that it's abysmal at tool calling.

Overall, I'm pretty impressed that a model this small can find and fix ~12% of bugs with crappy context, even if they're about as easy as bugs get.

I just assumed it would perform better, given all the advancements in the space.

It's possible that 1B active parameters just isn't enough, even if it has 8B total parameters of knowledge to reason through bugs with.

By playing around with the context I fed it, I got it to fix up to ~34% of bugs, vs. ~46% for Qwen2.5-Coder-3B and ~54% for Qwen2.5-Coder-7B.