I think this is predicted? Part of the story is how they were able to preserve core reasoning ability while cutting knowledge like "pelicans have wings."
> these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios.
So I think the takeaway here is that this is a very fast companion to larger models, one that reasons quickly. Perhaps this technique could be used to train a highly optimized reasoning "expert" in MoEs.
The only really essential item here is tool-calling capability, is it not? So I assume they tested for strong consistency with read/write/edit tools?
This model doesn't support tool calling; it was not part of its training. It's focused on Python (and I think C++) competitive programming and mathematics tasks, i.e. tasks with verifiable rewards. So if you have a task that fits that description, the size-to-capability ratio is good.
These kinds of models might be more useful as tools for larger orchestrator models than as the orchestrators themselves.
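A rough sketch of that "specialist as a tool" arrangement, in case it helps. Everything here is hypothetical: `small_reasoner` and `general_model` are stand-in stubs rather than real model calls, and the task `kind` is passed in by hand, whereas a real orchestrator would have to decide it itself.

```python
# Hypothetical routing sketch: none of these functions call a real model.

# Task kinds with verifiable rewards, which is where the small model is said to be strong.
VERIFIABLE_KINDS = {"math", "competitive_programming"}


def small_reasoner(task: str) -> str:
    """Stand-in for a call to the small reasoning model."""
    return f"[small reasoner] {task}"


def general_model(task: str) -> str:
    """Stand-in for the larger model handling everything else."""
    return f"[general model] {task}"


def orchestrate(task: str, kind: str) -> str:
    """Delegate verifiable tasks to the specialist; keep the rest in the large model."""
    if kind in VERIFIABLE_KINDS:
        return small_reasoner(task)
    return general_model(task)


math_answer = orchestrate("Sum the first 100 odd numbers", kind="math")
art_answer = orchestrate("Draw a pelican riding a bicycle", kind="illustration")
```

The point is only that the small model sits behind a narrow interface and never has to choose tools or recall open-domain facts itself.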
I'm not seeing any mention of tools in the paper, much less a bias towards "curiosity" to use those tools when it encounters gaps in its knowledge. So perhaps this is a good proof-of-concept that single-pass code generation is viable with this small a model - but we're still a long way from a viable solution.
Try it again, but give a careful explanation of what a bicycle and a pelican are and how the pelican would sit atop the bicycle. Then give it a reference for the SVG tags you want it to use, with documentation.
Here's what I got
https://9ol.es/tmp/pelican.png
with https://9ol.es/tmp/prompt_pelican.txt
using prithivMLmods/VibeThinker-3B-GGUF:Q4_K_M
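For anyone who wants to repeat the experiment, here is a minimal sketch of how a prompt along those lines (scene description first, then a tag reference) could be assembled. It is not the contents of the linked prompt file; the scene wording, the `build_prompt` helper, and the choice of tags are all my own assumptions. The element and attribute names themselves are standard SVG.

```python
# Hypothetical prompt builder; the wording and tag list are illustrative
# assumptions, not the prompt linked in the thread.

SCENE = (
    "A bicycle has two equal wheels joined by a frame, with a saddle above "
    "the rear wheel and handlebars above the front wheel. "
    "A pelican is a large bird with a long bill and a throat pouch. "
    "The pelican sits on the saddle with its feet on the pedals."
)

# Standard SVG elements and their main geometry attributes.
SVG_REFERENCE = {
    "circle": "cx, cy, r",
    "ellipse": "cx, cy, rx, ry",
    "line": "x1, y1, x2, y2",
    "polygon": "points",
    "path": "d",
}


def build_prompt(scene: str, reference: dict) -> str:
    """Join the scene description and a tag reference into one prompt string."""
    ref_lines = [f"<{tag}>: {attrs}" for tag, attrs in reference.items()]
    return "\n".join(
        [
            "Draw the following scene as a single SVG document.",
            "",
            "Scene:",
            scene,
            "",
            "Use only these SVG elements (attributes listed after each):",
            *ref_lines,
        ]
    )


prompt = build_prompt(SCENE, SVG_REFERENCE)
```

Spelling out the facts in the prompt is the whole trick: it moves the "pelicans have wings" knowledge out of the weights and into the context.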
It's for reasoning, not generating art?
Can you explain this a bit more?
Imagine you want to make a smaller model that is really good at one thing, say, driving a car. You could remove the parameters that lead it to correctly answer, "What is the powerhouse of the cell?" or, "Who was the first president of the United States?"
It would look really dumb if someone asked it that, but that's fine. You're trying to make a model that is optimized for efficiency for a specific task. As much as possible, you should prune uncorrelated things.
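A toy version of "prune uncorrelated things", to make the analogy concrete. This is an illustration of the analogy only, not the method from the paper: the scoring rule (weight magnitude times average input magnitude on task data) is an assumption I picked for simplicity, and the layer and inputs are made up.

```python
# Toy illustration only: one linear layer, pruned by a task-relevance score.
# The scoring rule is an assumption, not the paper's technique.


def relevance_scores(weights, task_inputs):
    """Score each weight by |w_ij| * mean(|x_j|) over the task inputs."""
    n_features = len(weights[0])
    mean_abs = [
        sum(abs(x[j]) for x in task_inputs) / len(task_inputs)
        for j in range(n_features)
    ]
    return [[abs(w) * mean_abs[j] for j, w in enumerate(row)] for row in weights]


def prune(weights, scores, keep_fraction):
    """Zero out all but the top `keep_fraction` of weights, ranked by score."""
    flat = sorted((s for row in scores for s in row), reverse=True)
    n_keep = max(1, int(len(flat) * keep_fraction))
    threshold = flat[n_keep - 1]
    return [
        [w if s >= threshold else 0.0 for w, s in zip(w_row, s_row)]
        for w_row, s_row in zip(weights, scores)
    ]


# Made-up layer: input features 0 and 1 fire on "driving" data,
# features 2 and 3 (the "trivia" inputs) never do.
weights = [
    [0.9, -0.4, 0.8, 0.7],
    [0.2, 0.6, -0.9, 0.5],
]
driving_inputs = [
    [1.0, 0.5, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
]

scores = relevance_scores(weights, driving_inputs)
pruned = prune(weights, scores, keep_fraction=0.5)
# The weights on features 2 and 3 are zeroed even though some are large,
# because nothing in the task data ever exercises them.
```

Note that large weights like 0.8 and -0.9 get dropped: size alone doesn't save a parameter if the task never touches it, which is the "powerhouse of the cell" case.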
SVG generation is a useless test; what more is there to know?
What if you're reasoning about how to generate SVG correctly?
In this case, I’d expect it to make a web search tool call to find the Python library best suited for SVG generation and manipulation, and then use what it learns there to execute the task you’ve asked it to do (either asking if you’d like to incorporate the library as a dependency, or rolling its own implementation of a subset of the features if that was your preference).
That's assuming tool calling hasn’t been entirely stripped out of this model.
(Edit) No tool calling, per this comment: https://news.ycombinator.com/item?id=48640189
That’s all I needed to hear
As in, you learnt that a useless test that no one should be using was used here; that's what you meant, right?
right?