Hacker News

RedCinnabar a day ago

Call me back when you can run these models on 16GB of RAM and any recent i5/i7. Until then, there’s no point in using these toy models.

guax a day ago

It's so funny, these "toy models" would have been the wet dreams of researchers not 5 years ago.

Progress marches without mercy.

kgeist a day ago

Yeah, people don't realize these "toy models" now completely destroy gpt-4o on most tasks, and no one called gpt-4o a toy model back in the day... It was OpenAI's flagship model from 2024 to 2025.

Gigachad 21 hours ago

Tbh in 2024 most were calling these models useless for programming and a scam. It wasn't until this year things really changed. My experience with Qwen 3.6 is it can do things, and it's super impressive it can do things, but it's not any more productive than doing it myself.

Catloafdev a day ago

Hello, it's the internet calling, today is that day.

https://github.com/ikawrakow/ik_llama.cpp

Edit: it's gonna be slow if you're not using any VRAM, but it's possible. Software isn't going to speed that up anytime soon; it's just a hardware memory-bandwidth limit.
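
To put a rough number on the bandwidth point, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bound, so every active weight is read from memory once per token. The function name and the example figures (about 80 GB/s for dual-channel desktop RAM, 3B active parameters, 4.5 bits per weight) are illustrative assumptions, not measurements of any particular model or machine.

```python
def decode_tokens_per_second(bandwidth_gb_s, active_params_billion, bits_per_weight):
    """Upper bound on generation speed when decoding is memory-bandwidth bound.

    Assumes each active weight is streamed from memory exactly once per token
    and ignores KV-cache reads, compute time, and cache effects.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical CPU-only setup: ~80 GB/s RAM, 3B active params (MoE), ~4.5 bits/weight.
    print(f"{decode_tokens_per_second(80, 3, 4.5):.1f} tokens/s (ceiling)")
```

Real throughput will land below this ceiling, but the ratio explains why a MoE model with few active parameters stays usable on system RAM while a dense model of the same total size does not.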

giancarlostoro a day ago

You need it to run in about 8 GB so you have extra space for the context window.
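
As a rough sketch of why the context window needs its own headroom: with standard attention, the KV cache grows linearly with context length. The helper below assumes every layer uses ordinary grouped-query attention with a 16-bit cache; the function name and the layer/head counts in the example are hypothetical, not the configuration of any specific model. Models that mix in other attention types will need less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_element=2):
    """Approximate KV-cache size for a transformer with standard attention.

    The factor of 2 covers keys and values; bytes_per_element=2 assumes fp16/bf16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_element


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128.
    for ctx in (4096, 32768, 131072):
        gib = kv_cache_bytes(32, 8, 128, ctx) / 2**30
        print(f"context {ctx:>6}: {gib:.2f} GiB of KV cache")
```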

jboss10 a day ago

They can be run on 32GB with 8GB VRAM. I don't think these will be on 16GB for a while. (35B MoE)
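
A quick way to sanity-check that claim is to estimate the size of the weights alone. This is a minimal sketch: the function name is made up for illustration, and the bits-per-weight values are assumed typical quantization levels rather than the size of any published file.

```python
def weight_footprint_gib(total_params_billion, bits_per_weight):
    """Approximate size of the model weights in GiB at a given quantization level.

    Counts weights only; the KV cache, activations, and the OS need room on top.
    """
    return total_params_billion * 1e9 * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # 35B total parameters at a few assumed quantization levels.
    for bits in (4, 4.5, 8):
        print(f"{bits} bits/weight: {weight_footprint_gib(35, bits):.1f} GiB")
```

Even at 4 bits per weight, the weights of a 35B model come out above 16 GiB before any context is allocated, which is why 32GB of RAM plus some VRAM is the more realistic floor.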

TheCycoONE a day ago

I have 32GB of RAM with 16GB VRAM and I haven't had a lot of luck running larger models like this. Are you able to expand on that?

slim a day ago

Use llama.cpp with CUDA.

TheCycoONE a day ago

The problem may be that it's a 7800 XT, which handles memory contention by freezing.