Just want to echo the recommendation for qwen3.5:9b. This is a smol, thinking, agentic, tool-using, text-image multimodal creature with very good internal chains of thought. The CoT can sometimes be excessive, but it leads to a very stable decision-making process, even across very large contexts - something we haven't seen in models of this size before.

What's also new here is the VRAM/context-size trade-off: for 25% of its attention layers they use the regular KV cache for global coherency, but for the other 75% they use a new KV cache with linear(!) memory growth in context size. Which means, e.g., ~100K tokens -> ~1.5 GB of VRAM use - meaning for the first time you can do extremely long conversations / document processing with, e.g., a 3060.
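For scale, here's a back-of-envelope sketch of why caching full attention on only a quarter of the layers matters so much. Every hyperparameter below (layer count, KV heads, head dim) is a made-up illustrative value, not Qwen's published config:

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, dtype_bytes=2):
    """Bytes for a standard KV cache: one K and one V vector per layer per token."""
    return tokens * layers * 2 * kv_heads * head_dim * dtype_bytes

# Hypothetical 9B-ish config: 36 layers, 8 KV heads, head_dim 128, fp16.
full = kv_cache_bytes(100_000, 36, 8, 128)   # every layer keeps a full cache
hybrid = kv_cache_bytes(100_000, 9, 8, 128)  # only 25% of layers (9 of 36) do
print(full / 2**30, hybrid / 2**30)          # ~13.7 GiB vs ~3.4 GiB
```

The linear-attention layers would add only a small constant-size state on top of the hybrid figure, which is why a number in the low single-digit GB for 100K tokens is plausible (the exact ~1.5 GB depends on the real config and cache precision).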

Strong, strong recommend.

I've been building a harness for qwen3.5:9b lately (to better understand how to create agentic tools/have fun) and I'm not going to use it instead of Opus 4.6 for my day job but it's remarkably useful for small tasks. And more than snappy enough on my equipment. It's a fun model to experiment with. I was previously using an old model from Meta and the contrast in capability is pretty crazy.

I like the idea of finding practical uses for it, but so far haven't managed to be creative enough. I'm so accustomed to using these things for programming.

What kind of small tasks do you find it's good at? My non-coding use of agents has been related to server admin, and my local-llm use-case is for 24/7 tasks that would be cost-prohibitive. So my best guess for this would be monitoring logs, security cameras, and general home automation tasks.

That's about it. The harness is still pretty rudimentary so I'm sure the system could be more capable, and that might reveal more interesting opportunities. I don't really know.

So far I've got it orchestrating a few instances to dig through logs, local emails, git repositories, and github to figure out what I've been doing and what I need to do. Opus is waayyy better at it, but Qwen does a good enough job to actually be useful.

I tried having it parse orders in emails and create a CSV of expenses, and that went pretty badly. I'm not sure why. The CSV was invalid and full of bunk entries by the end, almost every time. It missed a lot of expenses. It would parse out only 5 or 6 items of 7, for example. Opus and Sonnet do spectacular jobs on tasks like this, and do cool things like create lists of emails with orders then systematically ensure each line item within each email is accounted for, even without prompting to do so. It's an entirely different category of performance.

Automation is something I'd like to dabble in next, but all I can think of it being useful for is mapping commands (probably from voice) to tool calls, and the reality is I'd rather tap a button on my phone. My family might like being able to use voice commands, though. Otherwise, having it parse logs to decide how to act based on thresholds or something would also be far better implemented with simple algorithms. It's hard to find truly useful and clear fits for LLMs.

Oh man you just gave me an idea to use something like qwen 3.5 to categorize a lot of emails. You can keep the context small, do it per email and just churn through a lot of crap.

The 0.8B can do this pretty well.

Actually pg's original "A Plan for Spam" explains how to do this with a Bayesian classifier.
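For reference, a minimal sketch of that approach, loosely following pg's token-probability scheme (the occurrence threshold, the 0.01/0.99 clamps, and the 0.4 default for unseen tokens are simplifications of the article, and the corpora here are obviously toy-sized):

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def train(spam_texts, ham_texts):
    spam = Counter(t for msg in spam_texts for t in tokens(msg))
    ham = Counter(t for msg in ham_texts for t in tokens(msg))
    nspam, nham = len(spam_texts), len(ham_texts)
    probs = {}
    for t in set(spam) | set(ham):
        g = 2 * ham[t]  # pg double-weights ham occurrences
        b = spam[t]
        if g + b >= 1:  # pg uses a threshold of 5; relaxed for tiny corpora
            p = (b / nspam) / (b / nspam + g / nham)
            probs[t] = max(0.01, min(0.99, p))
    return probs

def spam_prob(text, probs, n=15):
    # Take the n tokens whose probability is furthest from a neutral 0.5,
    # then combine them with the naive-Bayes formula from the essay.
    interesting = sorted((probs.get(t, 0.4) for t in set(tokens(text))),
                         key=lambda p: abs(p - 0.5), reverse=True)[:n]
    prod, inv = 1.0, 1.0
    for p in interesting:
        prod *= p
        inv *= 1 - p
    return prod / (prod + inv)
```

Train it on a folder of known-spam and known-ham messages and flag anything with `spam_prob > 0.9`. Deterministic, fast, zero tokens per email.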

I was just chatting with a co-worker who wanted to run an LLM locally to classify a bunch of text. He was worried about spending too many tokens, though.

I asked him why he didn't just have the LLM build him a Python ML-library-based classifier instead.

The LLMs are great but you can also build supporting tools so that:

- you use fewer tokens

- it's deterministic

- you as the human can also use the tools

- it's faster b/c the LLM isn't "shamboozling" every time you need to do the same task.

I use Haiku to classify my mail - it's way overkill, but also doesn't require training, unlike a classifier. I receive many dozens of e-mails a day, and it's burned on average ~$3 worth of tokens per month. I'll probably switch to a cheaper model soon, but it's cheap enough that the payoff from spending time optimizing it is a long way off.
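As a sanity check on that figure, the arithmetic is simple. Every number below (volume, tokens per email, blended price) is an assumption for illustration, not the commenter's actual usage:

```python
def monthly_cost_usd(emails_per_day, tokens_per_email, usd_per_million_tokens):
    """Rough monthly spend for classifying every email with an LLM API."""
    return emails_per_day * 30 * tokens_per_email * usd_per_million_tokens / 1_000_000

# e.g. 50 emails/day, ~2K tokens each (prompt + email + short reply),
# at a blended $1 per million tokens:
print(monthly_cost_usd(50, 2000, 1.0))  # 3.0
```

At those rates a model that's 5x cheaper saves about $2.40/month, which is why the optimization "payoff" is so slow.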

I've been learning to apply these lately and it has been pretty eye-opening. Combined with Fourier analysis (for example) you can do what seems kind of like magic, in my opinion. But it has been possible since long before LLMs showed up.
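As one concrete flavor of that "magic": a minimal from-scratch sketch of Fourier analysis, a naive DFT that pulls the dominant cycle out of a periodic signal. Purely illustrative; in practice you'd reach for `numpy.fft`:

```python
import math

def dft_magnitudes(x):
    # Naive O(n^2) discrete Fourier transform; fine for small signals.
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def dominant_frequency(x):
    """Index (cycles per signal length) of the strongest non-DC component."""
    mags = dft_magnitudes(x)
    # Skip k=0, the DC term (i.e. the mean of the signal).
    return max(range(1, len(mags)), key=lambda k: mags[k])
```

Feed it, say, daily log volume and the dominant frequency immediately reveals weekly or hourly periodicity - deterministic, explainable, and no tokens burned.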

Totally different categories and different use cases, but the more I learn about LLMs, the more I discover there's a powerful, deterministic, well-established statistical model or two that does the same thing.

Really, LLMs are kind of like convenient, wildly inefficient proxies for useful processes. But I'm not convinced they should often end up as permanent fixtures of logical pipelines. Unless you're making a chat bot, I guess.

> Really, LLMs are kind of like convenient, wildly inefficient proxies for useful processes. But I'm not convinced they should often end up as permanent fixtures of logical pipelines. Unless you're making a chat bot, I guess.

I think I agree with this. It's made me realise LLMs are great for prototyping processes in the same way that 3D printers are great at prototyping physical things. They make it quick and easy to get something close enough to see the unforeseen problems a proper solution might have.

3D printing is a great analogy because there are so many critical considerations that are often missed or can't be accounted for in the prototype, but that's alright because it's a prototype. The strain testing, durability, manufacturing at scale: none of that is properly addressed. Those might involve some serious, expensive challenges, too. But it's alright, because you've got something in your hand that tells you whether or not those challenges are worth contending with. I really love this about LLMs and 3D printing.

You can use the 4B for that, it's quite good.

You can really see the limitations of qwen3.5:9b in reasoning traces - it's fascinating. When a question "goes bad", sometimes the thinking tokens are WILD - it's like watching Poirot after a head injury.

Example: “what is the air speed velocity of a swallow?” - qwen knew it was a Monty Python gag, but couldn't and didn't figure out which one.

As a person who also knows there's a connection between that phrase and Monty Python and not much more information beyond that, I'm not sure how to feel.

could that be some of the RL trying to get it to not regurgitate?

the gag is about specifying which one

https://gist.github.com/mikewaters/7ebfbc73eb8624f917c5b4167...

It thinks like its memory is broken and it's unaware of it; over 100 lines like this:

    - Wait, no, that's not right either.
    - Let's recall the specific line. It goes like this:
        - Knight A: "How can you have a swallow?"
        - Knight B: "It is the air speed velocity of a swallow."
        - Actually, the most common citation is from the movie where they ask an expert on swallows? No.

African or European?

My favourite colour is blue. Oh, no, it is...

I'd be curious to see people give their opinions on embedded models for less tech-focused needs, say "what's the chemistry of that bug-killing spray like" or "what is the history of this or that"...

I'd also be curious to see if people have started doing censorship analysis of the various models, like Qwen deferring Tiananmen Square to government documents while Llama straight up answers the question.

How does it compare in quality with larger models in the same series? E.g. the 122B?

The chart on this link compares all qwen3.5 models down to 0.8B.

https://www.reddit.com/r/LocalLLaMA/comments/1ro7xve/qwen35_...

How much difference are you seeing between standard and Q4 versions in terms of degradation, and is it constant across tasks or more noticeable in some vs others?

Less than expected; search for Unsloth's recent benchmark.

[flagged]

Describing what computers do as ”thinking” is not new. It’s a useful and obvious metaphor. https://www.gutenberg.org/ebooks/68991

It is a deceitful metaphor.

Do you also require computers to grow legs when they "run"?

"Thinking" is just a term to describe a process in generative AI where you generate additional tokens in a manner similar to thinking a problem through. It's kind of a tired point to argue against the verb, since its meaning is well understood at this point.

I am a professional in the information technology field, which is to say a pedantic extremist who believes that words have meanings derived from consensus, and when people alter the meanings, they alter what they believe.

Using "thinking", "feeling", "alive", or otherwise referring to a current generation LLM as a creature is a mistake which encourages being wrong in further thinking about them.

We lack much vocabulary in this new situation. Not that I have the words for it, but to paint the picture: if I hang out with people sharing some quality, I tend to assume it's there in others and treat them as such. LLMs might not be people, but I doubt our subconscious knows the difference.

There is this ancient story where man was created to mine gold in SA. There was some disagreement whether or not to delete the creatures afterwards. The jury is still out on what the point is.

Consulting our feelings seems good; the feelings were trained on millions of years' worth of interactions. None of them were this, though.

What would be the point for you of uhh robotmancipation?

Edit: for me it would get complicated if it starts screaming and begging not to be deleted. Which I know makes no sense.

A consensus has formed in front of your eyes. The same development that resulted in you using the word "kill" in your earlier comment to refer to a computer process. For some reason you refuse to accept it.

think you're on the wrong side of the consensus here

I'd suggest spending more time studying words to relive your extremism. The meanings of words move incredibly quickly and a tremendous number of words have little to no relation to previous meanings.

Words such as nice, terrific, awful, manufacture, naughty, decimate, artificial, bully... and on and on.

> I'd suggest spending more time studying words to relive your extremism.

Should one study words to relive extremism? Or should one study words to relieve extremism?

To a doctor of linguistics: "Dr, my extremism... What should I do about it - with words?!? Please help."

That is the question.

Does the doctor answer thusly: "Study the words to relive the extremism! There is your answer!" says he.

or does he say: "Study the words to relieve and soothe the painful, abrasive extremism. Do it twice daily, before meals."

Sage advice in either case methinks.

I think you are still missing the point. No one in this thread is making an anthropomorphic claim. "Thinking" here is just shorthand for Chain of Thought[0], which some models have and some models don't. This model, being a "thinking" model, has it.

[0]: https://en.wikipedia.org/wiki/Prompt_engineering#Chain-of-th...

> I am a professional in the information technology field

Nice! Me too.

> which is to say a pedantic extremist

Uh never mind, we are not the same lol.

When people alter the meanings, you need to start using different words to describe what you believe.

Are insects not creatures?

Rebooting a machine running an LLM isn’t noticed by the LLM.

Would you feel comfortable digitally torturing it? Giving it a persona and telling it terrible things? Acts of violence against its persona?

I’m not confident it’s not “feeling” in a way.

Yes its circuitry is ones and zeros, we understand the mechanics. But at some point, there’s mechanics and meat circuitry behind our thoughts and feelings too.

It is hubris to confidently state that this is not a form of consciousness.

I'm not entirely opposed to the kind of animism that assigns a certain amount of soul, consciousness, or being to everything in a spectrum between a rock and a philosopher... but even so.

Multiplying large matrices over and over is very much towards the "rock" end of that scale.

If we accept the Church-Turing thesis, a philosopher can be simulated by a simple Universal Turing machine.

If one day we are able to create a philosopher from such a rudimentary machine (and a lot of tape), would you consider that very much towards the "rock" end as well?

Can a Turing machine of any sort truly indistinguishably simulate a nondeterministic system?

If a Turing machine can truly simulate a full nondeterministic system as complex as a philosopher but it would take dedicating every gram of matter in the visible universe for a trillion years to simulate one second, is this meaningfully different than saying it cannot?

I suggest the answer to both questions is no, but the second one makes the answer at worst "practically, no".

My feeling is that consciousness is a phenomenon deeply connected to quantum mechanics and thus evades simulation or recreation on Turing machines.

One thing about Turing Machines that some people might miss is that the "paper tape, finite alphabet and internal states" thing is actually intended to model a human thinking out loud (writing their thoughts down) on a piece of paper.

It was designed to make it hard to argue that the answers to your questions are "no".

Of course there are caveats where the Turing machine model might not map directly onto human brains, but it seems the onus would be on one to explain why, for example, non-determinism is essential for a philosopher to work.

That said,

> Can a Turing machine of any sort truly indistinguishably simulate a nondeterministic system?

Given how AI has improved in its ability to impersonate human beings in recent years, I don't see why not. At least, the current trend does not seem to be in your favor.

I can see why you think the answer is "no". My understanding is that QM per se is mostly a distraction, but some principles underlying QM (some subjectivity thing) might be relevant here.

My best guess is that the AI tech will eventually be able to replicate a philosopher to arbitrary "accuracy", but there will always be an indescribable "residue" where one could still somehow detect that it is not a real human. I suspect this "residue" is not explainable using materialistic mechanisms though.

I am not following what we are talking about here. I am a basic human being, I cannot truly simulate a nondeterministic system. Does it mean “I am not thinking”?

I'm saying a Turing machine cannot simulate you. You don't need to simulate you because you are you.

You are claiming that intelligence and even consciousness are non-deterministic entities at their core. This is a huge claim and requires incredible proof.

I'll add that rocks are, after all, objects that can exhibit quantum behavior.

In classical computing, we design chips to avoid the quantum behavior, but there's nothing in theory to prevent us from building an equivalent quantum Turing machine using "rocks".

What do you imagine the psychiatrist will do? That's an incredibly dismissive take.

Accept it in the spirit it was meant: if you have mental illnesses like this, you need treatment.

Ok but no one here actually implied that they think like this.

[dead]

Then don't feel sorrow when killing it. Living things are not so special.