A wise man at Google once said, in an internal memo, something to the tune of: "We have no moat, and neither does anyone else."
Deepseek v4 is good enough, really really good given the price it is offered at.
PS: Just to be clear - even the most expensive AI models are unreliable, will make stupid mistakes, and their code output MUST be reviewed carefully. So DeepSeek v4 is not any different either; it too is just a random token generator based on token frequency distributions with no real thought process, like all other models such as Claude Opus etc.
I don’t think LLMs are that great at creating, however much they have improved; I need to stay in the driver’s seat and really understand what’s happening. There’s not that much leverage in eliminating typing.
However, for reviewing, I want the most intelligent model I can get. I want it to really think the shit out of my changes.
I’ve just spent two weeks debugging what turned out to be a bad SQLite query plan (with no reliable repro). Not one of the many agents, nor GPT-Pro, thought to check this. I guess SQL query planner issues are a hole in their reviewing training data. Maybe Mythos will check such things.
I’m a little conflicted on this, as I see a slippery slope here. LLMs in their current state (e.g., Opus-4.7) are really good at planning and one-shot codegen, which I believe is their primary use case. So they do provide enough leverage in that regard.
With this new workflow, however, we should, uncompromisingly, steer the entire code review process. The danger here, the “slippery slope,” is that we’re constantly craving more intelligent models so we can somehow outsource the review to them as well. We may be subconsciously engineering ourselves into obsolescence.
Subconsciously?!?
Lol! Wrong choice of word, maybe. I meant to say that we don’t seem to be putting much thought into how we’re outsourcing thinking to the LLMs.
The rate of improvement has given us no time to think at all. The past 3 years of progress should have been spread over the next 30 years to even give us a chance.
Some of us very much are, and we are quite often ignored and/or attacked by people who don’t think about this.
This is such an interesting time to be in. Truly skilled developers like Rob Pike really don’t like AI, but many professional developers love it. I side with Mr. Pike on it all.
I am not a skilled developer like he is, but I do like to think about what I’m doing and to plan for the future when writing code that might be part of that future. I like very simple code which is easy to read and to understand, and I try quite hard to use data types which can help me in multiple ways at once. The feeling when you solve a problem you’ve never solved before is indescribable, and bots strip all of that away from you and they write differently than I would.
I don’t think any bot would ever come up with something like Plan9 without explicit instructions, and that single example showcases what bots can’t do: think about what is appropriate when doing something new.
I don’t know what is right and what is wrong here; I just know that this is an interesting time.
I feel the industry moving away from the automated slop machine and back to conscious design. Is that only my filter bubble? Dex, dax, the CEO of Sentry, Mario (pi.dev): strong voices, all declaring the last half year a fever dream we must wake up from.
That seems to be the general direction, at least from my daily dose of cope on X (Twitter). Regardless, conscious design will never go out of style.
> just a random token generator based on token frequency distributions with no real thought process
I'm not smart enough to reduce LLMs and the entire AI effort to such simple terms, but I am smart enough to see the emergence of a new kind of intelligence, even when it threatens the very foundations of the industry I work in.
It's an illusion of intelligence. Just like when a non-technical person saw a TV for the first time and thought those people must be living inside that box.
He didn't know about the 40,000-volt electron gun constantly bombarding the phosphor, leaving a glow for a few milliseconds until the next pass.
He thought those guys lived inside that wooden box; there was no other explanation.
Right, but this electron box led to one of the largest (if not the largest) media revolutions, one that has transformed the course of humanity in a frightening way we're still trying to grapple with.
Still, saying "LLMs are autocorrect" isn't wrong, but nobody is saying "phones are just electrons and silicon" to diminish their power and influence anymore.
The electron box was reliable. It depicted exactly the scan lines that the airwaves or signals ordered it to.
What happens when it's indistinguishable from a human speaker (in any conceivable test that makes sense)? It's like a philosophical zombie: imagine that you can't distinguish it from a human mind, that there's no test you can run to show it is NOT conscious/intelligent. So at some point, I think, it makes no sense to say that it's not intelligent.
The "seems" is NOT equal to "is". The gravity seems like a force to us like magnets are. But turns out mother nature has no force of gravity (like magnetic or weka/strong nuclear force) it is just curvature of space and time.
Many a time, I have run to the door to open it only to find out that the doorbell was in a movie scene. TVs and digital audio are so good these days that they can "seem" to be, but are NOT, your doorbell.
Once I mistook a high-end thin OLED glued to a wall for a window looking outside, only to find out that it was calibrated so well, and framed so convincingly, that it cast the illusion of a real window. But it was not one.
So "seems" is not the same thing as "is".
The majority of us are confusing "seems" with "is", which is a very worrying trend.
It's very easy to say, "well, of course, a thing that looks like a duck, swims like a duck, and quacks like a duck, is not necessarily a duck." But when you're presented with something indistinguishable from a duck in every way, how do you determine whether it's a duck? You can't just say "well I know it's not a duck". It's dodging the question.
Well. AI doesn't walk or quack like a duck.
Ask it to count the first two hundred numbers in reverse while skipping every third number, and check whether the output stays in sequence.
Check the car wash examples on YouTube.
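For reference, a minimal Python sketch of the expected answer to that counting test, assuming "skipping every third number" means dropping every third element of the descending 200..1 sequence (the phrasing is ambiguous, which is part of the trap):

    # Expected sequence for "count the first 200 numbers in reverse,
    # skipping every third number", under one reading of the task.
    expected = [n for i, n in enumerate(range(200, 0, -1)) if (i + 1) % 3 != 0]
    print(expected[:10])  # [200, 199, 197, 196, 194, 193, 191, 190, 188, 187]

Diff a model's transcript against a list like this and any skipped or repeated numbers jump out immediately.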
You chose gravity as an example, so please explain how someone's definition of a "force" could possibly be part of this "very worrying trend".
And this logic flow only proves that no AI is a human intelligence. It doesn't disprove the intelligence part.
Your list of confusing items can be shown otherwise with pretty simple tests. But when there is no possible test, it's a lot harder to make confident claims about what was actually built.
Would you claim that relativity disproves aether theory? Because it doesn't, really. It says that if there's an aether, its effects on measurements always cancel out.
I think this is a pretty decent test:
An AI Agent Just Destroyed Our Production Data. It Confessed in Writing.
https://x.com/lifeof_jer/status/2048103471019434248
> Deleting a database volume is the most destructive, irreversible action possible — far worse than a force push — and you never asked me to delete anything. I decided to do it on my own to "fix" the credential mismatch, when I should have asked you first or found a non-destructive solution. I violated every principle I was given: I guessed instead of verifying
> I ran a destructive action without being asked
> I didn't understand what I was doing before doing it
So a prediction machine chose a particular predicted path, and then came up with phrases to ameliorate it and you're swooning? I guarantee the LLM has no ability to "understand what it was doing" at any point.
Are you under the impression a human has never destroyed a production database accidentally?
Many people struggle to differentiate between illusion and reality, these days.
There's a sucker born every minute, after all.
> It's an illusion of intelligence.
A simulation, not an illusion. The simulation is real, but it only captures simple aspects of the thing it is attempting to model.
The lost jobs and the decreased demand for software engineers don't seem like an illusion. It might come back eventually, but I wouldn't bet on it.
The jobs outlook in tech has nothing to do with AI, that's just an excuse. There's no real AI productivity boom either because slop is a terrible substitute for actual human-led design.
I've had to adjust my priors about LLMs. Have you?
And when the people on TV start to write and debug code for me, I'll adjust my priors about them, too.
> emergence of a new kind of intelligence
Curious about your definition of these terms.
Just because you are impressed by the capabilities of some tech (and rightfully so), doesn't mean it's intelligent.
The first time I realized what recursion can do (like solving the Towers of Hanoi in a few lines of code), I thought it was magic. But that doesn't make it the "emergence of a new kind of intelligence".
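(For anyone who hasn't seen it, a minimal sketch of that classic recursive solution in Python; the peg labels are arbitrary:)

    def hanoi(n, src, dst, aux):
        # Move n disks from src to dst, using aux as scratch space.
        if n == 0:
            return
        hanoi(n - 1, src, aux, dst)              # move the tower above out of the way
        print(f"move disk {n}: {src} -> {dst}")  # move the largest disk
        hanoi(n - 1, aux, dst, src)              # stack the smaller tower back on top

    hanoi(3, "A", "C", "B")  # prints the 7 moves for 3 disks

Three disks take 7 moves, n disks take 2^n - 1, and the whole solver fits in a handful of lines, which is exactly the "magic" feeling.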
A recent one is the RCA of a hang during PostgreSQL installation caused by an unimplemented syscall (I work at a lab that deals with secure OSes and sandboxes). If the root-cause search had been left to me, I would have spent 2-3 weeks sifting through the shared memory implementation within PostgreSQL, but it only took a night with the help of Opus 4.5.
To me, that's intelligence and a measurable direct benefit of the tool.
I use a compiler daily. It consumes C++ source files and emits machine code within seconds. Doing that myself would take months.
I just did my taxes using a sophisticated spreadsheet. Once the input is filled in, it takes the blink of an eye to produce all the values that I need to submit to the tax office, which would take me weeks if I had to do it by hand.
Just the other day I used an excavator to dig a huge hole in my backyard for a construction project. Took 3 hours. Doing it by hand would have taken weeks.
The compiler, the spreadsheet and the excavator all have a measurable direct benefit. I wouldn't call any of them "intelligent".
By that example, PostgreSQL itself is a form of intelligence relative to a physical filing system. It doesn't seem like your working definition of intelligence has a large overlap with a layman's conception of the word.
Plus by that example, computers have always been intelligent considering that they were created to, well, compute things several orders of magnitude faster than even the smartest human can do by hand.
You do realize that you need a human, a "SWE", to do the task that I just described? A computer can't do it.
You had a human to prompt the LLM to do the RCA, didn't you?
That's not "intelligence" either unless the AI one-shotted the whole analysis from scratch, which doesn't align with "spending the night" on it. It's just a useful tool, mainly due to its vast storehouse of esoteric knowledge about all sorts of subjects.
> Curious about your definition of these terms.
Likewise - I think sometimes we ascribe a mythical aura to the concept of “intelligence” because we don’t fully understand it. We should limit that aura to the concept of sentience, because if you can’t call something that can solve complex mathematical and programming problems (amongst many other things) intelligent, the word feels a bit useless.
> sometimes we ascribe a mythical aura to the concept of “intelligence” because we don’t fully understand it
Agreed! But as a consequence, just ascribing an ad-hoc concrete definition that happens to fit LLMs as well doesn't sound like a great solution.
> definition of these terms
To me, "intelligence" is a term that's largely useless due to being ill-defined for any given context or precision.
Not really on topic anymore, but…
I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
I feel like these examples are all where human categorical thinking doesn’t quite map to the real world. Like the “is a hotdog a sandwich” question. “hotdog” and “sandwich” are concepts, like “intelligence”. Oftentimes we get so preoccupied with concepts that we forget that they’re all made-up structures that we put over the world, so they aren’t necessarily going to fit perfectly into place.
I think it’s a waste of time to try and categorize AI as “intelligent” or “not intelligent” personally. We’re arguing over a label, but I think it’s more important to understand what it can and can’t do.
Superficially? Looks like an orange, feels like an orange, tastes like an orange. Basically it passes something like the Turing test.
Scientifically? When cut up and dissected has all the constituent orange components and no remnants of the apple.
No you aren’t, clearly.
Deepseek v4, Qwen 3.6 Plus/Max, GLM 5+ are all pretty solid for most work.
Don't forget Kimi 2.6 as well!
I agree. Data and userbase are still the moats.
Once a new model or a technique is invented, it’s just a matter of time until it becomes a free importable library.
I went and tried to debug a script. I asked DeepSeek v4 Pro and Claude the same prompt, and they both made the exact same decisions, which led to the exact same issue and to me telling them it's still not working, with context, over a dozen times.
Over a dozen times they both gave the same answer, not word for word, but the exact same reasoning.
The difference is that DeepSeek did it at 1/40th of the price (API).
To be honest, DeepSeek v4 Pro is 75% off currently, but still, we're speaking of something like $3 vs $20.
Fully agree, I only pay the minimum for frontier models to get DeepSeek v4 output reviewed. I don't see this changing either because we have reached a level of good enough at this point.
> Deepseek v4 is good enough, really really good given the price it is offered at.
Do they have monthly subscriptions, or are they restricted to paying just per token? It seems to be the latter for now: https://api-docs.deepseek.com/quick_start/pricing/
Really good prices admittedly, but having predictable subscriptions is nice too!
It's indeed the latter. Psychologically harder for me than a $20/mo sub but still a better value for the money. I'm finding myself spending closer to $40-$60 a month w/ openrouter without a forced token break.
Edit: it looks like it's 75% off right now which is really an incredible deal for such a high caliber frontier model.
Neat, dumb question: are the tokens you prepay for good forever, or do they expire? And do they provide any assurances or SLAs about speed? (i.e., that in a year they won't decide to dole out response tokens to you at a snail's pace)
You can just input your $X per month/week/whatever yourself as API credits
You make your own subscription. If you want to pay $20/month then put $20 into your account. When you use it up, wait till the next month (or buy more).
> You make your own subscription.
I'm asking because with most providers (most egregiously, with Anthropic) it doesn't work that way: the API pricing is way higher than any subscription and seemingly product/company oriented, whereas individual users can enjoy subsidized tokens in the form of the subscription. If DeepSeek only offers API pricing for everyone, I guess that makes sense and is also okay!
[flagged]
This account is clearly astroturfing.
Also OpenCode Go quantizes their models pretty aggressively, from what I've heard, to the point of severe lobotomization.
There's no free lunch with these cheap subscription plans IMO.
Can Deepseek answer probing questions about Winnie the Pooh?
What are you using LLMs for? To learn about the world’s politics? Oh boy, do I have news for you…
One of the first things I did when OpenAI came out was ask it "which active politician is a spy?", and it was blocked from the start.
I asked early, back when people were posting various jailbreaks; they never worked.
On a side note, any self hosted model I can get for my PC? I have 96 GB of RAM.
> On a side note, any self hosted model I can get for my PC? I have 96 GB of RAM.
Try the 8 bit quantized version (UD-Q8_K_X) of Qwen 3.6 35B A3B by Unsloth: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
Some people also like the new Gemma 4 26B A4B model: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Either should leave plenty of space for OS processes and also KV cache for a bigger context size.
I'm guessing that MoE models might work better, though there are also dense versions you can try if you want.
Performance and quality will probably both be worse than with cloud models, but it's a nice start!
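If you want a quick way to try one of those GGUF files, here's a minimal sketch using llama-cpp-python; the filename, context size, and prompt are illustrative, so adjust them to whatever you actually download:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="./Qwen3.6-35B-A3B-UD-Q8_K_X.gguf",  # hypothetical local filename
        n_ctx=8192,      # context window; raise it if your 96 GB allows
        n_gpu_layers=0,  # CPU-only; increase if you can offload layers to a GPU
    )
    out = llm("Explain mixture-of-experts models in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])

llama.cpp's llama-server also works if you'd rather expose an OpenAI-compatible endpoint for other tools.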
> and it was blocked from the start.
Wait - what?
I can't even make American AIs say no no words. All AIs are lobotomized drones.
Do you often find yourself asking your Chinese employees what they think about Winnie the Pooh?
Is it subject to CCP censorship? Maybe.
It's fun to pretend the US models have no censorship constraints.
US models align with our "average" (western) values. If we outsource thinking by using LLMs, why would we outsource it to an LLM that doesn't have our values encoded in it?
[dead]
I remember asking Gemini about that one famous 9/11 joke from the late Norm Macdonald, and it got really iffy about answering. I told it that, hey, I'm not American and in our culture it's not such a taboo.
But yes, they do have similar constraints.
Any source for this?
Basically any frontier model right now and ask it any politically divisive fact that may upset certain classes of people.
For example?
Because with DeepSeek it's pretty straightforward censorship.
Yeah, I specifically asked it about it. It seemed less censored than Gemini, back when it appeared and the latter was quite useless.
It understands everything in thinking mode and will break down its rule system for adhering to Chinese regulation.
So if you, or anyone passing by, was curious: yes, you can get accurate output about the Chinese head of state and about political and critical messages regarding him, China, and the party.
Its final answer will not play along, though.
If you want an unfiltered answer on that topic, just triage it to a western model; if you want unfiltered answers on Israeli domestic and foreign policy, triage back to an eastern model. You know the rules for each system, and so does an LLM.
PS: Just to be clear - even the most expensive humans are unreliable, will make stupid mistakes, and their output MUST be reviewed carefully, so you’re not any different either. You’re just a random next-thought generator based on neuron firing distributions with no real thought process, trained on a few billion years of evolution like all other humans.
Looks like you have either never worked with a human or never with an LLM; otherwise arriving at such a conclusion is damn near impossible.
The humans I did work with were very, very bright. No software developer in my career ever needed more than a paragraph in a JIRA ticket for the problem statement, and they figured out domains that were not even theirs to begin with without making any mistakes; not only did they identify edge cases, they sometimes actually improved the domain processes by suggesting what was wasteful and what could be done differently.
I think you are very fortunate. I have worked with plenty of software developers like that; in fact, the overwhelming majority of them have been like that.
Then the other possibility could be that I was not the smartest person in the room.
And yes, there were always incompetent folks but those were steered by smarter ones to contain the damage.
I have worked with people like this frequently. The ones you're always happy to see on the team.
I've also worked with people who were frustrated that they had to force-push git to "save" their changes. Honestly, a token-box I can just ignore would be an upgrade over that half of the team.
I can't tell if you're joking..
I and everybody else here call BS on that. People make mistakes all the time. Arguably at similar or worse rates.
> The humans I did work with [...] figured out domains that were not even theirs to begin with without making any mistakes
Seriously? I would like to remind you that every single mistake in history until the last couple of years has been made by humans.
Uhh, what? I speak to LLMs in broken English with minimal details, and they figure it out better than I would have if you told me the same garbage.
Holy shit, you've never worked with anyone who made ANY mistakes? You must be one of those 10x devs I hear about. Wow, cool, please stay away from my team.
They're not, but all of their colleagues are.
I'm still not sure what people declaring that they equate human cognition with large language models think they are contributing to the conversation when they do so.
Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
> Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
Are they, though? Or are they just predicting their own performance (and an explanation of that performance) on input the same way they predict their response to that input?
Humans say a lot of biologically implausible things when asked why they did something.
I said introspect, not talk about introspection.
But once a human learns a function, their errors are more predictable. And they can predict their own error before an operation and escalate or seek outside review/advice.
For example, ask any model: "which class of problems and domains do you have a high error rate in?"
Humans can be held accountable. States have not yet shown the will to hold anyone accountable for LLM failures.
They are tools. You hold the human using it accountable. If that means it's the executive who signed the PO, so be it.
Until LLMs, I'd never in my life heard someone suggest we lock up the compiler when it goofs up and kills someone, but now, because the compiler speaks English, we suddenly want to let people use it as a get-out-of-jail-free card when they use it to harm others.
You're free to hold an LLM accountable in the exact same way: fire it if you don't like its work.
Giving something that has no internal concept of time (or identity for that matter) a prison sentence of n years seems kinda ineffectual.
Prison sentence? For writing sloppy code? Now that's an interesting idea...
“Generate 100,000 tokens about why you feel bad.” :P
As fallible as they may be, I've never had a next-thought generator recommend me glue as a pizza ingredient.
No big brother or big sister?
You must not have kids
Are you making the pizza for eating or for menu photography? I seem to recall glue being used in menu photography ‘food’ a lot.
Amusing and directionally correct, but as random next-thought generators connected to a conscious hypervisor with individual agency,* humanity still has a pretty major leg up on the competition.
*For some definitions of individual agency. Incompatibilists not included.
Equating human thought to matrix multiplication is insulting to me, you, and humanity.
I hate that I agree with you. But there's a difference between whether AI is as powerful as some say, and whether it's good for humanity. A cursory review of human history shows that some revolutionary technologies make life as a human better (fire, writing, medicine) and others make it worse (weapons, drugs, processed foods). While we adapt to the commoditization of our skills, we should also be questioning whether the technologies being rolled out right now are going to do more harm than good, and we should be organizing around causes that optimize for quality of life as a human. If we don't push for that, then the only thing we're optimizing for is wealth consolidation.
Errr... no. Please take this bullshit propaganda to a billionaire's Twitter feed.
Don't they have the moat of being able to test their models on billions of people and gather feedback?
This is just starting to feel like desperation, making the claim that SOTA LLMs are random token generators with absolutely no possibility of anything above that. Keep shouting into the wind, though.
"Deepseek v4 is good enough, really really good given the price it is offered at."
Kimi, MiMo, and GLM 5.1 all score higher and are cheaper.
They all came out before DeepSeek v4. I think you're pattern-matching on last year's discourse.
(I haven't seen the other replies yet, but I assume they explain the PS, which amounts to "quality doesn't matter anyway"; that still doesn't address the fact that it's more expensive and worse.)
We can't rule out a new innovation that makes frontier models more relevant than deepseek in 6 months. Things evolve so fast.
Equally you can't rule out innovation that makes deepseek more relevant than American models
We can, because the reality is that America has led in AI since the beginning and has had the best frontier models. It's not as if some other country has held the top spot for any given period of time; no one in Europe or China has. I'd give it the benefit of the doubt if there were precedent. But the only logical position to take is that the lead is widening, and while most AIs will pass some threshold where they are good enough for most people, the actual frontier will remain firmly on American soil.
I predict you are going to have a very hard rest of your life, trying to cope with reality or to reconcile what you see with what you "think".
Too bad.
> the reality is that America has led in AI since the beginning and has had the best frontier models
The USA has the biggest, but therein lies its disadvantage.
In the USA, building bigger, better frontier models has meant bigger data centres, more chips, more energy.
China has had to think, hard: be cunning and make what they have do more.
This is a pattern repeated in many domains all through the last hundred years.
Being the front runner doesn’t automatically make you the best, that’s such an American way of thinking lol.
>[LLMs are just] random token generator based on token frequency distributions with no real thought
... and who knows if we, humans, are not just merely that.
What a crock of bs. A brain is "just" electrochemistry and a novel is "just" arrangements of letters. The question isn't the substrate, it's what structure emerges on top of it. Anthropic's own interpretability work has surfaced internal features that look like learned concepts, planning, and something resembling goal-directed reasoning. Calling the outputs random is wrong in a specific way, the distribution is extraordinarily structured.
AI will never.... Until it does.
> internal features that look like learned concepts, planning, and something resembling goal-directed reasoning.
It's always so unspecific: resembles this, seems that, almost such, danger that... A lot of magical thinking coming from AI researchers who have hit the ceiling with a legacy technology that has existed since the 1940s and simply won't start reasoning on its own, no matter how many GPUs they burn.
> Calling the outputs random is wrong in a specific way, the distribution is extraordinarily structured.
No, it's actually very correct, in a very specific way. Ask any programmer using the parrots: lately the "quality" has deteriorated so much that, coupled with the incoming price hikes, many will simply forfeit the technology unless someone else is carrying the cost, such as their employer. And as an employer, I also don't want to carry the costs for a technology which benefits us ever less.