I'm a small software business owner in Europe. I have to assume my competition is willing to pay for any business advantage they can get. And so I also have to pay for the SOTA model, whatever it is.

> I'm a small software business owner in Europe. I have to assume my competition is willing to pay for any business advantage they can get. And so I also have to pay for the SOTA model, whatever it is.

If you make money from doing anything like "produce software with as little human involvement as possible", then sure, you need SOTA models. In that case, though, the value you add is very little and you probably don't have a sustainable business.

OTOH, if you make money by getting clients to pay for features, there is very little difference in time-savings from using Anthropic/OpenAI SOTA over GLM-latest.

IOW, if your business can only make money by one-shotting software, you probably don't have a business in the first place.

Regards, another small business owner.

You also don't really need LLMs; we still have software engineers too. Everyone is focusing so heavily on the speed gain in producing code, but in my experience clients of established products aren't really waiting for massive changes and gigantic features to be added. We aren't taking the time to think things through anymore.

> clients of established products aren't really waiting for massive changes and gigantic features to be added

In some cases they are. I work at a B2B vertical SaaS company, and there are both features that competitors build and rough edges around our own features that make clients say “either we get X or we sign with someone else”. I agree, though, with the general sentiment that you don’t need SOTA models to build those: humans, or humans plus a strong mid-pack model, will do.

I have clients waiting for very gigantic features and the agent harnesses are a godsend.

I'm the only dev. I simply don't have time for dealing with the code from non-SOTA models. I'm doing all I can to keep this business afloat.

If you think your business depends on the ability for you to outspend the competition on LLM tokens, then you should cut your losses and shut it down right now.

> I'm the only dev. I simply don't have time for dealing with the code from non-SOTA models. I'm doing all I can to keep this business afloat.

It sounds like your business is selling completely agent-coded products. I don't know how long that will be viable, or even if it is right now.

In my part of the world, I am completely unable to sell completely agent-coded products, so even a SOTA model is useless. The majority of my time is spent on analysis outside of coding anyway, so when I bill it's not based on how many lines of code I've added, it's based on whether the goal of the customer is satisfied.

What part of the world is that where you can’t sell agent coded products?

> What part of the world is that where you can’t sell agent coded products?

You can try, but where I am there's literally no point - anything I offer that I bill based on how long my agent will take will be counter-offered by an even cheaper person using the same agent.

I've been through this cycle a few times already. It's pointless.

I sell outcomes, not lines of code. When I can get paid for unlocking revenue or reducing costs, SOTA makes not one bit of difference.

In practice, this means that I now don't even engage with clients who lead with "we want this program written" or "we want this feature added to this code we own". Those types of clients, their expectation is that you'll never need to bill more than the time you used to meet with them and maybe an hour of "labour".

You can, of course, continue as normal, but the expectation from clients now is that code is, for practical purposes, free. I've had one client last year vibe-code a ping program using Claude Code just to "prove" to me that my custom board+design+code for their industrial flow controller could have been done by their AI subscription.

If your business is "selling code", you aren't gonna win. If your business is "selling solutions" then you don't need SOTA anyway.

The good news (for you and most everyone other than the current leading AI companies) is that the gap between the SOTA and the near-frontiers is getting smaller every week or two. The leading Chinese models are only a few months behind now (GLM 5.2 tickles the tail of GPT 5.3 or 5.4 and Opus 4.6, according to benchmarks and the vibes among heavy users who've spent some time with it), where they were a couple of years behind a year ago.

4.6 was released at the beginning of February, so if the Chinese models only "tickle its tail," that means they're >5 months behind.

That comparison is also misleading because Opus 4.6 was probably not Anthropic's frontier model.

We got the first news about Mythos in March, so it is likely that it was already close to ready by the time Opus 4.6 was released.

So the actual gap is the time elapsed between March (or April for the official announcement) and whenever Chinese models can match Mythos.

Post-training a model that size takes months, though it "works" before that. It is a big chunky model before it's released to the world and probably does some amazing things, sometimes... but it wasn't done (else why wouldn't they release it and soundly trounce their competitors?). I would assume that Chinese AI companies have a pipeline too, and that what we see is a couple/few months behind their newest model as well. Like, the new base model is cooked, but they're still plating it for service.

Why would Anthropic get the benefit of pre-release models counting toward their lead, if nobody else gets to count their pre-release models?

> The leading Chinese models are only a few months behind now

I hear that often, but what does that even mean? I am a great proponent of open weights models. I do believe they are the only reason we have not stagnated into a collusion of halting (public) model releases.

But exactly which point in time is z.ai at compared to claude.ai? Consistently being "6 months behind" in an exponentially accelerating evolution means the gap is growing exponentially wider, not staying constant.
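A quick numeric sketch of that claim, assuming capability follows a pure exponential with a fixed doubling time (both the curve and the 6-month doubling time are illustrative assumptions, not measurements). Under those assumptions a constant time lag gives a constant capability ratio, while the absolute gap grows exponentially:

```python
def capability(t_months: float, doubling_months: float = 6.0) -> float:
    """Hypothetical capability score that doubles every `doubling_months`."""
    return 2.0 ** (t_months / doubling_months)


def gap(t_months: float, lag_months: float = 6.0, doubling_months: float = 6.0):
    """Return (absolute gap, ratio) between a leader and a follower
    that trails the leader by a constant `lag_months`."""
    leader = capability(t_months, doubling_months)
    follower = capability(t_months - lag_months, doubling_months)
    return leader - follower, leader / follower


for t in (6, 12, 24, 36):
    absolute, ratio = gap(t)
    print(f"month {t:2d}: absolute gap {absolute:5.1f}, ratio {ratio:.1f}x")
```

So whether the gap "grows" depends on whether it is measured in months, as a ratio, or in absolute capability.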

"an exponentially accelerating evolution"

Oh? Exponentially accelerating, huh? That's quite a surprise, to me.

What range of numbers do you believe "a few" represents?

Opinions vary, but:

A couple: usually 2, though not always

A few: 3, 4, 5

Several: 4, 5, 6, or 7.

> A couple: usually 2, though not always

I had to explain this to my German friend. In my understanding this isn't about the actual number, it's about the certainty. If it's absolutely and definitely two, then I say two. If I'm uncertain but it's probably two, or if a non-integer, somewhere around two, then I say couple.

And few is more likely to be 3 than 5, because 5 is getting close to a "half-dozen or so", or (as you say) several.

Many is very context-sensitive, as the meme has it.

So I would agree that the open models are a few months behind, definitely more than a couple of months behind, possibly several months behind, maybe a half-dozen months or so behind, but not many months behind.

In the UK, as far as I can tell, a couple are 2. Not around 2. Not maybe 3 or 4. Always 2.

3 or 4 would likely be a few, or some. 1 is, well, one.

Several and a few are the same number, they only differ rhetorically.

I think several is used by most speakers for larger quantities than few. It has the connotation of being larger, and that changes usage.

Certainly below 6!

What's the leading Claude Code competitor model over in China?

So I keep hearing.

Another day, more cope on this subject from many posters on here...

This is nonsense.

The gap between Chinese models and American frontier models is estimated at 10 months by Anthropic themselves, and it's growing.

China has no flywheel for long-form agentic traces like Claude Code and its telemetry over its userbase (no one uses the Chinese harnesses yet). Most Chinese models are forced to price themselves significantly below cost to compete with the huge demand for bootleg claude tokens, because they're that much worse.

> is estimated at 10 months by Anthropic themselves, and it's growing.

How is this different than any business with something to lose saying a competitor isn't as good? Not saying it's false, but it would seem to me that it's more important how customers feel about the issue.

Didn't Elon Musk say the same or even worse about BYD? He isn't laughing anymore tho.

Ah, well, if Anthropic says their competitors are ten months behind...

I don't know what I was thinking.

Here in Australia the sudden withdrawal of Fable made all of us think hard about models and harnesses.

I've heard half a dozen people talk about how a less advanced model coupled with a better harness outperforms a smarter model in the last few weeks.

If the USA wanted to shoot its AI industry in the foot it achieved its goal.

Which products are you now using?

> The gap between Chinese models and American frontier models is estimated at 10 months by Anthropic themselves, and it's growing.

There's a lot of subjectivity in determining this, but I'm 100% sure that 10 months is wrong.

I don't know whether the gap is currently growing, but I'm not sure it matters. There are thresholds where models reach certain levels of usefulness. Opus 4.8, for example, is at a level where I can give it relatively vague input, and it can go for half an hour on its own and produce a high-quality PR.

If GLM reaches that level of capability and can do that task more cheaply than Anthropic's model, I will use GLM for that task, because that's a specific type of task I use models for. It doesn't really matter whether Anthropic also has a better model, because what does "better" mean in this context? It's a clearly defined task, and Opus 4.8 already does it at a very high level of quality.

If Anthropic themselves say competition is 10 months behind, it's probably 5 or less.

And you seem to think "no one uses" DeepSeek's v4, z.AI's GLM 5.2 or Xiaomi's MiMo 2.5 from their official APIs when they probably dwarf Anthropic's usage and are widening the gap due to conquering a chunk of Western market too.

I know it's hard for some to comprehend there's an entire Eastern hemisphere in the globe with billions of people, so it's worth reminding. And some seem to think the world is basically silicon valley even.

Because Claude subscription tokens are cheaper than DeepSeek and friends. There is a whole industry of people reselling Claude subscriptions in China.

Can you comprehend that Anthropic is winning because it is both cheap (subscriptions) and better (SOTA)? People are cheering on Chinese providers when in reality they would rug-pull open weights the moment they became competitive.

Chinese models are trash; that's why they are giving them away for free.

For individuals and small companies, subscriptions are the best deal; for big companies, Chinese models are a big no unless they can host them themselves.

No, Claude subscription tokens are not cheaper than the Deepseek API. You are dead wrong on that.

Not sure why you're being downvoted for being objectively correct.

HN is full of contrarians and folks who don't know what they're talking about in regards to AI.

> The gap between Chinese models and American frontier models is estimated at 10 months by Anthropic themselves, and it's growing.

#1 I've had use cases where it was clearly obvious the Chinese models were behind.

#2 I've also had use cases where I couldn't tell a difference at 1/20th of the price.

The problem is that #1 is the use case where the American frontier is gated behind saboteur classifiers, and it is a tiny minority anyway. The vast majority of work is #2.

The gap doesn't matter anymore.

No you don't; it's often overkill to use the SOTA models. People want SOTA because it's shiny, but there are a lot of tasks where it's cheaper and more efficient to use other models.

> but there are a lot of tasks where it's cheaper and more efficient to use other models.

Sure… but which ones? How can you know ahead of time?

I just did a “simple” upgrade project where both me and the AI kept tripping over dead code, subtle typos, and difficult-to-trace live versus dead code.

Many times when I used “Medium” thinking I got bitten, but not every time, and I couldn’t predict when.

So “Extra high” it was, for the entire project.

Far fewer nasty surprises!

Right. You hire the developer when you want a developer. But if I am building simple agentic workflows -- glorified automations with a small bit of structured "thinking" - I will sure use the cheapest API that can deliver that task at the speed I want.

I wonder where the market sizes will shake out for these different types of use cases? I am guessing right now 1 is bigger than 2 but not for long (by token volume)?

For programmatic usage oftentimes SOTA isn't useful.

For example, I have software that summarizes articles and classifies links on webpages to build a synthetic RSS feed, both of which use LLMs, neither of which need a SOTA model.

I'll probably use LLMs to bootstrap a dataset of native ads in articles, and there again, I don't really need a SOTA model.

If it's for more open ended tasks like writing code though, I agree that at this point SOTA models make more sense to use.

In my experience: anything of open-ended complexity (software development, research, product design, ...) benefits from whatever the frontier can offer. 95% of line-of-business automation and workflows can be handled by even a reasonably small open-weights generalist model flanked by a few even smaller specialized models. Yes, designing such a setup takes more knowledge and work than just chucking it all over the API with prompts. But that is how I can run a system here for <$30/month vs >$1,000/month. As an added bonus, no model server can shut me down at the drop of a hat.
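A minimal sketch of what such a setup could look like, assuming a simple dispatch table where each known task type maps to a small specialized model and anything unrecognized falls back to the generalist. The model names, task types, and the `call_model` stub are hypothetical placeholders, not a real API:

```python
from dataclasses import dataclass
from typing import Callable, Dict


def call_model(model_name: str, prompt: str) -> str:
    """Hypothetical stand-in for a call to a locally hosted model.
    A real setup would send the prompt to an inference server."""
    return f"[{model_name}] {prompt}"


@dataclass(frozen=True)
class Route:
    model_name: str        # assumed name of a model, for illustration only
    prompt_template: str   # task-specific prompt wrapper


# Assumed task types and specialist models.
SPECIALISTS: Dict[str, Route] = {
    "classify": Route("tiny-classifier", "Classify: {text}"),
    "extract": Route("tiny-extractor", "Extract fields: {text}"),
}
GENERALIST = Route("small-generalist", "{text}")


def handle(task_type: str, text: str,
           run: Callable[[str, str], str] = call_model) -> str:
    """Send known task types to a specialist, everything else to the generalist."""
    route = SPECIALISTS.get(task_type, GENERALIST)
    return run(route.model_name, route.prompt_template.format(text=text))


print(handle("classify", "invoice from ACME"))
print(handle("summarize", "long meeting notes"))
```

The extra design work mentioned above lives mostly in deciding the task types and which model is good enough for each one; the routing itself stays small.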

Exactly. I simply don't have the time to deal with non-SOTA model output.

This is a great recipe for going out of business.

If the competitive risk is real, then you are choosing between supplier risk (AI model access) and competitive risk.

When there isn't a zero-risk option, the question becomes which risk is smaller.

> If the competitive risk is real

Yes.

If.

Man I hope this tech FOMO eventually stops.

Companies generally fail because either their product doesn't meet a market need or the market doesn't exist in the first place (possibly because of bad timing), not because their competitors simply outran them.

These aren't things fixed by using a frontier model to vibe code faster in lieu of one 5 months behind.

You can compete by being smart, using less-than-SOTA models, and building a more solid business around them.

I use whatever model is SOTA. I switch between them in order to avoid lock in.

> I use whatever model is SOTA. I switch between them in order to avoid lock in.

What's your competitive edge here? Shaving off an hour of a feature delivery? Not having to see the code that is produced?

Not sure about OP, I usually make Opus 4.8 on Extra thinking level implement features for me on a specific project, while I'm busy with other stuff.

For a change, I let DeepSeek V4 Pro implement it on Max thinking level. Nothing too out there - some DB migrations, some Django back end changes and Vue SPA front end changes.

Implementation time in total including tests was a few hours, so nothing too egregious. However, one of the migrations would break with pre-existing data, one of the column references in the entity was wrong, the API endpoint wasn't made consistently with the others in adjacent code (e.g. permission checks) and the front end had a Pinia state related issue and submitting one of the forms didn't work.

Tooling was run: ruff, ty, Oxfmt, Oxlint; the Docker build was also green across the board, but the overall feature just didn't work. In both cases, sub-agents with clear context would review the code for serious/critical issues, at least three in parallel, and do review loops until they spotted nothing. Both harnesses have LSP integration.

Opus spent another hour fixing it, needed a few iterations, because I couldn't be bothered there.

> What's your competitive edge here? Shaving off an hour of a feature delivery? Not having to see the code that is produced?

The difference was largely not needing to waste time fixing all sorts of subtle bugs that sub-optimal models produce. It would be worse yet on a serious project where those bugs weren't spotted and the slop got shipped.

That said, Opus isn't ideal either: it messed up a whole bunch when I was training some neural nets, trying to process a bunch of satellite data, and configuring Garage to store it so that tiles can be served from a slow HDD, and stuff like that. Obviously, it also needs a lot of babysitting when it comes to how the UI looks, but it's better at the rest of development.

I think that DeepSeek V4 Pro and GLM 5.2 are cool though, it's just that you want as many checks and tests as you can throw at any given problem, or to use languages that make shipping completely broken code increasingly unlikely.

Any competitive business will accept this risk if it gives them any type of edge, no matter the duration of that edge. This is no different than using an exotic raw material.

Every big business in the world biases towards risk reduction and cost reduction over getting an edge.

Different businesses have different biases.

Eh, this isn’t really how businesses operate. How many businesses refuse to give devs large-spec machines? That’s very clear positive ROI.

I think it’s excessively charitable to assume businesses are uber-competent ROI-chasers. The expense people are eventually going to win on AI too, this blip of unrestricted AI budgets will be gone soon.

And thus, capitalism continues to roll on. Businesses are supposed to go out of business; it's a feature.

they’re not supposed to, they’re just able to

Nearly spit out my coffee, thanks for the chuckle.

It’s ok to be amused, absent exaggeration. Spit takes happen in sitcoms.

They do happen in real life.

They are overused in sitcoms because it’s easy for actors to mimic on demand unlike several other reactions.

I don't know if you write software for your own products or if you code for your customers. Anyway, are you going to compete on the speed of your code-writing AI or on deploying the features your customers need? One useful feature is better than a hundred that nobody really cares about. And a good relationship with customers is better than any feature.

Example: yesterday I listened to the technical lead of a customer of mine dig himself into a hole by not understanding what exposing AWS EFS to their on-premises server over NFS would mean. There were just too many unknown unknowns for him, and he had no time to ask the AI (and even if he had, I'm not sure he would have understood). His boss, who had actually used NFS, had to stop him. I didn't say a word.

So he could have coded the migration of a server from AWS to on-premises and asked Claude to write all the configuration scripts and policies as well, but then what?

I'm making a micro SaaS product. Code quality and code production speed are actually both super important. I don't have the time for non-SOTA model output.

> Code quality

You care about this but use LLMs to slop out features anyway?

What concrete business advantage are you getting from LLMs?

Speed.

This x10. I don’t understand how people are saying you can’t use LLMs to get crazy productivity gains. If you can’t write quality code with LLMs at ludicrous speed, you’re holding it wrong. You will have occasional bad days and regressions. But overall you’re still going to be able to 4x your progress.

I have plenty of experience with LLMs and use them daily but definitely wouldn't call generated code "quality code." Often looks like complete vomit.

That’s kinda what I mean. Maybe it only works well in some languages, but the harness I built for C and C++ does a fantastic job of adhering to very strict architecture and style guides. The output is way cleaner, more readable, better factored, and more interpretable than human-written code, except maybe that of one or two devs I have worked with. YMMV I guess?

TBF I do burn 200k tokens just preloading the context with onboarding, not including any code: just document trees of development policy documents, style and architectural standards, code and documentation review processes, company ethos and culture, etc. It’s a token fire, but it really works for us.

Also, documentation driven development all the way down.

If you're an enterprise (including startups), you worry about customers, not code quality. There are famously many startups that gained traction despite shit code and then eventually got around to fixing it, to whatever extent was possible, like Facebook HHVM, Stripe's Sorbet, etc.

Startups have failed because they could not untangle their own code after 4 months. Literally true stories (plural).

> Startups failed because they could not untangle own code after 4 months.

That's rare, though. If they could not untangle their own code after 4 months, it's because they were not making enough money to pay a team to untangle it - that's not a code problem, it's a revenue problem.

IOW, the startup failed because their revenue was too low.

There are orders of magnitude more startups that failed because they did not solve the right customer problem. Code quality is merely incidental the vast majority of the time.

Ok, and? You can live with that if there are more important things to deal with.

I've stared at ugly LLM code that I had just generated and that worked well enough for my purposes (generally some quick recursion into a nested Python dictionary in order to dig out some property, especially for linting or quick data analysis).

And I wanted something better, sure, something a bit more readable ...but I just needed it to work well enough to recurse through a yaml file for config file linting, not be battle-hardened against every test case.

So to deal with the mess, I shoved it in a pure function, threw a few basic sanity unit tests around it, put a comment with a disclaimer of "#this is LLM generated code, it is lightly tested, do not use it for anything truly load-bearing without a lot more tests" and I moved on to something else.
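For illustration, a minimal sketch of what that kind of quarantined helper might look like. The function name, the sample config, and the checks are made up for this example, and the YAML parsing step is assumed to have already happened:

```python
from typing import Any, Iterator


# This is LLM-style helper code: lightly tested, do not use it for anything
# truly load-bearing without a lot more tests.
def find_key(data: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested dicts/lists.

    Pure function: it only reads `data` and has no side effects.
    """
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                yield v
            # Keep descending, since the value itself may contain the key again.
            yield from find_key(v, key)
    elif isinstance(data, list):
        for item in data:
            yield from find_key(item, key)


# A config as it might look after parsing a YAML file (hypothetical content).
config = {
    "service": {"name": "api", "timeout": 30},
    "workers": [{"name": "w1", "timeout": 5}, {"name": "w2"}],
}

# Basic sanity checks, deliberately not exhaustive.
assert list(find_key(config, "timeout")) == [30, 5]
assert list(find_key(config, "missing")) == []
assert list(find_key([], "timeout")) == []
```

Keeping it pure is what makes the cheap tests meaningful: there is no hidden state for them to miss.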

Not everything has to be bulletproof.

You're on Hacker News. This is a site full of developers who are convinced that "proper software engineering" is 100% of what makes a business successful, and everything and everyone else is useless. You can't just waltz in here and point out that code in business is a means to an end and expect not to get downvoted.

It's ironic because around 20 years ago here, people knew HN was (more) explicitly for startup founders and the comments reflected that, with much more discussion on getting customers than writing code.

As a technical product manager, this 1000%. It's just irrelevant how bad code is unless it impacts the business.

> As a technical product manager, this 1000%. It's just irrelevant how bad code is unless it impacts the business.

If you are, in fact, "a technical product manager", I would hope you understand that "bad code" is identified as such specifically because it "impacts the business."

That is not how most engineers define bad code.

> That is not how most engineers define bad code.

The engineers I have worked with most definitely define "bad code" as code having intrinsic limitations and/or latent defects which impact successful system functionality/operation. Indicators provided to stakeholders such as yourself which support this assessment include, but are not limited to:

  - the system doesn't work that way
  - the system lacks test coverage, so changes take longer
  - adding feature "X" is not feasible
  - there is no repeatable way to onboard team members
  - the backlog grows exponentially
  - that "one point task" is going to take a couple weeks

All of the above impacts a business.

It is up to you, the "technical product manager", to understand what your team is trying to tell you.

Please stop being rude to me. I'm a human being, I'm a very experienced product manager and engineer (you can google my name, I'm the only one), and the way you are behaving sucks.

Everything you're saying is true, sometimes. Assume I'm still right, and that you might be able to learn something from someone else.

> Please stop being rude to me.

I do not see how I was being rude, unless it was my use of quotations around the title you claim.

> I'm a human being ...

I did not doubt this.

> ... I'm a very experienced product manager and engineer ...

Again, if it was my use of quotations which you found to be rude, then I do not know what to say about that.

> ... and the way you are behaving sucks.

I respect your perspective and support your right to express yourself. And no, I do not think you are being rude by doing so.

> Assume I'm still right ...

Why would I? You responded to:

>> This is a site full of developers who are convinced that "proper software engineering" is 100% of what makes a business successful, and everything and everyone else is useless.

With:

> As a technical product manager, this 1000%.

Finally, you write:

> ... you might be able to learn something from someone else.

Maybe you can learn something from someone else as well.

There was nothing rude about any of their replies.

They weren’t rude enough. Your complete apathy towards the many antisocial effects of badly engineered software, caring only about increasing shareholder value, is the reason why modern software not only sucks but actively makes our lives worse to use it.

Googling your name brings up this missing person case as the only result: https://en.wikipedia.org/wiki/Disappearance_of_Logan_Schiend...

I guess if all you did was paste my last name into Google with no context, you'd get something like that. :)

This is something I wish I understood sooner. There is strong merit to "good enough".

Of all the "concise" and "beautiful" code I worked hard to produce, I was the only one to ever lay eyes on it. It didn't actually matter, and nobody cared but me. The people in charge of my raises could never perceive quality of code, because it wasn't their area of expertise. They only cared (rightly so) that it did what it was supposed to, and all the elegant abstractions didn't practically help that purpose. It was, literally, wasted life that I should have spent just getting off work early, like most of my colleagues.

Every bit of code written in the last 50 years is going to be meaningless.

People need to get to grips with that fast.

Distribution, relationships, processes, mindshare, marketing, and politics matter. Code is just ephemeral glue and implementation detail.

Not every bit of code is going to be meaningless.

Just 99.999%.

Lmao. Have more respect for your elders, who wrote all the code that your ai psychosis is fuelled by.

Every single thing around you was pioneered by people who are dead and forgotten. From the materials science of the clothes you wear, to the very language you speak.

Get over yourself. We're all ephemeral, dead and recycled in the blink of an eye. Our species doesn't even clock on the geologic timespan.

If you think your code (or any of your artifacts or possessions) matter beyond their immediate utility, you're mistaken. Work will either fall into disuse or be replaced. It's scaffolding for what comes next along a well-traversed path.

Look on my works, ye mighty, and despair!

I refuse to accept your existential nihilism. This mindset is not only toxic to the soul, but toxic to those who must suffer the effects of someone who only cares about “immediate utility”. What a depressing comment.

Dr Manhattan

I measured an ~8x increase in my project's commit count after AI, and I'm painstakingly reading, reviewing, understanding and editing everything the models write. It's gotten to the point I'm trying to slow down in order to let the new knowledge crystallize. I'm manually writing articles about what I'm doing as I go.

I can only imagine what people are doing at their jobs with unlimited token budgets.

> I measured an ~8x increase in my project's commit count after AI,

That's irrelevant. What's the increase in revenue?

I'm a hobbyist. My revenue will only increase if my work somehow lands me a job at some point.

Are you not employed at all?

Yes, but my field has not been hit by the AI frenzy yet. Outside the usual attempts to automate us, that is. I've used AI at work for research and corroboration but it hasn't led to 100x performance or anything of the sort.

Kind of weird how LoC has become a metric for people to chase again.

In my case it was commits, not lines of code. I wasn't chasing after it, I just asked Claude to calculate some statistics after a month or so of AI usage.
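A rough sketch of how a before/after commit-rate comparison like that could be computed. The function name is hypothetical, the dates below are invented sample data, and in practice the dates might come from something like `git log --date=short --format=%ad`:

```python
from datetime import date
from typing import Iterable, Tuple


def commit_rate_change(commit_dates: Iterable[date],
                       cutoff: date) -> Tuple[float, float, float]:
    """Return (commits/day before cutoff, commits/day from cutoff on, ratio)."""
    dates = sorted(commit_dates)
    before = [d for d in dates if d < cutoff]
    after = [d for d in dates if d >= cutoff]
    if not before or not after:
        raise ValueError("need commits on both sides of the cutoff")
    # Window before: first commit up to (not including) the cutoff day.
    days_before = (cutoff - before[0]).days
    # Window after: cutoff day through the last commit, inclusive.
    days_after = (after[-1] - cutoff).days + 1
    rate_before = len(before) / days_before
    rate_after = len(after) / days_after
    return rate_before, rate_after, rate_after / rate_before


# Invented sample data: 5 commits in the 40 days before the cutoff,
# then one commit per day for 8 days.
sample = [date(2024, 1, 1), date(2024, 1, 9), date(2024, 1, 17),
          date(2024, 1, 25), date(2024, 2, 2)]
sample += [date(2024, 2, d) for d in range(10, 18)]

print(commit_rate_change(sample, cutoff=date(2024, 2, 10)))
```

A count like this says nothing about commit size or value, which is why it works better as a sanity check than as a target.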

It's not just statistics either. I know for a fact that I made major progress by using LLMs. Here's a summary from around a month ago:

https://news.ycombinator.com/item?id=48407642

AI is world changing technology as far as I'm concerned.

You don’t have to imagine: listen to Boris publicly describing how he works with these things; it’s safe to assume others do it similarly or better.

If he's still doing work on Claude Code, I'm not convinced it's going all that great.

It has a lot of features that feel half complete, with the LLM pretending that the job is done rather than it actually being done.

I wonder if the people getting 10x productivity gains are spending less time on HN and more time tending to their agents. Personally I now spend so much time productively arguing with agents that it feels like an utter waste of effort arguing with humans. If people can't see the value in LLMs by now, I'm not sure what I could say to change their minds.

We must then assume you're not getting those 10x gains

Less time, not zero time. I still argue with humans for sentimental reasons.

So you are accomplishing a year’s worth of work in a month? If that’s been happening for a few months, you must have a few years of work to show people right?

Definitely enjoying the lack of eye-rolling, being asked to explain obvious things multiple times, and stopping things being done for resume-stuffing reasons.

Exactly, no ego (I know I'm anthropomorphizing)

There's a small minority of people who are adamantly refusing to change, as there is in every technological revolution. Ego prevents them from even wholeheartedly trying the tool, because it would be an admission they were wrong.

The opportunities available for these people are rapidly, rapidly shrinking. I believe it's possible to be a developer today who's EXCEPTIONAL and never uses AI. Most opponents are not exceptional, though, and even these opportunities are shrinking.

Most exceptional developers in my org adopted AI in their workflows and went from 10x developers to 20x developers.

If you refuse to adapt, you're going to be out of a job complaining about the kids and their newfangled technology REAL quick. You have a few years remaining, maybe less.

I can’t turn 10x work into 20x work because I have to ensure the two juniors in my team who are now creating 50x work won’t merge complete garbage, reviewed by another engineer that has already given up on caring.

I can’t turn 10x work into 20x work because my Product Manager thinks changing fundamental premises of tasks I already spent two weeks on (mostly removing human blockers) is very simple. After all, when he asked Claude to update his prototype, it only took it 10 minutes.

I can’t turn 10x work into 20x work because the company dedicated entire teams to write company-wide skills for everything. They suck, but if I don’t use them, I’m not following the new “golden path for engineering”, and I lose points in my performance review.

I could, however, turn 10x work into 20x work, or even much more than that, if AI actually did what it’s promising and eliminated most of my team, the product manager, and the middle managers. Or me. I could use a break.

Damn, that sounds quite rough.

What about the 6x developers? Was there just a doubling multiplier across the board, resulting in them becoming 12x developers, or did they too become 20x developers?

>> What concrete business advantage are you getting from LLMs?

> Speed.

Speed of what?

Speed of understanding what needs to be done? I highly doubt it.

Speed of LoC checked into git? Sure, I'll give you that.

But one can use any number of tools to generate hundreds of thousands of lines of code. See any build tools which support specifications such as RAML, OpenAPI, CORBA, etc.

So I ask again; speed of what?

fixing minor bugs takes one slack message for us now. bugs go down, goodness go up.

fixing more serious regression also easier. connect honeycomb mcp, ask agent to debug while i walk to coffee and get some pistachio rose dates. by time im back with my oat latte ive got a full report on what happened and can send the next slack message to fix.

life is good

I needed to deeply understand a code base I had no experience with in a language I don't normally use with what I would describe as haphazard documentation at best. You can't argue with the speed at which I gained the required understanding of the project.

In the time it took you to type that, your hourly market comp went down another basis point.

I am appalled none of this is clicking with you anti-AI folks. This is all so exciting -- alarming even! -- and software careers are never going to be the same.

I don't know how you just metaphorically stand there and act like nothing at all is happening. We've never seen anything like this in our entire lives.

Some of you are standing right in front of the steam roller, yelling to all of us that steam rollers aren't real.

Very very fast steam rollers.

Nice strawman[0], but you avoided answering my core question:

  Speed of what?

With ad hominems and a non sequitur. How about I narrow the question with the hope it engenders a relevant response:

  How do LLMs increase the speed of a person understanding
  what needs to be done?

[0] - https://en.wikipedia.org/wiki/Straw_man

This argument feels like

A: The sky is blue!
B: No it's not.
A: Yes, it is, please look up.
B: No, you must prove it to me through reason.
A: But, if you would just pretty please look up.
B: No.

I run a company, I've been running it for 10 years, we do alright. I'm a shitty manager. Every time I've hired developers, the business freezes. The business isn't anything super important, the main consequence of bugs is that my family loses money. Everything has always rested on my shoulders. In theory there is some path for me to become a good manager, but I never landed on it. But now, with Claude, it's great. So far Claude has paid itself off in real profits at least 20x over, and that's with significant API usage on top of the monthly sub. I can prototype new features in an afternoon that before were on my giant list of "maybe somedays if I ever get to breathe" list. Our user experience has improved in so many ways that I knew were probably worth it, if I could just find the time. Now I can.

There are situations where yeah, it probably isn't ready yet. But, there are so many where it's amazing. Seriously, it's worth looking up.

You’re just plain wrong to assume people against agentic development do not have experience with the technology.

I think there are many valid reasons to be against them - I think a lot of them are more right than wrong. It’s the “It can’t really do much” that I think must be from people that haven’t really tried it.

This is a great case for the benefits of using GenAI, in that you already possess an understanding of what you want to achieve. You know what it is you want to prototype, what is on your "giant list of 'maybe somedays if I ever get to breathe' list", what you want to end up delivering.

My point is and remains:

  A) GenAI did not give you this understanding.
  B) GenAI can only assist in your expressing this
     preexisting understanding.
  C) GenAI is a statistical token (text) generator and
     cannot, by definition, "make" a person understand
     what they want/need to do.

Ideas and functionality beget more ideas and functionality

Did you use an LLM to write this for you? How odd.

For all of you people who think these LLMs are “earth shattering”, how the hell do you reconcile that it’s a net positive for anyone but those who want to consolidate knowledge and power?

We are really looking at idiocracy in the making.

I guess I'll chime in as someone who thinks LLMs will be earth shattering, and who specifically doesn't think it's a net positive for anyone but those whose power will be consolidated.

From my brief window of Fable usage, speed wasn't its strong point at all.

For actually building software, I'm starting to suspect a human with a dumber (but faster) model is going to get the job done quicker than Fable (and possibly even cheaper). Bug-finding and vulnerability detection is a different story.

I’d say you tried it on an insufficiently complex codebase. I’ve tried it on an MLOC+ codebase and the results were excellent compared to anything else.

Not saying the results were bad - quite the opposite. But it was very slow (and if I was paying API rates, hideously expensive).

My conclusion was the exact opposite. Maybe each individual response was slower, but it took so many fewer round trips to get what I wanted. I had a project Fable was progressing on steadily and correctly. Opus on the same project keeps handing me garbage it insists is working and meets the stated requirements, but isn’t and doesn’t.

And quality if you know what you're doing.

Drawing debt

We'll just rebuild stuff when we get new requirements. The models will be even faster and better for the next version, anyway.

For businesses where this is true, they also need to be able to switch providers quickly in case the best provider changes.

It's almost identical to the possibility of one model getting shut down for a business that doesn't care about SOTA.

Yeah, I have both the Claude and Codex 100 dollar subscriptions and I try to use both. I also keep the 20 dollar Cursor subscription, as there I can play around with everything. I also refuse to use any harness-specific features. Claude is particularly annoying with this in that it's the only one that doesn't respect open config standards like .agents/skills.

This thinking that every task must be stuffed into the most 'advanced' (expensive) model out there is idiotic, and it's not only you unfortunately.

At $JOB I have warned higher-ups that we should try to keep our expenditure under control, educate people that document slinging doesn't require Fable every time, and demo the capabilities of the cheaper models, and I have been snubbed for it. When Fable is available once again our bill is going to be eye-watering, relative to what it should be.

If I am working on something simple and want the speed boost then I'll drop the thinking to low or minimal and still get the SOTA model output quality.

But for what I work on I mostly need high or xhigh SOTA model quality output. I don't have the time to deal with anything less.

This! I've found that for most coding, Sonnet is pretty good as it is. Yeah, you might need to finesse your prompt a bit more, and you'll probably be spending a bit more time on the computer, rather than a more hands-off approach, but at the end of the day, you'll save a lot more simply because you're using a good-enough model.

If you're the one-shotting type, obviously then Fable might be useful, but I think only marginally. You don't need to bring a MANPADS to a duel at high noon.

Sonnet is dogshit at coding unless you eval the exact niche to be fine and still watch it like a hawk.

If you can't figure out what model to use your business is already dead.

Unless you have concrete evidence via evals that SOTA is actually needed, you’re just buying into the hype.

Do you think your current operation and niche is so optimized that not using Fable would put you out of business? Or is this a hope that using Fable will allow you to stay in business?

I am on track to commoditize my niche industry, and I hope I can do it before anyone else beats me to it. I'm working at panic speeds.

So, no moat, right?

If there is any it’ll be rather small. I'm ideally placed to benefit from the commodification as I was planning on doing it anyway, now I'll just get there a lot faster with the help of AI.

Reducing your costs is also an advantage, but I'm not surprised such binary thinking is present here

So the panic generators ("You will be left behind!") are winning. Creating a sense of urgency that makes you switch off the higher rational functions is a key element in every successful scam.

Nonsense. Do you buy state of the art pens, pencils, printers, paper, computers, disks, etc.? No. You buy whatever is the best value for the case at hand. That’s often not the SOTA option.

Artists that need the best quality output use the best pens and papers. Call me a coding artist then haha. But seriously I don't have the time for anything less than SOTA.

Sure but that's orthogonal.

Yes you use the right tool for the job.

But if the job requires the best intelligence you can get with an LLM, then you use that.

Taking as an assumption that the quality of your product is a function of the quality of the inference you are using: if you use an inferior model because "what if it gets export controlled again" and your competitors don't, then your competitors are likely to win.

If you don't need frontier models for your job then this is all moot, but the thread started with

> You cannot build a business critical function on top of American SOTA frontier model

Which is silly. HN likes to roleplay bringing everything "business critical" in house because sometimes vendors mess up. Self host, don't use the cloud, run open models locally, build redundant supply chains in case of another covid, etc. Sometimes the risk is real, but most of the time the risk is rare and the cost of an interruption event is less than the cost of bringing everything in house or using lower quality vendors "just in case".
