The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?

Even better, adding it to the system prompt is only a temporary fix; they'll work it into post-training, so the next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored. Once it's in the model, it'll be a lot harder to understand why "How many calories does 100g of pasta have?" only returns "Sorry, I cannot divulge that information".

Just assume each model iteration incorporates all the earlier censorship prompts, and compile the candidate list from the system prompt history. To validate it, design an adversarial test against the items in the compiled list.
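A minimal sketch of that compilation step, assuming you have each published system prompt version saved as plain text. The function names and the paragraph-level diffing are my own assumptions for illustration, not anything Anthropic publishes:

```python
def new_paragraphs(old_prompt: str, new_prompt: str) -> list[str]:
    """Return paragraphs present in new_prompt but absent from old_prompt."""
    old = {p.strip() for p in old_prompt.split("\n\n") if p.strip()}
    return [p.strip() for p in new_prompt.split("\n\n")
            if p.strip() and p.strip() not in old]


def compile_restrictions(versions: list[str]) -> list[str]:
    """Accumulate every paragraph ever added across successive prompt versions,
    on the assumption that each addition also gets trained into later models."""
    compiled: list[str] = []
    for old, new in zip(versions, versions[1:]):
        for p in new_paragraphs(old, new):
            if p not in compiled:
                compiled.append(p)
    return compiled
```

The compiled list then becomes the input to the adversarial test: one probe prompt per item, checking whether the model refuses or hedges where earlier releases did not.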

That part of the system prompt is just stating that telling someone who has an actual eating disorder to start counting calories or micro-manage their eating in other ways (a suggestion that the model might well give to an average person for the sake of clear argument, which would then be understood sensibly and taken with a grain of salt) is likely to make them worse off, not better off. This seems like a common-sense addition. It should not trigger any excess refusals on its own.

The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts, need this clarification), and if you add a section for every single small thing like this, you end up with a massively bloated prompt. Notice that every single user of Claude is paying for this paragraph now! This single paragraph is going to legitimately cost Anthropic at least 4, maybe 5 digits.
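Back-of-envelope on that cost claim. Every number below is an assumption for illustration (the token count is estimated from the paragraph's word count; the request volume and per-token price are pure guesses), so treat the output as an order-of-magnitude sketch, not Anthropic's actual bill:

```python
# All inputs are illustrative guesses, not Anthropic's real figures.
paragraph_tokens = 80          # ~59 words of system-prompt text
requests_per_day = 10_000_000  # assumed volume across all Claude products
price_per_mtok = 3.00          # assumed $/1M input tokens, uncached

daily_cost = paragraph_tokens * requests_per_day / 1_000_000 * price_per_mtok
annual_cost = daily_cost * 365
print(f"${daily_cost:,.0f}/day, ${annual_cost:,.0f}/year")  # $2,400/day, $876,000/year
```

Prompt caching cuts the marginal price substantially, but the point stands that the paragraph's cost scales with total traffic; whether it lands at four digits or six depends entirely on the guessed inputs.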

At some point you just have to accept that LLMs, like people, make mistakes, and that's ok!

>The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts

It's not a niche issue at all. 29 million people in the US are struggling with an eating disorder [1].

> This single paragraph is going to legitimately cost Anthropic at least 4, maybe 5 digits.

It's 59 out of 3,791 words total in the system prompt. That's about 1.56%. Relax.

It should go without saying, but Anthropic has the usage data; they must be seeing a significant increase in the number of times eating disorders come up in conversations with Claude. I'm sure Anthropic takes what goes into the system prompt very seriously.

[1]: from https://www.southdenvertherapy.com/blog/eating-disorder-stat...

The trajectory is troubling. Eating disorder prevalence has more than doubled globally since 2000, with a 124% increase according to World Health Organization data. The United States has seen similar trends, with hospitalization rates climbing steadily year over year.

Your source says "Right now, nearly 29 million Americans are struggling with an eating disorder," and then in the table below says that the number of "Americans affected in their lifetime" is 29 million. Two very different things, barely a paragraph apart.

I don't mean to dispute your assertion that it's not a niche issue, but that site does not strike me as a reliable interpreter of the facts.


It's not "incredibly niche" when you consider the kinds of questions that average everyday users might submit to these AIs. Diet is definitely up there, given how unintuitive it is for many.

> At some point you just have to accept that llm's, like people, make mistakes, and that's ok!

Except that's not the way many everyday users view LLMs. The carwash prompt went viral because it showed the LLM making a blatant mistake, and many seem to have found this genuinely surprising.

The Claude prompt is already quite bloated, around 7,000 tokens excluding tools.

People think these LLMs are anthropomorphic magic boxes.

It will take years until the understanding sets in that they're just calculators for text: you're not praying to a magic oracle, you're just putting tokens into a context window to bias statistical weights.

Worse, it reveals the kind of moralistic control Anthropic will impose on the world. If they get enough power, manipulation and refusal are the reality everyone will face whenever they veer outside of its built-in worldview.

I think it actually reveals how they don't want to be sued for telling somebody's teenage daughter with an eating disorder to eat less and count her calories more.

> This seems like a common-sense addition.

Mm, yes. Let's add mitigation for every possible psychological disorder under the sun to my Python coding context. Very common-sense.

It's what you get when you create sycophant-as-a-service. It will, by design, feed all of your worst fears and desires.

LLMs aren't AGI, and I'd go further and say they aren't AI, but admitting it is snake oil doesn't sell subscriptions.

If it’s common sense, shouldn’t the model know it already?

Shouldn't the model "know" that if I have to wash my car at the carwash, I can't just go there on foot? It's not that simple!

Just like someone growing up and learning how to interact with other humans might learn the same lesson?

If Claude is going to be Claude, we should support these kinds of additions.

They have to secretly add these guardrails, because the alternative would be to train the users out of consulting these things as if they were advanced all-knowing alien-technogawds. And that would be bad for business.

The better solution I think would be a reality/personal responsibility approach, teach the consumers that the burden of interpretation is on them and not the magic 8ball. For example if your AI tells you to kill your parents or that you’ve discovered new math that makes time travel possible, etc then: 1. Stop 2. Unplug 3. Go outside 4. Ask a human for a sanity check.

Since that would be bad for business, take a lot of effort on the user side, and be very embarrassing (and you obviously can't do that right before an IPO, in the middle of a global economic war), secretive moral frameworks have to be installed instead.

If you are what you eat then you believe what you consume. Ironically, I think this undisclosed and hidden moral shaping of billions of people will be the most dangerous. Imagine all the things we could do if we can just, ever-so-slightly, move the Overton window / goal posts on w/e topic day by day, prompt by prompt.

Personally I find AI output insidiously disarming and charming and I think I’m in the norm. So while we’ve been besieged by propaganda since time immemorial I do worry that AI is a special case.

This. It's like the exaggerated safety instructions everywhere: "do not lean ladder on high voltage wires". Only worse: because you can choose to ignore such instructions when they don't apply, but Claude cannot.

In the best case, wrapping users in cotton wool is annoying. In the worst case, it limits the usefulness of the tool.

Seems so, unless we manage to pivot to open-weight models. Hopefully, the Chinese will lead the way along with their consumer hardware.

Hard for me to say this because I have always been pro-Western and suddenly it seems like the world has flipped.

I've felt the same way for a while now, but especially recently. It's been obvious for a while, I suppose, but greatly clarified recently.

I have just one question for you pllbnk, are we the baddies?

As a European, I think Americans and Europeans at large are still on the same page and will remain so because of the shared cultural ties. Recent economic upheaval (2020-ongoing) just shook the foundations and eroded trust in the large voter base. Now we are all looking at China and feel a bit envious of how stable things look there from afar; they had the Evergrande bankruptcy, the media was predicting collapse, but they are chugging along. Now the big stories are demographics (same as in the West, by the way, so it cancels out) and Taiwan (which to me looks more and more like Western fearmongering rather than actual danger). Meanwhile they are delivering just what the main voter base in the West needs: affordable goods.

So yeah, at this moment in time it's really really hard to say who are better or worse as the collective West's reputation is tumbling down and China's if not rising, then at least staying put.

When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

It is a no-brainer. If a company of any size put out a product that caused cancer, we wouldn't think twice about suing them. Why should mental health disorders be any different?

There are many, many companies out there putting out products that cause cancer. Think about alcohol, tobacco, internal combustion engines, just to name a few most obvious examples.

> alcohol, tobacco, internal combustion engine

Yes, the companies providing these products are sued a lot and are heavily regulated, too.

If you get cancer from drinking alcohol, smoking cigarettes or breathing particles emitted by ICE engines in their standard course of operation, you generally can't sue the manufacturer.

Notably, that's because they include warning labels telling you not to do those things because they're known to cause cancer.

That's just not true. Makes me wonder if you've ever bought a bottle of alcohol before lol. There's no label that says it causes cancer. (Maybe in California because of Prop 65?) And I expect cars also have no such labelling, not that it would matter, considering they cause cancer in random passers-by who have no opportunity to consent to breathing in auto exhaust or to read any labels.

> Makes me wonder if you've ever bought a bottle of alcohol before lol.

I'm a teetotaler so no, I literally have not. I was mostly thinking about cigarette and tobacco products which are the most glaring, obvious counterpoints. But you'll be happy to learn that virtually all vehicles in the US also come with operating manuals that profusely warn people not to breathe in the exhaust from the vehicle.

Don’t worry, every bottle in the US has the surgeon general’s warning on it and it doesn’t call out cancer, yet. Adding cancer to the ills of booze was proposed in 2025 so your intuition was correct, directionally.

On every bottle:

Alcoholic Beverage Labeling Act of 1988

“GOVERNMENT WARNING: (1) According to the Surgeon General, women should not drink alcoholic beverages during pregnancy because of the risk of birth defects. (2) Consumption of alcoholic beverages impairs your ability to drive a car or operate machinery, and may cause health problems.”

Cancer proposal: https://www.mdanderson.org/cancerwise/not-just-a-hangover--t...

https://www.ttb.gov/regulated-commodities/beverage-alcohol/d...

(As if adding this text will do anything other than reduce the companies liability, rofl)

I think a more apt analogy would be suing a vaccine manufacturer after it gave you adverse effects, when you also knew you were high risk before that.

Why stop there? We could jam up the system prompt with all kinds of irrelevant guardrails to prevent harm to groups X, Y, and Z!

This but unironically. Preventing harm is good, actually.

Because it dumbs everything down, makes the output worse and more expensive, removes personal agency, and is dehumanizing. Plus, does it actually prevent harm? Do we have evidence?

Finally, what is often missed: what if something actually good is deemed harmful, or something harmful is deemed by AI company board XYZ to be “good”?

I think censorship is bad because of that danger. Quis custodiet ipsos custodes (who will watch the watchers).

Instead of throwing ourselves into that minefield of moral hazard, we should be lifting each other up to the tops of our ability and not infantilizing / secretly propagandizing each other.

Well, ideally at least.

It's so shameful.

We let people buy kitchen knives. But because the kitchen knife companies don't have billions of dollars, we don't go after them.

We go after the LLM that might have given someone bad diet advice or made them feel sad.

Nevermind the huge marketing budget spent on making people feel inadequate, ugly, old, etc. That does way more harm than tricking an LLM into telling you you can cook with glue.

I don’t feel like that’s a reasonable analogy. Kitchen knives don’t purport to give advice. But if a kitchen knife came with a label that said ‘ideal for murdering people’, I expect people would go after the manufacturer.

Ad companies prompt injecting consumers. LLM companies countering with guardrails.

Another way to think about it: every single user of Claude is paying an extra tax in every single request

Isn't it basically the same as paying dust to crypto exchanges when making a transaction - it's so minuscule that it's not worth caring about?

Well the system prompt is probably permanently cached.

On API pricing you still pay 10% of the input token price on cache reads. Not sure if the subscription limits count this though.

And of course all conversations now have to compact 80 tokens earlier, and are marginally worse (since results get worse the more stuff is in the context)
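For a sense of scale, the per-request marginal cost of those ~80 tokens under a cache-read discount works out to fractions of a cent. Both prices below are assumptions for illustration (the 10% multiplier is the figure cited in the thread; the base price is a guess):

```python
# Assumed prices for illustration; the 10% cache-read multiplier is the
# figure cited in the thread, the base input price is a guess.
input_price_per_mtok = 3.00                      # assumed $/1M uncached input tokens
cache_read_price = input_price_per_mtok * 0.10   # 10% of input price on cache reads
extra_tokens = 80                                # ~59 extra words of system prompt

cost_per_request = extra_tokens / 1_000_000 * cache_read_price
print(f"${cost_per_request:.8f} per request")  # $0.00002400 per request
```

Negligible per request, which is the "dust" argument; the counterargument above is that the cost is multiplied across every request Claude ever serves.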

Takes up a portion of the context window, though

And the beginning of the context window gets more attention, right?

It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will sure work this time I promise.

The alignment favors supporting healthy behaviors so it can be a thin line. I see the system prompt as "plan B" when they can't achieve good results in the training itself.

It's a particularly sensitive issue so they are just probably being cautious.

I want a hyperscaler LLM I can fine tune and neuter. Not a platform or product. Raw weights hooked up to pure tools.

This era of locked hyperscaler dominance needs to end.

If a third-tier LLM company made their weights available and they were within 80% of Opus, but forced you to use their platform to deploy, or to license if you ran elsewhere, I'd be fine with that. As long as you can access and download the full raw weights and lobotomize them as you see fit.

Yeah, same. So long as they give me everything and cannot enforce their license I don’t mind if they require a license. Ideally the weights should be available even if I only ever run inference once (or perhaps no times). I’m willing to pay 0.99€ for this - lifetime of course.

Are the prompts used both by the desktop app, like typical chatbot interfaces, and Claude Code?

Because it's a waste of my money to check whether my Object Pascal compiler doesn't develop eating disorders, on every turn.

In principle, they could make such responses part of their training data. I guess it is just easier to do it through prompting.

Starting to feel like a "we were promised flying cars but all we got" kind of moment

We were promised flying cars, but all we got was a Skinner Box that gives people eating disorders?


Could be that Claude has particular controversial opinions on eating disorders.

LLMs have been trained to eagerly answer a user’s query.

They don’t reliably have the judgment to pause and proceed carefully if a delicate topic comes up. Hence these bandaids in the system prompt.

There are communities of people who publicly blog about their eating disorders. I wouldn't be surprised if the laymen's discourse is over-represented in the LLM's training data compared to the scientific papers.

>the year is 2028
>5M of your 10M context window is the system prompt

I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?


Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.

Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.