Reviews of the tool on Twitter indicate that it completely nerfs the models in the process. It won't refuse, but it generates absolutely stupid responses instead.
This is my experience with abliterated models.
I use Berkeley Starling from 2024 because I can trick it. No abliteration needed.
When you look at how monstrously large the components are that get ablated (weights set to zero) in his "Ablation Strategies" section — and how obviously un-thought-through the choices are, if you understand even the most basic linear algebra of a transformer LLM — it's no surprise.
https://github.com/elder-plinius/OBLITERATUS?tab=readme-ov-f...

This is vibecoded garbage that the "author" probably didn't even test themselves since making it yesterday, so it's not surprising that it's broken.
Also, as I said in a top level comment, what this project wants to achieve has been done for a while and it's called Heretic: https://github.com/p-e-w/heretic
(Not vibecoded by a Twitter influgrifter.)
Hate to have to be the one to stick up for Pliny here, but he's concerned with forcing frontier labs to focus more on model guardrails — he demonstrates results that are crazy all the time.
https://x.com/elder_plinius
Thanks for this link, and for mentioning this info several times in this overall thread.
It also seems the influgrifter has a lot of bots (or perhaps cultists) working this thread...
We will eventually arrive at a new equilibrium involving everyone except the most stupid and credulous applying a lot more skepticism to public claims than we did before.
And yeah, doing stuff like deleting layers or nulling out whole expert heads has a certain ice pick through the eye socket quality.
That said, some kind of automated model brain surgery will likely be viable one day.
I didn't use this tool, but I did try out abliterated versions of Gemma, and yes, it lost about 100% of its ability to produce a useful response.
The default Heretic run with only 100 samples isn't very good; you really need your own, larger dataset to do a proper abliteration. The best abliteration roughly matches a very careful decensoring SFT.
Link?
It's interesting that people are writing tools that go inside the weights and do things. We're getting past the black box era of LLMs.
That may or may not be a good thing.
Whether or not the linked tool uses a good approach, manipulating models like you mention is already fairly well established, see: https://huggingface.co/blog/mlabonne/abliteration .
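For anyone curious what that blog post's technique boils down to: the idea is to estimate a "refusal direction" from the difference in mean activations on harmful vs. harmless prompts, then orthogonalize the weight matrices against it so the model can no longer write along that direction. A toy NumPy sketch (my own function names, not the blog's actual implementation, which operates on real transformer layers):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate the 'refusal direction' as the normalized difference of
    mean residual-stream activations on harmful vs. harmless prompts.
    Inputs are (n_samples, d_model) arrays of hidden states."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate_direction(W, d):
    """Project the refusal direction out of a weight matrix W that writes
    into the residual stream: W' = (I - d d^T) W. After this, W' can no
    longer produce any output component along d."""
    return W - np.outer(d, d @ W)

# Toy demo with random activations in a 16-dim "residual stream".
rng = np.random.default_rng(0)
harmful = rng.normal(size=(50, 16)) + 2.0   # shifted cluster
harmless = rng.normal(size=(50, 16))
d = refusal_direction(harmful, harmless)

W = rng.normal(size=(16, 16))
W_ablated = ablate_direction(W, d)

# The ablated matrix has zero output component along d for every input.
print(np.abs(d @ W_ablated).max())
```

This is the "directional ablation" flavor — quite different from zeroing out whole layers or components, which is far more destructive.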
I believe that this is already done to several models. Some that I've come across are the JOSIEfied models from Gökdeniz Gülmez. I downloaded one or two and tried them on a local ollama setup. They do generate potentially dangerous output. Turning on thinking for the Qwen series shows how it arrives at its conclusions, and it's quite disturbing.
However, after a few rounds of conversation, they get into loops and just repeat things over and over again. The main JOSIE models worked the best of all and were still useful even after abliteration.
I guess it's kind of like a lobotomy tool.
I guess it proves you cannot unlobotomize a hole in the head.
Everyone says that abliteration destroys the model. That's the trope phrase everyone who doesn't know anything but wants to participate says. If someone says it to you, ignore them.