> so llama.cpp just doesn't handle it correctly.
It is a bug in the model weights and reproducible in their official chat UI. More details here: https://github.com/ggml-org/llama.cpp/pull/19283#issuecommen...
I see. It seems the looping is a bug in the model weights, but there are also bugs in llama.cpp's detection of various outputs, as identified in the PR I linked.