Were you using structured output with gpt-5 mini?
Is there an example you can show that tended to fail?
I’m curious how token constraint could have strayed so far from your desired format.
Here is an example of the formatting I desired: https://x.com/barrelltech/status/1963684443006066772?s=46&t=...
Yes, I use(d) structured output. I gave it very specific instructions and data for every paragraph and asked it to generate a paragraph for each one using this specific format. A large portion of the system prompt details the formatting exactly, with dozens of examples.
gpt-5-mini would typically use this formatting maybe once, then just kinda do whatever it wanted for the rest. It would also freestyle, putting all sorts of things in the various bold and italic sections (using the language name instead of the translation was one of its favorites) that I've never seen Mistral do in the thousands of paragraphs I've read. It failed in some other truly spectacular ways too, but going into all of them would just be bashing gpt-5-mini.
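For what it's worth, this kind of format drift is easy to catch mechanically. A minimal sketch of a per-paragraph check, assuming a hypothetical `**word** (*translation*)` opening pattern as a stand-in (the actual format is the one in the linked post, not this):

```python
import re

# Hypothetical target format: each paragraph opens with a bold source word
# followed by its italic translation, e.g. "**bonjour** (*hello*) ...".
# This pattern is a stand-in, not the real format from the thread.
PARA_PATTERN = re.compile(r"^\*\*[^*]+\*\*\s+\(\*[^*]+\*\)")

def format_drift(paragraphs):
    """Return the paragraphs that fail to match the expected opening pattern."""
    return [p for p in paragraphs if not PARA_PATTERN.match(p)]

sample = [
    "**bonjour** (*hello*) - a common greeting.",
    # Wrong *content* (language name instead of translation) but right shape,
    # so a structural check alone won't flag it:
    "**bonjour** (*French*) - the model put the language name here.",
    "bonjour (hello) - lost the bold/italic markup entirely.",
]

print(format_drift(sample))
```

A regex check like this only catches structural drift (dropped markup); the "language name instead of translation" failure mode is semantic and would slip through, which is part of why it's so annoying.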
I switched it over to Mistral, and with a bit of tweaking it's nearly perfect (as perfect as I'd expect from an LLM, which is really only ~90% XD).