Yeah, it's been butchering relatively easy-to-moderate tasks for me, even with reasoning set to high. I'm hoping it just needs more tuning, since they had to port it to a novel architecture.

If instead the model is performing worse due to how much they had to shrink it just so it will fit on Cerebras hardware, then we might be in for a long wait for the next gen of ginormous chips.

Agree w/ you on the model's tendency to butcher things. Performance-wise, this almost feels like the GPT-OSS model.

I need to incorporate "risk of major failure" into bluey bench. Spark is a dangerous model. It doesn't strongly internalize the consequences of the commands that it runs, even on xhigh. As a result, I'm observing a high tendency to run destructive commands.

For instance, I asked it to assign random numbers to the filenames of the videos in my folder to run the benchmark. It accidentally deleted the files on most of the runs. The funniest part is that it comes back within a few seconds and says something like "Whoops, I have to keep it real, I just deleted the files in your folder."
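
For reference, the non-destructive version of that task is about ten lines of Python. The folder name and extension here are hypothetical, but it's roughly:

```python
#!/usr/bin/env python3
# Assign a unique random number to each video's filename.
# Pure renames only; nothing is ever unlinked.
import random
from pathlib import Path

folder = Path("videos")  # hypothetical folder
files = sorted(folder.glob("*.mp4"))  # hypothetical extension
numbers = random.sample(range(100_000), len(files))  # unique, no collisions

for f, n in zip(files, numbers):
    target = f.with_name(f"{n}{f.suffix}")
    assert not target.exists(), f"would clobber {target}"
    print(f"{f.name} -> {target.name}")
    f.rename(target)  # rename in place; no deletion anywhere
```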

Ouch, at least it fesses up. I ran into problems with it first refusing to use git "because of system-level rules in the session". Then later it randomly amended a commit and force-pushed it because it made a dumb mistake. I guess it was embarrassed.
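
If you want a seatbelt against the force-push thing, a client-side pre-push hook that rejects non-fast-forwards works. Here's a rough Python sketch (save as .git/hooks/pre-push and chmod +x; assumes a plain sha1 repo):

```python
#!/usr/bin/env python3
# pre-push hook: refuse any non-fast-forward (i.e. forced) push.
import subprocess
import sys

ZERO = "0" * 40  # placeholder sha for branch creation/deletion

# git feeds us lines of: <local ref> <local sha> <remote ref> <remote sha>
for line in sys.stdin:
    local_ref, local_sha, remote_ref, remote_sha = line.split()
    if ZERO in (local_sha, remote_sha):
        continue  # new branch or deletion, not a history rewrite
    # fast-forward iff the remote tip is an ancestor of what we're pushing
    is_ff = subprocess.run(
        ["git", "merge-base", "--is-ancestor", remote_sha, local_sha]
    ).returncode == 0
    if not is_ff:
        print(f"pre-push: refusing non-fast-forward push to {remote_ref}",
              file=sys.stderr)
        sys.exit(1)
```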

> If instead the model is performing worse due to how much they had to shrink it just so it will fit on Cerebras hardware

They really should have just named it gpt-5.3-codex-mini (served by Cerebras). It would have made it clear what this model really is.

Not if you're suggesting that "(served by Cerebras)" should be part of the name. They're partnering with Cerebras and providing a layer of value. Also, OpenAI is "serving" you the model.

We don't know how they integrate with Cerebras hardware, but typically you'd pay a few million dollars to get the hardware in your own datacenter. So no, "served by Cerebras" is confusing and misleading.

Also "mini" is confusing because it's not analagous to gpt-5.1-codex vs gpt-5.1-codex-mini. Gpt-5.3-codex-spark is a unique, _experimental_ offering that doesn't fit the existing naming suffixes.

I don't understand what's wrong with "spark". It's friendly and evokes a sense of something novel, which is perfect.

If you want to know more about the model, read the first paragraph of the article. That information doesn't need to be hardcoded into the model name indefinitely. I don't see any "gpt-5.3-codex-nvidia" models.

Uh, that paragraph translated from "marketing bullshit" into "engineer" would be "we distilled the big gpt-5.3-codex model into a smaller size that fits in the 44GB of SRAM on a Cerebras WSE-3, multiplied by whatever tensor-parallel or layer-parallel grouping they're doing".

(Cerebras runs llama-3.3 70b on 4 WSE-3 units with layer parallelism, for example).
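
To put rough numbers on that, back-of-envelope only, assuming 16-bit weights and ignoring activations and KV cache entirely:

```python
# How many params fit in the SRAM of N WSE-3 wafers?
SRAM_PER_WAFER_GB = 44   # WSE-3 on-chip SRAM
BYTES_PER_PARAM = 2      # fp16/bf16 weights (assumption)

def max_params_billions(wafers: int) -> float:
    return wafers * SRAM_PER_WAFER_GB / BYTES_PER_PARAM

print(max_params_billions(1))  # ~22B on a single wafer
print(max_params_billions(4))  # ~88B; llama-3.3 70b needs ~140GB, under 4*44 = 176GB
```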

> Also "mini" is confusing because it's not analagous to gpt-5.1-codex vs gpt-5.1-codex-mini. Gpt-5.3-codex-spark is a unique, _experimental_ offering that doesn't fit the existing naming suffixes.

That's basically exactly what gpt-5.3-codex-mini would be.

So perhaps OpenAI intentionally picked the model's layer count, parameter count, MoE expert size, etc. to fit onto the Cerebras machines. That's like saying "the DVD producer optimized this movie for you" (they just cropped and cut it down to 4.7GB so it would fit on a DVD).

I'm underselling it a little, because it takes more work than that to get models running on Cerebras hardware (they're so weird and un-GPU-like), but honestly, if Cerebras can get Llama 3.1 405b or GLM 4.7 running on their own chips, it's not that much harder for them to get gpt-5.3-codex-mini running.