> It is very much like playing an instrument.

Or it is more like playing a slot machine and you imagine the rest.

This is how I feel whenever I see bold all caps instructions in a system prompt or someone claims they conducted "research" and found the magic prompt template that makes the model pay out.

Maybe it works some of the time, but it isn't a solution that works every time.

It reminds me of people hovering to play a slot machine when someone gets up and it hasn't paid out, as if they've solved slot machines.

While I don't mind putting something in a loop until the tests pass, I'm less comfortable doing that when providers are silently rerouting to lower quality models, or in Google's case burning quota faster to ease their own server load without being transparent about what the "standard limits" are to begin with. [1]

I'm hopeful I'll be more comfortable with these "slot machines" once frontier models can be run locally on hardware I can actually afford. Then I'd know exactly what I'm getting, instead of jumping at shadows while providers play tricks behind the scenes to ease their own load, without admitting that customers get less for their money as the service gets more popular.

[1]: https://support.google.com/gemini/answer/16275805?hl=en&sjid...

Has there been any evidence of a well known provider rerouting to lower quality models?

Last I saw, engineers working at OpenAI denied this on HN.

I saw that someone set up a tracker that aims to record the performance of the models, and so far it has not shown any statistically significant deviation in performance for Codex, and not yet enough data for Claude: https://marginlab.ai/trackers/codex/

> Has there been any evidence of a well known provider rerouting to lower quality models?

The firm [Anthropic] would deliberately degrade the model’s performance in ways that were invisible to the user.

https://news.ycombinator.com/item?id=48485958

> This is how I feel whenever I see bold all caps instructions in a system prompt or someone claims they conducted "research" and found the magic prompt template that makes the model pay out. Maybe it works some of the time but it isn't a solution that works every time.

For such a thing to be useful, it's enough that it works substantially more often than not having those instructions in.
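One way to check "substantially more often" rather than eyeball it is to run the same task set with and without the instruction and compare pass rates. Below is a minimal sketch using a pooled two-proportion z-test. The function name and the counts (70/100 vs 50/100) are made up for illustration, not measurements from any model, and the test assumes independent runs and a normal approximation.

```python
import math


def two_proportion_z(pass_with, n_with, pass_without, n_without):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Hypothetical helper: compares the pass rate with the instruction
    against the pass rate without it.
    """
    p_with = pass_with / n_with
    p_without = pass_without / n_without
    pooled = (pass_with + pass_without) / (n_with + n_without)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_with + 1 / n_without))
    if se == 0:
        # Every run passed or every run failed: no detectable difference.
        return 0.0, 1.0
    z = (p_with - p_without) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 70 of 100 runs pass with the instruction, 50 of 100 without.
z, p = two_proportion_z(70, 100, 50, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With only a handful of runs per arm, a difference like this would not be distinguishable from noise, which is the gambler's objection in a nutshell.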

Every gambler thinks their system works, given enough chances.

A poor analogy depending on the setting because you can't adjust the odds with a slot machine, and the ROI is negative by design. If that's your experience, yeah, I wouldn't use an LLM either.

Pretty sure most modern slot machines are digital and you could adjust the odds (even to a positive EV) if you change the code.

You're being unfaithful to the original statement. The whole point of saying something is like a slot machine is that there are significant odds that you lose. If you ever have access to a casino slot machine that has a positive EV, there are no tangible negative aspects anymore; you would use it over and over again and accumulate significant wealth from the house. That's my point.
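To make the EV point concrete, here is a rough sketch of the arithmetic. The paytables are invented for illustration (no real machine is being described), and `expected_value` is a hypothetical helper. Exact fractions are used so the results are not blurred by float rounding.

```python
from fractions import Fraction


def expected_value(paytable, stake):
    """Expected net result of one play.

    paytable: list of (probability, payout) pairs covering all outcomes.
    stake: cost of one play.
    """
    total_probability = sum(prob for prob, _ in paytable)
    if total_probability != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(prob * payout for prob, payout in paytable) - stake


# Invented paytable with a house edge: most plays pay nothing.
house_edge = [
    (Fraction(90, 100), 0),
    (Fraction(9, 100), 5),
    (Fraction(1, 100), 40),
]

# Same machine with the jackpot raised, which flips the sign of the EV.
player_edge = [
    (Fraction(90, 100), 0),
    (Fraction(9, 100), 5),
    (Fraction(1, 100), 60),
]

print(expected_value(house_edge, stake=1))   # -3/20 per play
print(expected_value(player_edge, stake=1))  # 1/20 per play
```

The sign of that number is the whole disagreement: with a negative EV, more plays lose more on average, and with a positive EV, repeated play is the rational strategy.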

Instruments are pseudo-random until you know what you're doing. Slot machines are just slot machines.

Musical instruments are not random. You’re just doing random inputs. Instruments are consistent, even if the “flavor” and quality varies with different builds.

Playing a B on a saxophone always plays a B.

I see you haven't tried a modular synthesizer yet :) Getting back to the same "place" in a patch can sometimes be impossible, and it does feel "random" until you get the hang of it.

But ultimately it isn’t unpredictable and random. That’s just a skill issue. There is literally no person good enough at prompting to create consistent, predictable, useful results.

Saxophone, being a wind instrument, was a bad choice. I can definitely tell which student was blowing when hearing a note.

But your analogy remains solid if you substitute e.g. a piano and a reasonably proficient player. A single note would be nearly indistinguishable between players... But a full piece most certainly will sound different.

While I agree with you, I think it's diverging from the initial point.

The original take was "LLMs are very much like playing an instrument". I think they are very much NOT like playing an instrument.

While different musicians will produce different results, one musician won't get drastically different results on different days or when trying a different "copy" of the same instrument. If you can play the violin on your violin and I lend you my violin, you will still be able to play very consistently. You may argue that the sound will differ and you will have to adapt slightly, but that's not remotely similar to the randomness coming from LLMs.

Will you?

That's only if both violins are tuned the same way, and one must continually tune them lest they get out of sync.

Similarly, an LLM can be extremely consistent if tuned properly -- indeed, if you fix the weights and settings, they can be made "essentially deterministic" for many prompts!

The difference is that a violin player can predict how the known violin will behave under all relevant circumstances, will know how to get the right tone out of it, while you’re generally unable to predict the adequacy of output of even a deterministic LLM. You can’t practically reason about how varying the input to the LLM will ensure the adequacy of its output, while the violin player is perfectly able to do so for the violin.

This is because LLMs have aspects of chaotic dynamical systems, where small changes in initial conditions can lead to vastly different outcomes. That property is independent from nondeterminism.
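The distinction between "deterministic" and "predictable" can be seen in a toy system. The sketch below uses the logistic map with r = 4, a textbook chaotic map, purely as a stand-in: it is not an LLM and says nothing about how any particular model behaves. It only shows that a fully deterministic rule can still amplify a tiny input change into a very different result.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2, 60)          # identical input
c = logistic_orbit(0.2 + 1e-10, 60)  # input nudged by one part in ten billion

# Determinism: the same input reproduces the same trajectory exactly.
print(a == b)

# Sensitivity: the nudged trajectory drifts far from the original.
largest_gap = max(abs(x - y) for x, y in zip(a, c))
print(largest_gap)
```

Fixing the seed and settings buys you the first property, reproducibility. It does not buy you the second, the ability to reason about what a small change to the input will do.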

Anyone who has even modest experience with a particular instrument can pick any one up at any time and play it. The way the notes are played is consistent and produces a consistent note. If you tune 50 guitars to standard, the chords all produce what they should. It is a predictable instrument. You do not pick up a trumpet in one place then another and find the key combinations are suddenly different.

You know what we are talking about. Tuning, poor playing, all of that is mild variation from what we know it is supposed to do every time, and we can target the notes they are supposed to hit consistently. You're comparing slight tonal variations to completely different outputs from the same inputs. If I hit a "C" on the piano, it is going to play "C." If it does not, then the piano is not functioning properly. LLMs for some reason get a pass on this, and it makes them very distinct from musical instruments.

This feels like a very nitpicky steel man, not a productive attempt at discussion.

A poor B is still a B fingering and the sax is supposed to play a B every time. Missing it is human error, not tool error. I can pick up an alto sax, a clarinet, etc. any time, anywhere, and expect the same fingerings to work every time. My individual skill or mistakes or peculiarities of each build are not what is relevant here.

LLMs do not operate consistently, and they make their own errors while we argue about which incantation makes them less inconsistent, knowing they will never actually perform as expected.

I played woodwinds regularly for 15 years so I feel fine with my example.


It is a bit of both. A non-deterministic instrument and a predictable slot machine.

I play slot machines as an instrument! ;)

Roger Waters and Nick Mason were playing the cash register in 1973!