Hey 'software engineer', how much of an LLM's output is actually reproducible across sessions, compared to that of a calculator or any programming language given the same input?
A lot vs a human? I bet the LLM given the same prompt will write the same code as before more often than I would (given I don't remember what I wrote in the past).
Why are you so concerned about the LLM producing the exact same code across different sessions? Seems like a really weird thing to focus on. Why aren't you focused on things like security, maintainability, UI/UX, performance?
Agreed. It's not like humans can produce the same output given the same input for anything more than trivial inputs.
I'd argue that it's actually a benefit; I like that I can do several generations and compare them and pick the best result. HP, for example, used to do this with software teams, and that's how we got Rocky Mountain BASIC (AIUI the competing team was East Coast BASIC).
Comp-sci people like repeatability when they want that and true randomness when that is desired. Things in between are rarely desired.
In computing, things are much more useful when they behave in predictable ways. Even AI, many (most?) would argue.
Computer science and software development are hardly related
Check the Plan9/9front papers or IWP's and say that again after reading the docs on GeFS for instance.
Thanks, you proved my point.
You are wrong. You may think that they aren’t related because you write software without thinking about computer science, but that says more about you and modern software development seat-filling than it does anything else.
One can't write anything efficient without comp-sci being at the forefront of one's mind the entire time one is writing. Which explains exactly why everything is so slow today.
If you’re not thinking about what is computable, what is not computable, and what is easily computable as you think about the problems you are trying to solve, you are a professional novice.
I studied computer science at one of the top colleges. It's true there's a lot of overlap between the two. But they are still very different fields.
You definitely don't need to study computer science just to avoid common performance issues, and in fact studying computer science doesn't mean you won't introduce things like N+1 query problems; that sort of thing comes from experience more than anything.
Likewise, just studying computer science does not mean that you are going to be well suited for software development in the real world.
> If you’re not thinking about what is computable, what is not computable, and what is easily computable as you think about the problems you are trying to solve, you are a professional novice.
Of course I consider all of these things, but that didn't come from studying computer science in college lol.
Not really related to this 'discussion', but this is an interesting problem in the AI space. It's essentially a well-understood problem in unreliable distributed systems: if you have a series of steps that might not respond with the same answer every time (usually because one might fail), how do you get to a useful and reliable outcome? I've been experimenting with running a prompt multiple times and having an agent diff the outputs to find parts that some runs missed, or having it vote on which run produced the best response, with a modicum of success. If you're concerned about adding another layer of AI in there, then having the agents return structured output that you can run through a deterministic function is an alternative.
Non-determinism is a problem that you can mitigate to some extent with a bit of effort, and is important if your AI is running without a human-in-the-loop step. If you're there prompting it though then it doesn't actually matter. If you don't get a good result just try again.
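To make the multi-run voting idea above concrete, here's a minimal sketch. The `simulated_runs` list is a stand-in for outputs from repeated LLM calls (the JSON shape and field names are made up for illustration); the voting itself is just a deterministic majority over canonicalised structured outputs:

```python
import json
from collections import Counter

# Simulated structured outputs from several runs of the same prompt
# (stand-ins for real LLM calls); two runs drifted, three agree.
simulated_runs = [
    '{"files_changed": 3, "risk": "low"}',
    '{"files_changed": 3, "risk": "low"}',
    '{"files_changed": 4, "risk": "low"}',
    '{"files_changed": 3, "risk": "low"}',
    '{"files_changed": 3, "risk": "medium"}',
]

def vote(outputs: list[str]) -> dict:
    """Pick the most common structured result across runs.

    Each output is canonicalised (parsed, then re-serialised with
    sorted keys) so that formatting differences don't split the vote.
    """
    canonical = [json.dumps(json.loads(o), sort_keys=True) for o in outputs]
    winner, _count = Counter(canonical).most_common(1)[0]
    return json.loads(winner)

result = vote(simulated_runs)
print(result)  # -> {'files_changed': 3, 'risk': 'low'}
```

The same pattern extends to the diffing variant: instead of taking the plurality winner, compare the parsed dicts field by field and flag fields where runs disagree for a closer look.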
Don't know if this is an annoying response… but how about just going through the code and checking and grading the quality yourself?
I could do, but the end goal is to scale this to 100x what I can do myself, and there isn't time to review all those changes. By attacking the problem while it's tiny and I can still keep it in my head, I'll end up building something that works at scale.
Maybe. The point is that this is all new, and looking forwards I think it's worth figuring out this stuff early.