> If I sniff something is off, I start Googling for reference code, large projects in that language, etc.
This works so long as you know how to ask the question. But it's been my experience that an LLM directed at a task will do something, and I don't even know how to frame its behavior in language in a way that would make sense to search for.
(My experience here is with frontend in particular: I'm not much of a JS/TS/HTML/CSS person, and LLMs produce outputs that look really good to me. But I don't know how to even begin to verify that they are in fact good or idiomatic, since there are more often than not multiple layers of intermediating abstractions that I'm not already familiar with.)
I'm not much of a JS/TS/HTML/CSS person either. But if I think something looks off and it's something I care about, then I'll lose a day boning up on that thing.
To your point that you're not sure what to search for, I do the same thing I always do: I start searching for reference documentation, reading it, and augmenting that with whatever prominent code bases/projects I can find.
This motivates the question: if you're doing all this work to verify the LLM, is the LLM really saving you any time?
After just a few weeks in this brave new world my answer is: it depends, and I'm not really sure.
I think over time as both the LLMs get better and I get better at working with them, I'll start trusting them more.
One thing that would help with that would be for them to become a lot less random and less sensitive to their prompts.
> and I don't even know how to frame its behavior in language in a way that would make sense to search for.
Have you tried recursion? Something like: "Using idiomatic terminology from the foo language ecosystem, explain what function x is doing."
If all goes well it will hand you the correct terminology to frame your earlier question. Then you can do what the adjacent comment describes and ask it what the idiomatic way of doing p in q is.
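To make that concrete, here's a rough sketch of that two-step flow in TypeScript. askLLM is a made-up placeholder, not a real API; the point is just the shape of the prompts:

```typescript
// Hypothetical helper -- stand-in for whatever LLM client you actually use.
// Nothing here is a real API; you'd wire it to your provider's chat endpoint.
async function askLLM(prompt: string): Promise<string> {
  throw new Error("wire this up to your own client");
}

// Step 1: get the model to describe the code in the ecosystem's own vocabulary.
// Step 2: feed that vocabulary back and ask for the idiomatic approach, which
// also hands you search terms you can check against real documentation.
async function findIdiomaticFraming(language: string, snippet: string) {
  const framing = await askLLM(
    `Using idiomatic terminology from the ${language} ecosystem, ` +
      `explain what this code is doing:\n\n${snippet}`
  );

  const idiomatic = await askLLM(
    `Given this description:\n\n${framing}\n\n` +
      `What is the idiomatic way to do this in ${language}? ` +
      `Name the standard patterns or libraries so I can look them up.`
  );

  return { framing, idiomatic };
}
```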
I think you’re missing the point. The point is that I’m not qualified to evaluate the LLM’s output in this context. Having it self-report doesn’t change that fact, it’s just playing hide the pickle by moving the evaluation around.
Not at all - my point was that it can tutor you well enough to figure out whether the code it wrote earlier was passable or not. These things are unbelievably good at knowledge retrieval and synthesis. Gemini makes lots of boneheaded mistakes when it comes to the finer points of C++, but it has an uncanny ability to produce documentation and snippets in the immediate vicinity of what I'm after.
Sure, that approach could fail in the face of it having solidly internalized an absolutely backwards conception of an entire area. But that seems exceedingly unlikely to me.
It will also be incredibly time-consuming if you're starting from zero on the topic in question. But then, if you're trying to write related code, you were already committed to that uphill battle, right?