It could also be a skill problem. It would be more helpful if, when people made "LLM sucks" claims, they shared their prompt.
The people I work with who complain about this type of thing communicate their ask to the LLM poorly and expect it to read their minds.
I don't really understand what you mean by this. The claim is that the same prompt with the same question produces worse results when it's run against a model with more than 200k tokens in its context. That doesn't have much to do with how "skillfully" you use the model.
Prompt quality does matter, but at some point context size matters too.
I've had things like this: a system with a collection of procedural subsystems. I would say "replace the following set of defaults that are passed all around for system X (list of files) and in the manager (file) with a config", and it would do that, but then I'd suddenly see it go "wait, mu and projection distance are also present in systems Y and Z. Let me replace those with a config too, with the same values". But systems Y and Z use a different set of optimized values, and that was clearly outside the scope.
I never had that kind of mistake happen when dealing with small contexts, but with larger contexts (multiple files, long "thinking" sequences) it does happen sometimes.
There were definitely times when I thought "oh well, my bad, I should have clarified NOT to change that other part as well", all the while knowing that no human would have thought to change both.
None of what has been described is a "skill issue". The problem is when an identical prompt produces poor results once the context window exceeds 200k tokens or so.
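If you want to check whether your degraded results actually line up with crossing that threshold, a first step is just measuring how big your prompt is. Here's a minimal sketch using the common rule-of-thumb of roughly 4 characters per token for English text; the exact ratio varies by tokenizer and content, and the 200k figure is the rough threshold reported in this thread, not a documented model limit.

```python
# Rough prompt-size check before sending to a model.
# The ~4 chars/token heuristic and the 200k threshold are assumptions,
# not exact values for any specific tokenizer or model.

CONTEXT_WARN_THRESHOLD = 200_000  # rough figure from the discussion above

def approx_tokens(text: str) -> int:
    """Estimate token count using the ~4 characters-per-token heuristic."""
    return max(1, len(text) // 4)

def near_context_limit(prompt: str) -> bool:
    """True if the estimated token count crosses the warning threshold."""
    return approx_tokens(prompt) > CONTEXT_WARN_THRESHOLD

# Example: an 8,000-character prompt estimates to about 2,000 tokens.
print(approx_tokens("x" * 8_000))       # ~2000, well under the threshold
print(near_context_limit("x" * 8_000))  # False
```

For a real measurement you'd use the actual tokenizer for your model instead of the heuristic, but even this crude estimate is enough to tell "small context" from "hundreds of thousands of tokens" apart.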
Totally agree that "LLM sucks" posts should be accompanied by the prompt.
I agree, but at the same time it feels like victim blaming.
Nah, it's a variant of the XY Problem: https://xyproblem.info
I don't know. Is pointing out that someone holding a drill by the chuck won't get the results they expect that bad?