I've seen them all over the place.
The best are shockingly good… so long as their context doesn't expire and they forget that e.g. the Vector class they just created has methods `.mul(…)` rather than `.multiply(…)` or similar. Even the longer context windows are still too short for them to really take over our jobs (for now); the needle-in-a-haystack tests seem to over-estimate their quality in this regard.
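To make that failure mode concrete, here's a minimal, entirely hypothetical sketch (the class and names are mine for illustration, not saved model output): the model defines `mul` early in the conversation, then starts calling `multiply` once the original definition has fallen out of its context window.

```typescript
// Hypothetical example of the drift I mean: the model wrote this class...
class Vector {
  constructor(public x: number, public y: number) {}

  // ...and named the scaling method `mul`.
  mul(s: number): Vector {
    return new Vector(this.x * s, this.y * s);
  }
}

const v = new Vector(1, 2);
v.mul(3);          // what it writes while the definition is still in context
// v.multiply(3);  // what it starts emitting later: no such method exists
```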
The worst LLM I've seen (one of the downloadable run-locally models, but I forget which): one of my standard tests is to ask them to "write Tetris as a web app". It started off doing something a little bit wrong (a square grid), then gave up on that task entirely, switched from JavaScript to Python, and carried on by writing a script to train a new machine learning model (and people still ask how these things will "get out of the box" :P)
People who see more of the latter? I can empathise with them dismissing the whole thing as "just autocomplete on steroids".