Indeed: LLMs do tasks that would otherwise be assigned to humans. So when pointing out deficiencies in LLM performance, we should compare them to the alternative, human performance, which isn't perfect either.