> You are an expert analyst evaluating how exposed different occupations are to AI. You will be given a detailed description of an occupation from the Bureau of Labor Statistics.
> Rate the occupation's overall AI Exposure on a scale from 0 to 10.
Are LLMs good at scoring? In my experience, using an LLM to score things usually produces arbitrary results. I'm surprised to see Karpathy use one this way.
The fact that the LLM appears to never assign an actual 0 or 10 makes me suspicious, especially when the prompt includes explicit examples of what counts as a 10.
In my experience, LLMs often have really solid insights in their thinking chains, then vomit out a score that makes no sense.
Now, I'm not sure this is actually an LLM-only thing, because I think people probably do something similar when you ask them to put a number on things without giving them a concrete grading rubric...
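For what it's worth, here is a minimal sketch of what a concrete rubric could look like. It's in Python, and the criteria, weights, and names (`Criterion`, `RUBRIC`, `score`) are all made up for illustration; none of them come from the actual prompt. The idea is that the grader, human or LLM, only answers yes/no questions, and the number is computed outside the grader:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    question: str  # yes/no question the grader answers
    points: int    # points added to the score when the answer is "yes"


# Hypothetical rubric: these questions and weights are invented for
# illustration and are not from the prompt quoted above.
RUBRIC = [
    Criterion("Is the work product mostly text, code, or data?", 4),
    Criterion("Can the work be done entirely on a computer?", 3),
    Criterion("Is the work free of physical or in-person tasks?", 3),
]


def score(answers: list[bool], rubric: list[Criterion] = RUBRIC) -> int:
    """Sum the points of every criterion answered 'yes'."""
    if len(answers) != len(rubric):
        raise ValueError("need exactly one answer per criterion")
    return sum(c.points for c, yes in zip(rubric, answers) if yes)


print(score([True, True, False]))    # 7
print(score([False, False, False]))  # 0
print(score([True, True, True]))     # 10
```

With this shape, 0 and 10 are reachable by construction, and every point traces back to a specific answer instead of a gut-feel number.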
No. LLMs aren't experts in any subject. They can answer things in a confident manner, but nobody has optimized LLMs to perform good analysis yet.
Let's ask the LLM to score how good it would be at scoring jobs from LLM exposure... /s