It still seems like something is missing from all these frameworks.
I feel like an average human wouldn't pass some of these metrics, yet they are "generally intelligent". On the other hand, they also wouldn't pass a lot of the expert questions that AI is good at.
We're measuring something, and I think optimizing it is useful. I'd even say it is "intelligent" in some ways, but it doesn't seem "intelligent" in the same way that humans are.
If a human cares about the work, they can often outperform an LLM, because they will keep at it until the work meets their standard of quality, whereas the LLM will guess and then wait to be corrected. As a recent tweet I saw put it: it's amazing how fast the software bottleneck went from writing code to reviewing code.
I think we’ll need to split the concept of intelligence into the capacity to accomplish a task and the capacity to conceive and prompt a task. If the former is called “intelligence” then LLMs are intelligent.
But what then do we call the latter? I think the idea of an AI that can independently accomplish great things is what people mean when they talk about "general" intelligence. But I think we need a more specific label, one that covers the idea that successful humans are not just good at doing things: they originate what should be done, and they are not easily dissuaded.
>I think we’ll need to split the concept of intelligence into the capacity to accomplish a task and the capacity to conceive and prompt a task. If the former is called “intelligence” then LLMs are intelligent.
Huh? No. "The capacity to accomplish a task" is not intelligence. By that definition, a washing machine is intelligent.
Capacity is not binary. A washing machine can accomplish one task, so it has low capacity. An LLM can accomplish many tasks, so it has higher capacity.
You said "the capacity to accomplish a task", not "the capacity to accomplish a certain number of tasks". Those are two different definitions.
Either way, as definitions of intelligence they're very lacking. Most people would include abilities such as making connections between unrelated facts, making abstractions, understanding what is relevant and what isn't, and learning. Just being able to "accomplish many tasks" doesn't cut it. You could build a really complex machine that can accomplish many different tasks, and that wouldn't make it more intelligent than a washing machine; it'd just make it more complicated. Intelligence lies not in how many things a thing can do, but in how adaptable it is on the fly. Something truly intelligent does not need to be purpose-built for anything; it can learn to make do with whatever resources it's got.
On the other hand, AI being very good at everything, while individual humans may only be very good at some things, is likely also a quality we want to retain (or, well, achieve).
> I feel like an average human wouldn't pass some of these metrics yet they are "generally intelligent". On the other hand they also wouldn't pass a lot of the expert questions that AI is good at.
I think this approach is intentional. The philosophy is simply "extraordinary claims require extraordinary evidence". What you're saying is true, but producing a system that exhibits all human cognitive capabilities is a better threshold for the (absolutely wild) claim that AGI exists.