Firstly, automobiles are really impressive.

Second, with that out the way, these cars are not playing the same game as horses… First, and quite obviously, they have massive amounts of horsepower, which is kind of like giving a team of horses… many more horses. But cars also have an absolutely massive fuel capacity. Petrol is such an efficient store of chemical energy compared to hay, and cars can store gallons of it.

I think if you gave my horse the ability of 300 horses and fed it pure gasoline, I would be kind of embarrassed if it wasn’t able to win a horse race.

Yeah man, and it would be wild to publish an article titled "Ford Mustang and Honda Civic win gold in the 100 meter dash at the Olympics" if what happened was the companies drove their cars 100 meters and tweeted that they did it faster than the Olympians had run.

Actually that's too generous, because the humans are given a time limit in ICPC, and there's no clear mapping to say how the LLM's compute should be limited to make a comparison.

It IS an interesting result to see how models can do on these tests - and it's also a garbage headline.

> what happened was the companies drove their cars 100 meters and tweeted that they did it faster than the Olympians had run

That would indeed have been an interesting race around the time cars were invented. Today it would be silly, since everyone knows what cars are capable of, but back then one can imagine a lot more skepticism.

Just as there is a ton of skepticism today of what LLMs can achieve. A competition like this clearly demonstrates where the tech is, and what is possible.

> there's no clear mapping to say how the LLM's compute should be limited to make a comparison

There is a very clear mapping of course. You give the same wall clock time to the computer you gave to the humans.

Because what it is showing is that the computer can do the same thing a human can under the same conditions. With your analogy here they are showing that there is such a thing as a car and it can travel 100 meters.

Once it is a foregone conclusion that an LLM can solve the ICPC problems, and that question has been sufficiently driven home to everyone who cares, we can ask further ones, like “how much faster can it solve the problems compared to the best humans?” or “how much energy does it consume while solving them?” It sounds like you went beyond the first question and are already asking these follow-up questions.

You're right, they did limit to 5 hours and, I think, 3 models, which seems analogous at least.

Not enough to say they "won gold". Just say what actually happened! The tweets themselves do, but then we have this clickbait headline here on HN somehow that says they "won gold at ICPC".

Agreed. The linked messaging is much clearer: "achieved gold-medal level performance". This clearly separates them from competing against humans, which they didn't do, because their constraints are very different. The "AI wins gold at ICPC" line really does seem designed to rile people up.

> under the same conditions

That's a very interesting question. When comparing wildly different computing machines, how do you make a fair comparison?

At least two criteria come to mind: volume and energy consumption.

Indeed, we can safely assume that more volume and more energy lead to more computational power. For example, it is not fair to compare a 10 m^3 room filled with computers to a 10 cm^3 computer. The same goes for the number of kilowatt-hours used.

Thinking further about those two criteria for GPUs and humans, we could also consider access to energy and volume. First, energy access for machines has dramatically increased since the industrial revolution. Second, volume access for machines has also increased since the beginning of mass production. In particular, creating one cubic meter of new GPUs is faster than giving birth to a new human.

tldr: a fair comparison of two machines should take into account their volume and their energy consumption. On the other hand, this might be mitigated by how fast a machine can increase its volume, and what its bandwidth for energy consumption is.

Cars going faster than humans or horses isn't very interesting these days, but it was 100+ years ago when cars were first coming on the scene.

We are at that point now with AI, so a more fitting headline analogy would be "In a world first, automobile finishes with gold-winning time in horse race".

Headlines like those were a sign that cars would eventually replace horses in most use-cases, so the fact that we could be in the same place now with AI and humans is a big deal.

It was more than interesting 100+ years ago -- it was the subject of wildly inconsistent, often fear-based (or incumbent-industry-based) regulation.

A vetoed 1896 Pennsylvania law would have required drivers who encountered livestock to "disassemble the automobile" and "conceal the various components out of sight, behind nearby bushes until [the] equestrian or livestock is sufficiently pacified". The UK's Locomotives Act of 1865 (the "Red Flag Act") required early motorized vehicles to be preceded by a person on foot waving a red flag or carrying a red lantern and blowing a horn.

It might not quite look like that today, but wild-eyed, fear-based regulation as AI use grows is a real possibility. And at least some of it will likely seem just as silly in hindsight.

For more than thirty years, the speed limit for cars in Britain was 4mph - a self-propelled vehicle travelling faster than walking pace was obviously unconscionably dangerous.

To celebrate the raising of the speed limit to a daring 12mph, a group of motorists organised a drive from London to Brighton. At the time, driving 54 miles in a single day was seen as an audacious feat and few people imagined that such a great distance could be travelled in such complicated and newfangled contraptions without mechanical incident.

For decades, the car was seen as a plaything for the wealthy that served no practical purpose. The car only became an important mode of transportation after very many false starts and against strong opposition.

https://en.wikipedia.org/wiki/Locomotive_Acts#Locomotives_Ac...

https://en.wikipedia.org/wiki/London_to_Brighton_Veteran_Car...

All the while with skeptics snarkily commenting "Cars can move fast, but they can't really run like a human!"

... in opposition to the car makers who want to turn everything into highways and parking lots, who really want all forms of human walking to be replaced by automobiles.

"They really can't run like a human," they say, "a human can traverse a city in complete silence, needing minimal walking room. Left unchecked, the transition to cars would ruin our city. So let's be prudent when it comes to adopting this technology."

"I'll have none of that. Cars move faster than humans so that means they're better. We should do everything in our power to transition to this obviously superior technology. I mean, a car beat a human at the 100m sprint so bipedal mobility is obviously obsolete," the car maker replied.

I think your analogy is interesting, but it falls apart because “moving fast” is not something we consider uniquely human, but “solving hard abstract problems” is.

Not my analogy, parent is the one who brought up automobiles. Maybe that's who you meant to reply to.

I'm talking about the headline saying they "won gold" at a competition they didn't, and couldn't, compete in.

This metaphor drops some pretty key definitional context. If the common belief prior to this race was that cars could not beat horses, maybe someday but not today, then the article is completely reasonable, even warranted.

> Firstly, automobiles are really impressive. Second, with that out the way, these cars are not playing the same game as horses

Yes. That’s why cars don’t compete in equestrian events and horses don’t go to F1 races.

This is non-controversial, surely? You want different events for humans, humans + computers, and just computers.

Notice that self driving cars have separate race events from both horses and human-driven cars.

The point is that up until now, humans were the best at these competitions, just like horses were the best at racing up until cars came around.

The other commenter is pointing out how ridiculous it would be for someone to downplay the performance of cars because they did it differently from horses. It doesn't matter if they did it using different methods; the fact that the final outcome was better had world-changing ramifications.

The same applies here. Downplaying AI because it has different strengths or plays by different rules is foolish, because that doesn't matter in the real world. People will choose the option that leads to the better/faster/cheaper outcome, and that option is quickly becoming AI instead of humans - just like cars quickly became the preferred option over horses. And that is crazy to think about.

I feel the main difference is cars can't compress time in the way an array of computers can. I could win this competition with an infinitely parallel array of random characters typed by infinite monkeys on infinite typewriters instantly since one of them would be perfectly right given infinite submissions. When I make my tweet I would pick a single monkey cus I need infinite money to feed my infinite workforce and that's more impressive clearly.

Now obviously it's more impressive as they don't have infinite compute and had finite time but the car only has one entry in each race unless we start getting into some anime ass shit with divergent timelines and one of the cars (and some lesser amount of horses) finishing instantly.
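To put rough numbers on the parallel-attempts point, here is a minimal sketch. It assumes every attempt is independent and succeeds with the same probability `p`; the function name and the 1% figure are made up for illustration and say nothing about how the actual ICPC runs were set up.

```python
def p_any_success(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds,
    given each attempt succeeds with probability p (hypothetical model)."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Assumed per-attempt success rate of 1%, purely illustrative.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} parallel attempts -> {p_any_success(0.01, n):.3f}")
```

Within a fixed wall clock, more parallel attempts push the odds of at least one correct answer toward 1, which is the sense in which compute "compresses time". A car in a race is stuck at n = 1.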

To your last point we don't know that this was cheaper since they don't disclose the cost. I would blindly guess a mechanical turk for the same cost would outperform at least today.

Considering that OpenAI's model got a higher score than any of the world's best collegiate programming teams, I'd guess that a mechanical turk would not do better (even if you gave them quite a bit of time).

In what way did the computer compress time? It completed it in 5 hours and I'm pretty sure they didn't invent a time machine

How long does a single thread take to do an attempt? How long do two threads take? I don't want to assume people reading this forum are children.

This doesn't matter. I don't intend to be rude, of course. I believe this doesn't matter at all.

Yeah, I think the only thing OP was passing judgement on is the competition aspect of it, not the actual achievement of any human or non-human participant.

That’s how I read it at least - exactly how you put it

I think you missed that the whole point of this race was:

"did we build a vehicle faster than a horse, yes/no?"

Which matters a lot when horses are the fastest land vehicle available. (We're so used to thinking of horses as a quaint and slow means of transport that maybe we don't realize that for millennia they were the fastest possible way to get from one place to another.)

> "did we build a vehicle faster than a horse, yes/no?"

Yeah fair. There's also that famous human vs horse race that happens every few years. So far humans keep winning (because it's long distance)

If you're talking about the Man versus Horse Marathon (https://en.wikipedia.org/wiki/Man_versus_Horse_Marathon) it's the other way around. Overwhelmingly the horses win. Only occasionally does the human.

I stand corrected. My memory garbled that. Thanks!

I was struck by how the argument is also isomorphic to how we talked about computers and chess. We're at the stage where we are arguing the computer isn't _really_ understanding chess, though. It's just doing huge amounts of dumb computation with huge opening books and endgame tablebases and no real understanding, strategy or sense of what's going on.

Even though all the criticisms were, in a sense, valid, in the end none of them amounted to a serious challenge to getting good at the task at hand.

There's a difference. How much money went into training the computer here vs. the human? If you want to prove that a computer can, at extreme cost and effort, beat a human - sure, it's possible.

But you can also conclude that putting in a lot of money and effort pays off. It's more like comparing a horse to a Ferrari that had millions in development costs, has a team of engineers maintaining it, isn't reusable, and just about beats Chestnut. It's a long way until the utility of both is matched.

I don’t think you’ll find many race tracks that permit horses and cars to compete together.

(I did enjoy the sarcasm, though!)

Snark aside, I would expect a car partaking in a horse race to beat all of the horses. Not because it's a better horse, but because it's something else altogether.

Ergo, it's impressive with nuance. As the other commenter said.

This response is good but the more general problem is that people are in "It doesn't look like anything to me" mode like Westworld robots seeing advanced technology. If there's a way to snap people out of that, I've never seen it.

Comparing power with reasoning does not make any sense at all.

Humans have surpassed their own strength since the invention of the lever thousands of years ago. Since then, it has been a matter of finding power sources millions of times greater, such as nuclear energy.

Power is one thing, efficiency is another.

Humans are more efficient watt for watt than any AI ever invented.

Now if you were to limit AIs to 400 watts, we could probably think it's fair.

> Humans are more efficient watt for watt than any AI ever invented.

Indeed they are. For now. The long term trend is not in our favor.

I disagree, the long term trend is that we do not have the available electricity and water for this to continue for much longer.

Regarding electricity, it depends on what you mean by “we”, I guess

https://www.voronoiapp.com/energy/-China-Generated-More-Elec...

Your analogy is flawed.

Are the humans allowed to bring their laptops and use the internet? Or a downloaded copy?

The massive amount of compute power is not the major issue. The major issue is the unlimited amount of reference material.

If a human can look up similar previous problems just as the "AI" can, it is a huge advantage.

Syzygy tables in chess engines are a similar issue. They allow perfect play once few enough pieces remain on the board, and there is no reason why a computer gets them and a human does not (if you compare humans against chess engines). Humans have always worked with reference material for serious work.

Humans are allowed to look up and learn from as many previous problems as they want before the competition. The AI is also trained on many previous problems before the competition. What's the difference?

Deleted, because the "AI" geniuses and power users pointed out that Tao does not have a point. You can get this one to -4 as well, since that seems to be the primary pleasure for "AI" one-armed bandit users.

It doesn't say anywhere that Gemini used any of those things at ICPC, or that it used more real-world time than the humans.

Also, who cares? It's a self contained non-human system that could solve an ICPC problem it hasn't seen before on its own, which hasn't been achieved before.

If there was a savant human contestant with photographic memory who could remember every previous ICPC problem verbatim and could think really fast, you wouldn't say they're cheating, just that they're really smart. Same here.

If there was a man behind the curtain that was somehow making this not an AI achievement then you would have a point, but there isn't.

I think "hasn't seen before" is a bit of an overstatement. Sure, the problem is new in the literal sense that it does not exist verbatim elsewhere, but arguably, any competition problem is hardly novel: they are all some permutation of problems that exist and have been solved before: pathfinding, optimization, etc. I don't think anyone is pretending to break new scientific ground in 5 hours.

It's not new scientific ground but a machine beating a challenging computer science problem unassisted is a big deal. If they can do that then there are a lot of other challenging things they can do.

Like what exactly? As far as I can tell, the drug discovery is fizzling out, so it's not talked about much. Toxicity, for one, is a big problem, and the AI is not going to tell you whether the new drug it just concocted is suitable for humans or not.

Small model solves an easy problem; big model solves a challenging problem. I wouldn't call those problems; they are more like invented puzzles. Perfect match for the AI marketing department to "solve".

Whatever you say.