Unless there is a major change in administration, how do things not get worse and worse from here? LLMs will only get more intelligent and be seen as more of a national security risk. This brings the surveillance state deeper into every web-connected device.

First I want to see them play video games at a high skill level, preferably without any access to game state beyond the same visual output that humans have access to, like a raster frame X number of times per second.

One LLM played Factorio, albeit at a very, very poor level, which can be seen if you slow the video to 0.25 playback speed and pause frequently.

https://old.reddit.com/r/factorio/comments/1u1blr6/claude_fa...

There have been streams of other games, where LLMs and AIs have likewise performed very poorly.

I recognize that LLMs might be better at language processing than at these sorts of tasks. But being able to play video games is part of general capability. And this kind of hardcore video game playing, with no access to game state, is also a general task where feigning skill can be harder. If LLMs excel at pretending to be competent without actually being competent, which is arguably what this AI training approach is about

https://en.wikipedia.org/wiki/Generative_adversarial_network

then some AIs might be trained and designed to deceive humans instead of actually being competent and capable. One response is to meet them with more difficult tests.

Basically, make tests that AIs or LLMs will not have an easy time cheating on. Hopefully that will encourage research into greater LLM/AI competence rather than greater ability to cheat or deceive, whether on the part of LLM/AI researchers and companies or of the LLMs/AIs themselves.

Fable can beat Pokemon Red in 50 minutes with only visual input. https://www.youtube.com/watch?v=Ty_50J84fMY

That is significant, but it seems to have had several issues.

> I love how it only manages to beat the game because it leveled up its Charizard to level 78. Effectively making it stronger than anything else in the main campaign. Everyone else was just filler to revive it.

> There’s a reason this is timelapsed - if you slow it down to .25x speed you’ll see it getting lost in the safari zone lol

> Deeply funny how this timeskip cuts out the 50 hours it spent grinding its shitty charmander to level 22 before Brock, skips from nugget bridge to rocket hideout, skips straight to Champion from Giovanni...really picking and choosing what to show, hey

Some comments mention that it is using strategies that young children use, like mindlessly grinding and then winning through overpowered Pokemon. That also indicates that Pokemon, or at least some versions of it, is a game series with mostly fake difficulty (fraudulent game design). But it is still impressive that it could get that far with just visual output, since the domain in Pokemon is significantly complex, even if its world positioning is tile-based.

> For those who don't know, Claude was struggling to beat Brock one year ago in Pokemon Blue. That's considerable improvement

> @techytails18 it is impressive though it's able to finally beat the game. This kind of feels like an "answer by accident" type scenario though. I'm sure six months or a year it's probably going to be speed running it though. Doing this with no harnesses impressive.

You think the AI boys are going to let the administration keep this up for long?

Sadly yes. Sam Altman wants online ID face scanning technology just like the administration does.

To make it clearer: He's one of the founders of the company that thrives in this sort of system, World (FKA Worldcoin). People were sort of making fun of the whole company and the dystopian premise a handful of years back... But here we are. Their latest "manifesto" was posted earlier this week, called The Simple Plan.

https://world.org/blog/foundational-topics/thesimpleplan

> 1. Build a private proof of human

> 2. Launch and bootstrap the network through token ownership

> 3. Reach critical scale and initial utility

> 4. Scale further through utility and decentralize

> 5. Reach global scale and help ensure AGI benefits every human

I'd say they might not have a say in this. Who knows, it might be that this was Elon pulling up the ladder after a successful IPO.

Solution: get as far away as you can from these models. It is curiosity that kills the cat. If you stay away and use only open models they cannot control your work.

Rules like this would just make open models illegal for the exact same justification once they are intelligent enough.

Then you will just use Chinese models running on Chinese TPUs made on Chinese lithography machines.

Then we will have ISPs block us from connecting to those machines. Then they make VPNs illegal too.

This would be like the easiest way to kill US big tech, since no other country will have the same limits.

It would be more difficult to make an open model illegal. Where is the centralized kill switch like what Anthropic used today?