I’ve been trying to design a puzzle for a game this year that humans can solve but LLMs can’t. I’ve come up with one, but it was hard work! It’s based around message cracking.
There was one in a previous AoC that I think stumped a lot of AI at the time, because it involved a game similar to poker, with the same terminology but different rules. The AI couldn't help falling into a "this is poker" trap and writing a solution that follows the standard rules.
Was that 2023's Day 7 'Camel Cards' [1]?
[1] https://adventofcode.com/2023/day/7
Isn't that easily solved by changing the terminology before giving it to the LLM?
Interesting! Maybe that’s the general way to approach these things
I mean, wasn't the second half of pretty much every AoC exercise beyond LLM capabilities?
I remember multiple accounts of people trying to one-shot AoC, and they all ended around day 10 or so.
[flagged]
We all have our writing quirks, like how some people use shorthand that saves only a marginal amount of typing ("people" => "ppl"), or even people who capitalize the start of each sentence but not the start of their whole text.
Some thoughts maybe should remain internal :)
A huge pet peeve of mine is people getting annoyed by phrases like "I mean." :)
There's plenty of prior work to go on. I mean, you could use a font ligature or one of the browser extensions (although I don't know if Chrome still lets you have a browser extension touch all text).
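The extension approach really is just a text substitution pass over the page. A minimal sketch of the core rule such a content script might apply — the function name and regex here are my own assumptions, not taken from any existing extension:

```javascript
// Strip a filler phrase ("I mean") from a piece of text.
// A real content script would walk the page's text nodes
// (e.g. with a DOM TreeWalker) and apply this to each one.
function stripFiller(text) {
  // \b keeps it from matching inside other words; the optional
  // comma and trailing whitespace tidy up what's left behind.
  return text.replace(/\bI mean,?\s*/gi, "");
}

console.log(stripFiller("I mean, you could just vibe code that."));
```

Whether Chrome still lets an extension touch all text is a separate question of host permissions, as the comment notes.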
Change ChatGPT to 'my drunk uncle' while you're at it.
here you go, helping with exposure therapy
https://gist.github.com/clairefro/1cf81f5d7125e124975f4aba22...
It reflects a certain disposition in the writer; the information it carries isn't in the actual data they are expressing, but in the state of mind they express it from, which can be important context. Oftentimes it indicates exasperation, which is an important social cue to be able to pick up on.
A little excerpt from Arlo Guthrie
"I mean, I mean, I mean that just, I'm sittin' here on the bench, I mean I'm sittin here on the Group W bench, because you want to know if I'm moral enough to join the army, burn women, kids, houses and villages after being a litterbug."
Imagine that without the "I mean"s in it, and the importance of how they convey his stance on the situation.
Since a "sentence", much like everything else in practice, is almost but not quite what the formal definition says, just use an LLM for this task.
I mean, you could just vibe code that.
I mean is there a difference between asking an LLM via a prompt or asking an LLM via comment box?
Have a look at https://arcprize.org/
They have hundreds of challenges that humans can solve in under a minute but LLMs cannot. The general trend seems to be figuring out the rules or patterns of the challenge when there are few examples and no instructions.
Perhaps coding exercises that require 2d or 3d thinking, or similar. This is where I have seen LLMs struggle a lot. There are probably other areas too.
Ah, it also needs to be challenging for humans. It's a prize to win something. I just didn't want people to throw the question into Claude Code.
For more examples of such problems, check Jane Street's puzzles of the month.
Those will almost certainly be too hard for the target audience
Just have to incorporate good judgement in some way.
How many <$letter>s are in the word <word with $letters>
The bigger LLMs have generally figured out this specific problem.
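Part of why this template caught on as a probe is that it's trivial to answer programmatically, while token-based models historically fumbled it. A two-line sketch, using the classic "strawberry" instance of the template:

```javascript
// Count occurrences of a single letter in a word — the question
// the "How many <letter>s are in <word>?" template asks.
function countLetter(word, letter) {
  return [...word].filter((c) => c === letter).length;
}

console.log(countLetter("strawberry", "r")); // → 3
```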