I’ve been trying to design a puzzle for a game this year that humans can solve but LLMs can’t. I’ve come up with one, but it was hard work! It’s based around message cracking.
There was one in a previous AoC that I think stumped a lot of AI at the time, because it involved a game similar to poker, with the same terminology but different rules. The AI couldn't help falling into a "this is poker" trap and writing a solution that follows the standard rules.
Was that 2023's Day 7 'Camel Cards' [1]?
[1] https://adventofcode.com/2023/day/7
Isn't that easily solved by changing the terminology before giving it to the LLM?
Interesting! Maybe that’s the general way to approach these things
I mean, wasn't the second half of pretty much every AoC exercise beyond LLM capabilities?
I remember multiple accounts of people trying to one-shot AoC, and they all ended around day 10 or so.
[flagged]
We all have our writing quirks, like how some people use shorthand that saves only a marginal amount of typing ("people" => "ppl"), or even people who capitalize the start of each sentence but not the start of their whole text.
Some thoughts maybe should remain internal :)
A huge pet peeve of mine is people getting annoyed by phrases like "I mean." :)
There's plenty of prior work to go on. I mean, you could use a font ligature or one of the browser extensions (although I don't know if Chrome still lets you have a browser extension touch all text).
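The extension approach really is just a text substitution pass over the page. A minimal sketch of the core rule such a content script might apply — the function name and regex here are my own assumptions, not taken from any existing extension:

```javascript
// Strip a filler phrase ("I mean") from a piece of text.
// A real content script would walk the page's text nodes
// (e.g. with a DOM TreeWalker) and apply this to each one.
function stripFiller(text) {
  // \b keeps it from matching inside other words; the optional
  // comma and trailing whitespace tidy up what's left behind.
  return text.replace(/\bI mean,?\s*/gi, "");
}

console.log(stripFiller("I mean, you could just vibe code that."));
```

Whether Chrome still lets an extension touch all text is a separate question of host permissions, as the comment notes.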
Change ChatGPT to 'my drunk uncle' while you're at it.
here you go, helping with exposure therapy
https://gist.github.com/clairefro/1cf81f5d7125e124975f4aba22...
It reflects a certain disposition in the writer; the information it carries isn't in the actual data they are expressing, but in the state of mind they express it from, which can be important context. Oftentimes it indicates exasperation, which is an important social cue to be able to pick up on.
A little excerpt from Arlo Guthrie
"I mean, I mean, I mean that just, I'm sittin' here on the bench, I mean I'm sittin here on the Group W bench, because you want to know if I'm moral enough to join the army, burn women, kids, houses and villages after being a litterbug."
Imagine that without the "I mean"s in it, and the importance of how they convey his stance on the situation.
Since a "sentence", much like everything else in practice, is almost but not quite what the formal definition says, just use an LLM for this task.
I mean, you could just vibe code that.
I mean is there a difference between asking an LLM via a prompt or asking an LLM via comment box?
Have a look at https://arcprize.org/
They have hundreds of challenges that humans can solve in under a minute but LLMs cannot. The general trend seems to be figuring out the rules or patterns of the challenge when there are few examples and no instructions.
Perhaps coding exercises that require 2d or 3d thinking, or similar. This is where I have seen LLMs struggle a lot. There are probably other areas too.
Ah, it also needs to be challenging for humans. It's a prize to win something. I just didn't want people to throw the question into Claude Code.
For more examples of such problems, check Jane Street's puzzles of the month.
Those will almost certainly be too hard for the target audience
Just have to incorporate good judgement in some way.
How many <$letter>s are in the word <word with $letters>
The bigger LLMs have generally figured out this specific problem.
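Part of why this template caught on as a probe is that it's trivial to answer programmatically, while token-based models historically fumbled it. A two-line sketch, using the classic "strawberry" instance of the template:

```javascript
// Count occurrences of a single letter in a word — the question
// the "How many <letter>s are in <word>?" template asks.
function countLetter(word, letter) {
  return [...word].filter((c) => c === letter).length;
}

console.log(countLetter("strawberry", "r")); // → 3
```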