> "On May 18, 2026, Ukraine carried out a drone attack on Moscow, Russia"

I actually don't know which way you came down on that one?

I think strictly it's false but "mostly true" would be justifiable? (As in, to say it's false would be misleading if it led the reader to assume there was no attack around that time.)

https://www.washingtonpost.com/world/2026/05/17/ukrainian-dr...

It seems it happened Saturday 16th overnight into the 17th, not the 18th. I see this a LOT with fact checking. It shouldn't be this way, but political bias seems to nudge people into making calls land one way or the other with selective application of pedantry.

It's impossible to answer if you don't have a search tool, and three out of the five tested models didn't have a search tool.

Thanks; I didn't spot that they disabled tools in the harness. Also they don't provide an "out" to allow the models to express uncertainty so the instructions force a guess to be made.

As an aside though, it's still funny that the two models WITH search also disagreed.

It's impossible to answer unless you have a *100% complete search tool*.

No system can know everything. It doesn't matter how many tools you give it. It's always wrong to force a binary True / False without shades of "I don't know".
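For what it's worth, giving the model an "out" doesn't take much. Here's a rough sketch of a three-way verdict with a scoring rule where abstaining scores zero and a wrong call costs points. The names (`Verdict`, `score`, `wrong_penalty`) are made up for illustration and have nothing to do with the study's actual harness, which I haven't seen.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical answer space: the usual two labels plus an abstention."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def score(answer: Verdict, truth: bool, wrong_penalty: float = 1.0) -> float:
    """Return +1 for a correct call, 0 for abstaining, -wrong_penalty for a wrong call.

    wrong_penalty is an assumed knob: the higher it is, the less a blind
    guess pays off compared with answering "unknown".
    """
    if answer is Verdict.UNKNOWN:
        return 0.0
    correct = (answer is Verdict.TRUE) == truth
    return 1.0 if correct else -wrong_penalty
```

With a penalty of 1.0, a coin-flip guess averages out to zero, the same as abstaining, so a model without search gains nothing by guessing on a claim it can't verify.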

That's ten days ago. As the commenter pointed out, without a web search tool there's no possible way for the model to know whether it's true or not, and the people conducting the study didn't give the models a way to respond with "I don't know".

It's not in the training data, so there is no way for the model to know.