We do understand them, though; it's exactly what they were made for.

If you train it on a dataset of Othello games, or a dataset that includes them, you are basically building a map of all the moves and states that have ever happened: the odds of transitions between them, and which transitions are effective or ineffective.
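
A toy caricature of that "map", reducing it to first-order transition counts (the real model conditions on the whole move history, but the bookkeeping idea is the same; the move tokens below are made up):

```python
from collections import defaultdict

# Toy caricature of the "map": count how often one move follows another
# across a corpus of games, then normalize into transition odds.
# These move sequences are hypothetical placeholders, not real Othello data.
games = [
    ["d3", "c5", "f6", "f5"],
    ["d3", "c5", "b4", "f5"],
    ["f5", "f6", "e6", "f4"],
]

counts = defaultdict(lambda: defaultdict(int))
for game in games:
    for prev, nxt in zip(game, game[1:]):
        counts[prev][nxt] += 1

# Normalize counts into transition probabilities ("odds of transitions").
transitions = {
    prev: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
    for prev, nxts in counts.items()
}
print(transitions["d3"])  # {'c5': 1.0} in this toy corpus
```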

By querying it, you start navigating that map from a given spot, and the model semi-randomly samples among its highest-confidence transitions as it navigates "the map".
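
That "semi-random sampling of the highest-confidence weights" is essentially temperature sampling over the map; a sketch continuing the toy example above:

```python
import random

def sample_next(transitions, state, temperature=0.7):
    """Pick the next move: mostly the highest-odds transition,
    with randomness controlled by temperature."""
    moves, probs = zip(*transitions[state].items())
    # Raising probabilities to 1/T is equivalent to dividing logits
    # by T before the softmax, as real samplers do.
    weights = [p ** (1.0 / temperature) for p in probs]
    return random.choices(moves, weights=weights)[0]

# Walk the map from a starting spot until we fall off the known states.
state = "d3"
path = [state]
while state in transitions and len(path) < 20:
    state = sample_next(transitions, state)
    path.append(state)
print(path)
```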

And in the multidimensional cross-section of all these states and transitions, the existence of a "board map" is implied: it is a set of common weights shared between all of them. This becomes even more obvious with the championship model in the Othello paper, which was trained on stronger games where the wider state of the board mattered more than local patterns, so the overall board state weighed more heavily in its responses.
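
The way that implied board map gets surfaced is probing: train a small classifier to read a square's state out of a layer's hidden activations. A minimal sketch of the idea, with random placeholder data where the real experiment would use activations collected by running the model over games:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Probing sketch: if a "board map" lives in the hidden states, a simple
# classifier should recover a square's state (empty/black/white) from
# the activations alone. The arrays below are random placeholders; the
# real experiment collects them from the model's forward passes.
rng = np.random.default_rng(0)
activations = rng.normal(size=(5000, 512))     # hidden states at some layer
square_labels = rng.integers(0, 3, size=5000)  # state of one board square

probe = LogisticRegression(max_iter=1000).fit(activations, square_labels)
print("probe accuracy:", probe.score(activations, square_labels))
# With real activations, accuracy well above chance is the evidence
# that the board state is represented inside the model.
```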

The second piece of research you linked also has a pretty obvious conclusion. It tells us more about us as humans than about LLMs: about our culture, our colors, and how we communicate their perception through text. If you want to try something similar, run kiki/bouba-style experiments on old diffusion models or old LLMs. A "Dzzkwok grWzzz" will get you much rougher and darker-looking things than "Olulola Opolili's" cloudy vibes.
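
A cheap way to run that experiment is to compare pseudowords against descriptive phrases in a text-image embedding space. The sketch below assumes the `transformers` library and the public `openai/clip-vit-base-patch32` checkpoint; the probe phrases are my own invention:

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

# Kiki/bouba sketch: embed pseudowords and descriptive phrases with
# CLIP's text encoder and compare cosine similarities. Model choice
# and phrasing are assumptions, not from the linked paper.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

texts = ["Dzzkwok grWzzz", "Olulola Opolili",
         "a rough dark spiky thing", "a soft cloudy rounded thing"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    emb = model.get_text_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)

sims = emb @ emb.T  # pairwise cosine similarities
print("Dzzkwok vs spiky: ", sims[0, 2].item())
print("Dzzkwok vs cloudy:", sims[0, 3].item())
print("Olulola vs spiky: ", sims[1, 2].item())
print("Olulola vs cloudy:", sims[1, 3].item())
```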

The active research amounts to:

- probing, i.e. "hey, let's see if the funky machine also does X"

- finding ways to scientifically verify and explain LLM behaviors we already know about

- pure BS, in some cases

- academics learning about LLMs

And it is not proof of where our understanding, or the frontier, actually is. It is basically standardizing and exploring the intuition that people who actively work with these models already have. It's like saying we don't understand math because people outside math circles still don't know all the behaviors and possibilities of a monoid.
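
For anyone outside those circles, a monoid is nothing exotic: a set with an associative operation and an identity element. Strings under concatenation are the classic example:

```python
# A monoid: a set, an associative binary operation, and an identity
# element. Strings under concatenation qualify, with "" as the identity.
identity = ""

def combine(a: str, b: str) -> str:
    return a + b

a, b, c = "foo", "bar", "baz"
assert combine(combine(a, b), c) == combine(a, combine(b, c))  # associativity
assert combine(identity, a) == a == combine(a, identity)       # identity laws
```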

@hypendev I am not trying to start a flame war, but let me take a very simple example.

As another commenter put it, we know how to build deep-learning machines; no question about that. My statement is that we don't clearly understand why they output the results we observe.

Let's imagine that you have a model that can detect cats in an image with 95% accuracy. If you understood how the model worked, I could give you an image of a cat and you could _predict_ reliably whether the model would detect the cat.

Yet we are not able to do that: you have to give the image to the model and observe the result. We can't predict the result reliably (i.e., scientifically), and we don't know how to train the model to detect cats better without altering its other results. (Including the test image in the training set is forbidden, of course.)
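
Concretely, the only reliable "predictor" we have is the model itself: you run the forward pass and observe. In the sketch below, a stock torchvision ResNet stands in for the hypothetical cat detector, and `cat.jpg` is a made-up path:

```python
import torch
from PIL import Image
from torchvision import models

# The only way to learn what the detector outputs is to run it.
# ResNet-50 is an arbitrary stand-in for the 95%-accurate cat model.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("cat.jpg").convert("RGB")  # hypothetical test image
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))
pred = logits.argmax(dim=1).item()
# ImageNet classes 281-285 are the domestic cat categories.
print("detected a cat" if 281 <= pred <= 285 else "no cat detected")
```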

Back to LLMs: we can't predict how they will behave. Therefore, even world-class scientists at OpenAI, who know about a Goblin issue and have assumptions about its cause, are not able to edit the model directly to fix it. They would if they understood it fully. Instead, they are reduced to testing and hacking their way through.
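
Roughly, that test-and-hack loop looks like the sketch below. Everything here is hypothetical (the `passes` and `finetune` interfaces are made up), not anyone's actual pipeline:

```python
# Hypothetical sketch of the test-and-hack loop. There is no "edit the
# Goblin weights" step, because nobody knows which weights those are;
# you patch indirectly, re-test, and hope nothing else broke.
def fix_goblin_issue(model, goblin_cases, regression_suite, max_rounds=10):
    for _ in range(max_rounds):
        if all(model.passes(case) for case in goblin_cases):
            return model  # looks fixed, on the cases we know about
        model = model.finetune(goblin_cases)  # the "hack": indirect nudging
        if not all(model.passes(case) for case in regression_suite):
            raise RuntimeError("the fix broke something else")
    raise RuntimeError("still goblins after max_rounds")
```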