Yeah, maybe. But what matters is the end result. In the Kaggle match, one of the games from the finals (Grok vs o3) was rated by chess.com's Stockfish analysis as 1900 vs 2500. That is, the two sides played that game at around those ratings.
For reference, the average chess.com player is ~900 Elo, while the average FIDE-rated player is ~1600. So, yeah. Parrot or not, the LLMs can make moves above the average player. Whatever that means.
I believe it will make illegal moves (unaided by any tools ofc). It will also make mistakes like failing to construct the board correctly from a FEN string. For these reasons I consider long strings of correct moves insufficient to say it can play the game. If my first two statements, about a propensity for illegal moves and other failures on "easy for humans" tasks, were untrue, then I would reconsider.
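For context on what "constructing the board from a FEN string" involves, here is a minimal sketch in Python. It only expands the piece-placement field (the first space-separated field of a FEN string) into an 8x8 grid and ignores side to move, castling rights and the rest. The function name `fen_to_board` and the `"."` marker for empty squares are illustrative choices, not part of any library.

```python
def fen_to_board(fen: str) -> list[list[str]]:
    """Expand the piece-placement field of a FEN string into an 8x8 grid.

    Row 0 is rank 8 (Black's back rank), row 7 is rank 1.
    Empty squares are written as "." (an arbitrary choice for this sketch).
    """
    placement = fen.split()[0]          # first field: piece placement
    ranks = placement.split("/")        # ranks are listed from 8 down to 1
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks in the placement field")

    board = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch in "12345678":
                row.extend("." * int(ch))   # a digit means that many empty squares
            elif ch in "pnbrqkPNBRQK":
                row.append(ch)              # lowercase = Black, uppercase = White
            else:
                raise ValueError(f"unexpected character in FEN: {ch!r}")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)
    return board


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
for row in fen_to_board(start):
    print(" ".join(row))
```

The format is mechanical for a program, which is why getting it wrong counts as an "easy for humans" failure here.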
In the Kaggle test they considered the match forfeit if the model could not produce a legal move after 3 tries (none of the matches in the finals were forfeited; they all ended with checkmate on the board). Again, chess.com's interface won't let you make illegal moves, and the average there is 900. Take that as you will.
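A rough sketch of what a "forfeit after 3 illegal tries" rule looks like in code. This is not Kaggle's actual harness: `move_or_forfeit`, `propose` and `is_legal` are hypothetical names, and the legality check is passed in as a callable so the sketch stays self-contained.

```python
from typing import Callable, Optional


def move_or_forfeit(
    propose: Callable[[int], str],      # hypothetical: asks the model for a move on a given attempt
    is_legal: Callable[[str], bool],    # hypothetical: legality check for the current position
    max_tries: int = 3,
) -> Optional[str]:
    """Return the first legal move proposed, or None if every try was illegal (forfeit)."""
    for attempt in range(1, max_tries + 1):
        move = propose(attempt)
        if is_legal(move):
            return move
    return None  # no legal move within max_tries: the game is forfeited


# Toy usage: a fake "model" that gets it right on the second try.
legal_moves = {"e4", "d4", "Nf3"}
answers = iter(["Ke9", "e4"])
result = move_or_forfeit(lambda attempt: next(answers), legal_moves.__contains__)
print(result)
```

The human equivalent on a site that blocks illegal moves is unlimited retries, which is the point of the comparison above.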