How many games did you have to throw away because Stockfish wanted to castle? Or did you force Stockfish not to castle? Castling is such a frequent move that it is hard to draw any conclusions about the strength of an engine that does not support it.

Zero games were thrown away for castling, because I forced Stockfish not to castle (and not to play en passant or promotions) by filtering the legal moves and passing only the filtered list via root_moves.

So every game stayed in the same no-castling variant.

and you're right, this rating is for that constrained variant, not full chess.
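
For anyone curious, here is a rough sketch of that kind of filtering. It assumes the python-chess library, whose engine.play() accepts a root_moves argument; the helper names allowed_moves and restricted_engine_move are made up for illustration, and the actual harness may differ.

```python
import chess
import chess.engine


def allowed_moves(board: chess.Board) -> list[chess.Move]:
    """Legal moves minus castling, en passant, and promotions."""
    return [
        move
        for move in board.legal_moves
        if not board.is_castling(move)
        and not board.is_en_passant(move)
        and move.promotion is None
    ]


def restricted_engine_move(engine: chess.engine.SimpleEngine,
                           board: chess.Board,
                           seconds: float = 0.1):
    """Ask the engine for a move, restricted to the filtered root moves.

    Returns None when no allowed move exists; how to score that case is
    left to the caller (hypothetical policy, not from the original post).
    """
    moves = allowed_moves(board)
    if not moves:
        return None
    result = engine.play(board, chess.engine.Limit(time=seconds),
                         root_moves=moves)
    return result.move
```

Usage would look something like `engine = chess.engine.SimpleEngine.popen_uci("stockfish")` followed by `restricted_engine_move(engine, board)`, assuming a stockfish binary is on the PATH. One caveat: root_moves only restricts the first move of the search, so deeper in the tree the engine still considers castling, en passant, and promotions for both sides.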

Wouldn't stockfish's position evaluation be incorrect in that case? (If it evaluated the position based on a formula that assumed normal rules)

I'm not quite clear on how it manages it, but Stockfish works pretty well outside the normal bounds of chess. There are toy chess variants on chess.com with "dragons" (knight + bishop), and Stockfish can use those very effectively.