Would you agree/disagree with the following:

- It's not gradient boosting per se that's good on tabular data, it's trees. Other fitting methods that use trees as the base model are also usually superior to NNs on tabular data.

- Trees are better on tabular data because they encode a useful inductive bias that NNs currently do not. Just like CNNs or ViTs are better on images because they encode spatial locality as an inductive bias.

Absolutely agree on both counts. Gradient boosting is the best-known and most successful variant, but the decision tree is the underlying architecture there. Decision trees don't exhibit the same "implicit training bias" phenomenon that neural networks do, though, so all of this is just model bias in the classical statistical sense.
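To make the point concrete that the tree is the underlying architecture, here is a minimal sketch of gradient boosting with depth-1 trees (decision stumps) as the base learner, under squared loss. This is a hypothetical toy implementation on a 1-D feature, not any library's actual code; names like `fit_stump` and `gradient_boost` are made up for illustration.

```python
# Toy gradient boosting: each round fits a decision stump (a one-split tree)
# to the residuals of the current ensemble, then adds it with a small
# learning rate. The base learner being a tree is the whole point.

def fit_stump(xs, residuals):
    """Find the axis-aligned split on a 1-D feature minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    # A stump is a piecewise-constant predictor with a single split.
    return lambda x: lmean if x <= threshold else rmean

def gradient_boost(xs, ys, rounds=50, lr=0.1):
    """Fit an additive ensemble of stumps by repeatedly fitting residuals."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

# A step-function target: the kind of sharp, axis-aligned structure that
# trees fit trivially and that smooth NN activations struggle to match.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(xs, ys)
```

The step-function target also illustrates the inductive-bias point from the first comment: the ensemble recovers the discontinuity at `x = 3` almost exactly after a few dozen rounds, because axis-aligned splits are exactly the right function class for it.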

Can NNs be made better than trees on tabular data with some further constraints, or some other modification?