The conditional relation represented in Prolog, as in any deductive system, is material implication (~P v Q), not causation. You can encode causal relationships with material implication, but you're still going to need to discover those causal relationships in the world somehow.
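To make the distinction concrete, here is a minimal sketch of material implication as a plain truth function (illustrative Python, not Prolog):

```python
def implies(p, q):
    """Material implication ~P v Q: false only when P is true and Q is false."""
    return (not p) or q

# Material implication holds whenever the antecedent is false,
# regardless of any causal connection between P and Q.
table = {(p, q): implies(p, q) for p in (False, True) for q in (False, True)}
```

Note that `implies(False, q)` is true for any `q`, which is exactly why the connective carries no causal content.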

Conditional statements don't really work here, because "if A, then B" means that A is sufficient for B, while "A causes B" doesn't imply that A is sufficient for B. E.g. "smoking causes cancer": smoking is only a partial cause of cancer, and cancer only partially an effect of smoking.

"A causes B" usually implies that A and B are positively correlated, i.e. P(A and B) > P(A)×P(B), but even that isn't always the case, namely when some common cause counteracts the correlation.
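A quick simulation of that common-cause case (toy numbers I made up, plain Python): A causally raises the probability of B, but a confounder C pushes A and B in opposite directions, so the observed correlation comes out negative.

```python
import random

random.seed(0)
n = 200_000
count_a = count_b = count_ab = 0
for _ in range(n):
    c = random.random() < 0.5                       # common cause
    a = random.random() < (0.1 if c else 0.9)       # C suppresses A
    pb = (0.8 if c else 0.1) + (0.1 if a else 0.0)  # C promotes B; A adds a small causal push
    b = random.random() < pb
    count_a += a
    count_b += b
    count_ab += a and b

p_a, p_b, p_joint = count_a / n, count_b / n, count_ab / n
# A causally raises P(B), yet the common cause makes the pair anticorrelated:
# p_joint comes out well below p_a * p_b
```

So "A causes B" and "A and B are positively correlated" really do come apart once a strong enough confounder is in play.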

Thinking about this, it seems that if A causes B, the correlation between A and B is at least stronger than it would otherwise have been.

This counterfactual difference in correlation strength is plausibly the "causal strength" between A and B. Though it doesn't indicate the causal direction, as correlation is symmetric.
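One way to make that counterfactual difference concrete: simulate the same toy system twice, once with the A→B link present and once with it severed, and compare covariances. (Hypothetical numbers, plain Python; `beta` is the assumed strength of the causal link.)

```python
import random

def covariance(beta, n=200_000, seed=1):
    """Cov(A, B) in a toy model: common cause C plus a causal link A -> B of strength beta."""
    rng = random.Random(seed)
    sa = sb = sab = 0
    for _ in range(n):
        c = rng.random() < 0.5                      # common cause
        a = rng.random() < (0.1 if c else 0.9)      # C suppresses A
        b = rng.random() < (0.8 if c else 0.1) + (beta if a else 0.0)
        sa += a; sb += b; sab += a and b
    return sab / n - (sa / n) * (sb / n)

cov_with_link = covariance(beta=0.1)
cov_without_link = covariance(beta=0.0)
# Both covariances are negative (the common cause dominates), but the causal
# link still makes the correlation less negative: cov_with_link > cov_without_link
```

And as noted, the comparison is symmetric in A and B, so it measures strength, not direction.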

I didn't say one does not need to discover the causal relationships; but once discovered, such relationships can be explored, followed, and _inferred_ on in a very syllogistic manner. My comment was really about the proposal in the article.

On the other hand, what we seem to have with LLMs, and the transformer approach in particular, is a sort of probable statistical correlation, computed by brute force and approximation (gradient descent). So this is not true causation either; it becomes causation only after a human observes it and agrees that it follows some causal relationship.

/Not sure whether I can say it is also "material" in another, non-logical sense (that would perhaps sound nonsensical), but the apparent logical structure in LLM output rather emerges from training patterns, not from explicit logical operations./

There's nothing wrong with having a graphical structure that models causality, and of course this needs to be discovered first. But then we have LZW/Sequitur, using a very brute-force way to find the minimal grammar for compressing certain data losslessly, thus discovering some logical structure (and correlation); but this is not yet causation. Indeed, finding patterns != finding causal relationships.
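For a feel of how mechanically that kind of structure discovery works, here is a minimal LZW-style sketch (not the Sequitur algorithm itself): it greedily grows a dictionary of repeated substrings, i.e. it finds patterns, and nothing in it knows anything about causes.

```python
def lzw_encode(s):
    """Minimal LZW: emit dictionary codes, growing the dictionary as repeats are found."""
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in s:
        wc = w + ch
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)   # a newly discovered pattern
            w = ch
    if w:
        out.append(table[w])
    return out, table

codes, table = lzw_encode("abababab")
# The repeated structure is "discovered" as entries like 'ab', 'aba', 'abab':
# pure pattern-finding, with no notion of what causes what.
```

The compression succeeds (5 codes for 8 characters) purely on regularity in the data, which is exactly the patterns != causes point.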

My gut feeling is we want something that would result in a correct Prolog-like set of inference rules, but based on actual causality, not conflated with mere correlation. And then apply this to a larger corpus, the world's knowledge; but we don't (yet) have the means to figure that out at such scale, even though approaches exist for smaller corpora.

It is perhaps gradient descent, and the fact that this composition of tensor algebra is differentiable, that is the ingenious thing about the ML we deal with now. But everyone is dreaming of some magic algorithm that would find the causation, resulting in a non-probabilistic graphical model, or at least a model whose stochastic branching we can follow in an observable manner.
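The differentiability point in a nutshell: because the loss is a smooth function of the parameters, a dumb local update finds the fit with no symbolic reasoning at all. A toy one-parameter sketch (plain Python, made-up data):

```python
# Fit y ~ w * x by gradient descent on mean squared error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by the "true" w = 2

w = 0.0
lr = 0.05
for _ in range(200):
    # d/dw of mean((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
# w converges to ~2.0: the correlation is captured, causation never consulted
```

The update rule is pure calculus on a differentiable expression; scale the same idea up to tensors and you get the training loop of modern ML.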

It is indeed ingenious to fold multi-dimensional spaces, multiple times, in order to disambiguate the curvature of a bunny's ear from that of a bear's ear. But it just doesn't feel right to do logic and causation by means of differential calculus and stochastic structures.