Hacker News

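If the model outputs “yes” or “no” before giving its CoT explanation, then the explanation is worthless for improving the final decision: each token is conditioned only on the tokens before it, so the answer was already fixed by the time the explanation was generated.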
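
To make the ordering point concrete, here is a toy sketch of greedy autoregressive decoding. Nothing here is a real model: `toy_step`, the `ANSWER:`/`REASON:` markers, and the "evidence" token are all made up for illustration. The only property it is meant to show is that each token is a function of the prefix, so an answer emitted first cannot use an explanation that comes later.

```python
from typing import Callable, List


def decode(prompt: List[str], step: Callable[[List[str]], str], n_tokens: int) -> List[str]:
    """Greedy autoregressive decoding: each new token depends only on the prefix."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        tokens.append(step(tokens))
    return tokens[len(prompt):]


def toy_step(prefix: List[str]) -> str:
    """Hypothetical stand-in for a model (an assumption, not a real LLM).

    It answers "yes" only if the token "evidence" already appears in the prefix.
    """
    last = prefix[-1]
    if last == "ANSWER:":
        return "yes" if "evidence" in prefix else "no"
    if last == "REASON:":
        return "evidence"
    if last in ("yes", "no"):
        # After answering, explain if no explanation has been given yet.
        return "REASON:" if "REASON:" not in prefix else "<eos>"
    if last == "evidence":
        # After explaining, answer if no answer has been given yet.
        return "ANSWER:" if "ANSWER:" not in prefix else "<eos>"
    return "<eos>"


# Answer first: the answer is produced before any reasoning exists in the prefix.
answer_first = decode(["Q", "ANSWER:"], toy_step, 3)

# Reasoning first: the answer is conditioned on the reasoning tokens.
reason_first = decode(["Q", "REASON:"], toy_step, 3)

print(answer_first)
print(reason_first)
```

In the answer-first run the explanation still gets written, but it arrives too late to change the decision; in the reasoning-first run the same toy model reaches a different answer because the explanation is part of the prefix it conditions on.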

There are imperfect correlations in both directions of the stream, but the forward direction is probably more causal, as you say.