Just a thought: This data engineering can only really occur in sciences with a significant "moat".
Think expensive tools, expensive test setups, live gene-altered animals, etc.
In deep learning and other more digital fields (my own field uses a lot of freely available satellite data), replication is often cheaper, and research outcomes actually get applied a lot more often.