I like the idea but I'm not so sure this problem can be solved generally.
As an example: imagine someone writing a data pipeline for training a machine learning model. Anyone who's done this knows that such a task involves lots of data wrangling work: cleaning data, renaming and recasting columns, and other ad hoc transformations.
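For concreteness, here's a hypothetical sketch (using pandas) of the kind of ad hoc wrangling I mean — the column names and rules are made up, not from any real pipeline:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Typical ad hoc steps: rename columns, coerce types, handle missing values.
    df = df.rename(columns={"Age": "age", "Income ($)": "income"})
    df["age"] = pd.to_numeric(df["age"], errors="coerce")  # "n/a" -> NaN
    df = df.dropna(subset=["age"])                         # drop rows with bad age
    df["income"] = df["income"].fillna(df["income"].median())
    return df

raw = pd.DataFrame({"Age": ["34", "n/a", "51"],
                    "Income ($)": [52000.0, 61000.0, None]})
cleaned = clean(raw)
```

Each step is individually trivial, but the choices (drop vs. impute, median vs. mean) only prove themselves much later, in model performance.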
Often the only way to verify that any of it worked is to check whether the eventual trained model performs well.
In this case, scenario testing doesn't scale because the feedback loop is extremely long: you have to wait until the model is trained and evaluated on held-out data.
And scenario testing clearly cannot work on the smaller pieces of the work, like the individual data wrangling steps.