I don't see why the transformer architecture can't be designed and trained with separate inputs for control data and content data.
Give it a shot
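Here is one way to try the idea, sketched under stated assumptions. Each token is tagged as either control or content. The tag selects two things: an additive channel embedding (similar in spirit to BERT's segment embeddings) and a separate set of query/key/value projection matrices. The names `TwoChannelAttention`, `CONTROL` and `CONTENT` are hypothetical. The weights are random rather than trained, and this is a single pure-Python attention head, not a full transformer.

```python
import math
import random

CONTROL, CONTENT = 0, 1  # hypothetical channel ids for the two input streams


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TwoChannelAttention:
    """Single-head self-attention with separate parameters per input channel.

    The channel of each token picks its additive channel embedding and its
    own Q/K/V projections, so identical vectors arriving on different
    channels are processed differently. Weights are random, not trained.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                    for _ in range(dim)]

        self.dim = dim
        # One channel embedding per stream.
        self.channel_emb = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                            for _ in (CONTROL, CONTENT)]
        # proj[channel] = (W_q, W_k, W_v), separate for each stream.
        self.proj = [tuple(rand_matrix() for _ in range(3))
                     for _ in (CONTROL, CONTENT)]

    def __call__(self, control, content):
        """control, content: lists of already-embedded length-dim vectors.

        Returns (outputs, attention_weights) over the joint sequence,
        control tokens first, then content tokens.
        """
        tokens = [(v, CONTROL) for v in control] + [(v, CONTENT) for v in content]
        qs, ks, vs = [], [], []
        for vec, ch in tokens:
            x = [a + b for a, b in zip(vec, self.channel_emb[ch])]
            w_q, w_k, w_v = self.proj[ch]
            qs.append(matvec(w_q, x))
            ks.append(matvec(w_k, x))
            vs.append(matvec(w_v, x))
        scale = 1.0 / math.sqrt(self.dim)
        outputs, weights = [], []
        for q in qs:
            # Every token attends over both streams (no mask in this sketch).
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in ks]
            attn = softmax(scores)
            weights.append(attn)
            outputs.append([sum(a * v[i] for a, v in zip(attn, vs))
                            for i in range(self.dim)])
        return outputs, weights


# Usage: the same vector is fed once as control and once as content.
layer = TwoChannelAttention(dim=4, seed=0)
instruction = [[1.0, 0.0, 0.0, 0.0]]
document = [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
out, attn = layer(instruction, document)
print(len(out), out[0] != out[2])
```

This layer only gives the model a way to tell the two streams apart. Whether training actually teaches it to follow the control stream and treat the content stream as inert data is a separate question that this sketch does not address.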