These datasets almost certainly contain a lot of Text => Sketch pairs as well. I wonder if it's possible to extrapolate from Text => Sketch and Text => Image pairs to improve Sketch => Image capabilities. The models must already be doing something like this implicitly.