There is plenty of prior work on encoder-free VLMs. I specifically remember the EVE series of models from ~2 years ago.
https://github.com/baaivision/EVE