I'm training ML models for PDF forms. You can try what I’ve got so far with this service, which automatically detects where fields should go and makes PDFs fillable: https://detect.semanticdocs.org/ Code and models are at: https://github.com/jbarrow/commonforms

That’s built on CommonForms, a dataset and paper I wrote, where I scraped Common Crawl for hundreds of thousands of fillable form pages and used them as a training set:

https://arxiv.org/abs/2509.16506

Next step is training and releasing some DETRs (detection transformers), which I think will drive quality even higher. But the end goal is automatic form accessibility.

Congratulations on being featured in the Superhuman newsletter. Trying it out.