Hey HN!
Last week, Joe Barrow released CommonForms [1], a set of open models for automatically detecting form fields in PDFs.
He trained two models, FFDNet-S and FFDNet-L, on a dataset of 55k documents. You can read more about his approach in the arXiv paper [2].
As someone who's been searching for reliable models to auto-detect form fields (one of the last hard problems in PDF form filling), I was seriously impressed by the quality of these models. I wanted to give them the attention and distribution they deserve, so I created a fully browser-based implementation that handles both detection and field addition.
My implementation relies on his models and ONNX Runtime Web, plus some post-processing. I plan to publish a small browser library that encapsulates it in the coming days, to make it easier to deploy anywhere (currently you'd have to fork or copy my code).
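For a rough idea of what post-processing for a detector like this can look like, here is a minimal sketch of greedy non-maximum suppression, a common step for object detectors. It is an illustration under assumptions, not necessarily what my code does: the `Detection` shape, the label strings, and the 0.5 IoU cutoff are all hypothetical.

```typescript
// Hypothetical detection shape; the real model output layout may differ.
interface Detection {
  box: [number, number, number, number]; // [x1, y1, x2, y2] in page pixels
  score: number; // confidence in [0, 1]
  label: string; // illustrative class name, e.g. "text" or "checkbox"
}

// Intersection-over-union of two axis-aligned boxes.
function iou(a: Detection["box"], b: Detection["box"]): number {
  const w = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const h = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = w * h;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  const union = areaA + areaB - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: visit boxes from highest to lowest score and drop any box
// that overlaps an already-kept box of the same label by more than iouThreshold.
function nonMaxSuppression(dets: Detection[], iouThreshold = 0.5): Detection[] {
  const sorted = [...dets].sort((a, b) => b.score - a.score);
  const kept: Detection[] = [];
  for (const d of sorted) {
    const duplicate = kept.some(
      (k) => k.label === d.label && iou(k.box, d.box) > iouThreshold,
    );
    if (!duplicate) kept.push(d);
  }
  return kept;
}
```

Calling `nonMaxSuppression(detections)` on the raw boxes for a page leaves one box per detected field, which is the list you would then turn into actual PDF form fields.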
Happy to answer any questions about the browser-based implementation!
Questions about the models themselves should be directed to Joe, who I believe is also on HN [3].
[1] https://github.com/jbarrow/commonforms
[2] https://arxiv.org/abs/2509.16506
[3] https://news.ycombinator.com/user?id=jbarrow
Hey, Benjamin, thanks for the attribution! Happy to field any questions HN users have.
It's really gratifying to see people building on the work, and I love that it's possible to do browser-side/on-device.
Tbh this model is extremely bad. I tried a couple of our medical form examples and it found almost none of the fields.
Super interesting. Would you be willing to try the Python package (https://github.com/jbarrow/commonforms) or share the PDFs?
For the non-ONNX models there are some inference tricks that generally improve performance, and lowering the confidence threshold could potentially help.
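To illustrate the confidence point, here is a minimal sketch of how the cutoff changes what gets reported. The function name, the `score` field, and the example scores are all made up for illustration; this is not the CommonForms API.

```typescript
// Hypothetical: any detection object that carries a confidence score.
interface ScoredField {
  score: number; // confidence in [0, 1]
}

// Keep only detections at or above the given confidence cutoff.
function filterByConfidence<T extends ScoredField>(dets: T[], minScore: number): T[] {
  return dets.filter((d) => d.score >= minScore);
}

// Made-up scores for five candidate fields on one page.
const candidates: ScoredField[] = [0.91, 0.55, 0.32, 0.18, 0.07].map((score) => ({ score }));

// A lower cutoff keeps more candidates, at the cost of more false positives.
for (const cutoff of [0.5, 0.3, 0.1]) {
  const keptCount = filterByConfidence(candidates, cutoff).length;
  console.log(`cutoff ${cutoff}: ${keptCount} fields kept`);
}
```

If fields are being missed on a particular kind of form, sweeping the cutoff like this on a few pages is a quick way to see whether the model found them with low confidence or did not find them at all.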