The ground truth comes from manual work. The scrolls can be unwrapped virtually by hand: a human points and clicks along the boundaries of the scroll. That is not particularly hard in well-preserved sections of the scroll, but it is extremely tedious, slow, and error-prone. We have a team of annotators who do manual annotation and refinement using custom software we've written, mostly improving on automatically generated segmentations and unwrappings.

Once you have some unwrapped papyrus, you can render it to an image and look for ink. Ink leaves a certain texture that can be identified by the naked eye and labeled. Between these two processes you get the segmentation and ink detection ground truth. Segments can be flattened virtually through existing software and algorithms.

I'm sure that process is described somewhere on the project's site and, being a lazy human (and unwilling to ask LLMs to summarize it for me), I leaned on you for a human answer. I really appreciate you taking the time to answer. Thank you.

I can see why you'd be attracted to this project from a "let's solve problems computationally" perspective (never mind the historical side). It sounds like there are some cool problems in there.

The project's eye toward automating the process is particularly cool, too. This is the kind of stuff that gives me real enthusiasm for ML.