Neat problem to work on. The tail number lookup is the hard part and it sounds like you solved it the right way, by finding the people who actually track this obsessively rather than trying to scrape it yourself.
Two questions: how stale does the tail assignment data get in practice, and do you have a way to detect when an enthusiast spreadsheet goes unmaintained? And what happens to your probability estimate when an airline swaps aircraft last minute, which seems to happen pretty often on regional routes?
Great questions!
> how stale does the tail assignment data get in practice, and do you have a way to detect when an enthusiast spreadsheet goes unmaintained?
So far they've been updated almost every day, so they seem very up to date. Internally we track every change and removal, so I'm not too worried about spreadsheets being abandoned yet. It's a good thought though.
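A staleness check on top of a change log like that can be pretty small. Rough sketch only, not the real code: the `last_changed` mapping, the sheet names, and the 7-day threshold are all made up for illustration.

```python
from datetime import datetime, timedelta

def stale_sources(last_changed, now, max_age=timedelta(days=7)):
    """Return the names of sources with no recorded change within max_age, sorted."""
    return sorted(name for name, ts in last_changed.items() if now - ts > max_age)

# Hypothetical data: source name -> time of the last observed change.
now = datetime(2024, 6, 15)
last_changed = {
    "sheet_a": datetime(2024, 6, 14),  # changed yesterday
    "sheet_b": datetime(2024, 5, 1),   # quiet for over six weeks
}
print(stale_sources(last_changed, now))
```

The threshold would need tuning per source, since some spreadsheets may legitimately change less often than others.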
> And what happens to your probability estimate when an airline swaps aircraft last minute, which seems to happen pretty often on regional routes?
Honestly, our estimate is pretty crude right now. It works at our current scale, but I think you're right that we could make it more accurate by tracking equipment swaps and really drilling into which aircraft get assigned to which routes.
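One way to fold swaps in, as a rough sketch rather than what runs today: treat the estimate as a mix of the scheduled tail and the likely replacements on that route, weighted by a swap rate. The function name, the `swap_rate` and `fleet_share` inputs, and the numbers below are all hypothetical.

```python
def adjusted_probability(scheduled_match, swap_rate, fleet_share):
    """Blend the scheduled tail with likely replacements.

    scheduled_match: True if the scheduled tail is the aircraft you care about.
    swap_rate: assumed fraction of flights on the route that get a different aircraft.
    fleet_share: assumed fraction of likely replacement aircraft that match.
    """
    if not (0.0 <= swap_rate <= 1.0 and 0.0 <= fleet_share <= 1.0):
        raise ValueError("swap_rate and fleet_share must be between 0 and 1")
    base = 1.0 if scheduled_match else 0.0
    # No swap: the scheduled tail decides. Swap: fall back to the fleet mix.
    return (1.0 - swap_rate) * base + swap_rate * fleet_share

# Made-up inputs: 25% swap rate, half of the replacement pool matches.
print(adjusted_probability(True, 0.25, 0.5))
print(adjusted_probability(False, 0.25, 0.5))
```

With no swaps this collapses to a plain yes/no on the scheduled tail, and on routes where swaps are common it drifts toward the fleet mix.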