Looking at the post, the methodology (checking for the presence of a few keywords, not even including "LLM") is very simple, with a lot of potential false negatives. 20% is closer to the lower bound.
It could also explain the lower votes: in my personal experience, posts using the more generic terms "AI" or "GPT" correlate with lower quality.
Ironically, it would have been better to use an LLM to classify the posts.
My thoughts exactly. Or at least curate a sample of true positives and use zero-shot classification with semantic embeddings + keywords.
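To sketch what I mean: embed a curated set of known AI-generated posts, take their centroid as a prototype, and flag a post if it either matches a keyword or sits close enough to the prototype. The embedding below is a toy bag-of-words stand-in (a real setup would use a sentence-embedding model), and the keyword list, threshold, and sample posts are all made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts. Swap in a real
    # sentence-embedding model for actual semantic similarity.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    # Average the curated true-positive vectors into one prototype.
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({w: c / len(vectors) for w, c in total.items()})

KEYWORDS = {"chatgpt", "gpt", "llm"}  # hypothetical keyword list

def looks_ai_generated(post, prototype, threshold=0.4):
    # Keyword hit OR sufficient similarity to the prototype flags the post.
    if set(post.lower().split()) & KEYWORDS:
        return True
    return cosine(embed(post), prototype) >= threshold

# Curated sample of known true positives (made-up examples).
true_positives = [
    "i asked chatgpt to write this answer for me",
    "this response was generated by an llm assistant",
]
proto = centroid([embed(t) for t in true_positives])

print(looks_ai_generated("the llm wrote most of this post", proto))       # True
print(looks_ai_generated("photo of my cat sleeping on the keyboard", proto))  # False
```

The point is that the keyword check becomes a high-precision fast path, while the similarity check catches posts that never mention the magic words, which is exactly the false-negative class the original methodology misses.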
> 20% is closer to the lower bound
I think the methodology is too poor to actually make that call. It's closer to "the 20% figure is kind of made up".