Hacker News

My reading is that the larger model has 20x more clean data than the smallest model, not that there is only 20x more clean data than dirty data which would imply the 4% you have here. I agree it could be worded more clearly.