How noisy is the sentiment classification? Feels like that could skew the results a lot.

From the comments that I've checked manually, it's pretty good. You can go to the "User Ratings" tab in the Google Sheet and spot-check some comments to get an idea.
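
If anyone wants a rough number rather than an impression, one option is to hand-label a small sample of comments and compare those labels with the classifier's. Below is a minimal Python sketch of that idea. The function names and the example labels are hypothetical placeholders, not data from the sheet, and the Wilson interval is just one reasonable way to show how uncertain a small sample is.

```python
import math


def count_agreement(manual_labels, predicted_labels):
    """Return (number of matching labels, sample size)."""
    if len(manual_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    matches = sum(m == p for m, p in zip(manual_labels, predicted_labels))
    return matches, len(manual_labels)


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Hypothetical example: labels you assigned by hand vs. the classifier's output.
manual = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "pos"]
predicted = ["pos", "neg", "pos", "pos", "neu", "pos", "neg", "neg"]

correct, total = count_agreement(manual, predicted)
low, high = wilson_interval(correct, total)
print(f"agreement: {correct}/{total}, 95% interval: {low:.2f} to {high:.2f}")
```

With only a handful of comments the interval will be wide, so a sample of at least a few dozen is more informative.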