“Brewer Ratings” on RateBeer Are Broken, and They Have Been for a While
By their own admission, one of the site's ratings isn't working correctly
Update: Since this story was first published, the “Brewer Ratings” feature on RateBeer has been eliminated entirely.
Last week, I was contacted by a Paste reader with something very interesting and unusual to share about RateBeer. It’s safe to call this person, a criminology professor with a background in quantitative methodology, something of a statistics geek. In the wake of the revelation that RateBeer had received investment/a partial buyout from AB-InBev back in Oct. 2016 without disclosing that investment to media or breweries, he wondered how statistics on RateBeer may have been affected since the autumn, so he set about doing some research via the Internet Archive’s Wayback Machine.
What he found was deeply concerning, and seemed to give credence to the fears of craft brewers such as Dogfish Head’s Sam Calagione, who along with others publicly requested their breweries be removed from the RateBeer service due to the investor’s obvious conflict of interest.
In Sept. of 2016, RateBeer introduced a feature called “Brewer Rating,” which appears on every brewery’s profile page. The number is meant to be an at-a-glance indicator of how well that brewery’s full lineup of beers has rated; the purest and simplest condensation of which breweries are “good” and “bad” for an average user who isn’t going to deep-dive into the entire lineup of brews. When the system was introduced, and at least into Oct. of 2016, the Brewer Rating of Anheuser-Busch InBev stood at 74/100. Not exactly rosy.
AB-InBev, back when their rating was a “C” letter grade.
Today, that number stands at 90/100. In the interim, only a small number of new ratings have been added to the brewery (1.3% more reviews, to be precise), which is not nearly enough to plausibly move the score from 74 to 90. At first glance this appears to be a significant 21.6% increase in the overall Brewer Rating, but because the actual range of the scale is 50-100 rather than 0-100 (for reasons we’ll explain shortly), with 50 being the lowest possible score a brewery can have, the relative increase is even higher. Taking into consideration that this is actually a 50-100 scale, it’s as if the score changed from 24/50 to 40/50, which is a 66.7% increase since October, which is of course also when AB-InBev happened to invest in RateBeer.
AB-InBev, now a respectable A-minus.
So yeah, it’s profoundly easy to see why this reader was suspicious, and why we quickly became suspicious as well. But as we dove into the numbers of more breweries, we began to see other things that didn’t seem to make sense, unrelated to AB-InBev. One of the best examples is MolsonCoors, the Canadian brewing giant, which went from a Brewer Rating of 68/100 in Oct. of 2016 to a 93/100 now. In their case, a mere 1.4% increase in the total number of beer ratings somehow accompanied an insane 138.9% increase in Brewer Rating, measured on the scale’s real 50-100 range. That one is straight-up statistically impossible to explain via new reviews alone.
Back in Oct., MolsonCoors was batting a miserable 68/100.
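To make the arithmetic above easy to check, here is a quick sketch (in Python, not anything from RateBeer’s codebase) of how the relative increases work out once you account for the 50-point floor of the scale:

```python
# Sketch (not RateBeer's actual code): verifying the article's arithmetic.
# Brewer Rating lives on a 50-100 scale, so the meaningful change is the
# movement above the 50-point floor.

def relative_increase(old, new, floor=0):
    """Percent increase of (score - floor)."""
    return (new - floor) / (old - floor) * 100 - 100

# AB-InBev: 74 -> 90
print(round(relative_increase(74, 90), 1))      # naive 0-100 view: 21.6
print(round(relative_increase(74, 90, 50), 1))  # on the real 50-100 scale: 66.7

# MolsonCoors: 68 -> 93
print(round(relative_increase(68, 93, 50), 1))  # on the real 50-100 scale: 138.9
```

The takeaway: once the 50-100 range is accounted for, the AB-InBev and MolsonCoors jumps are far larger than the raw point changes suggest.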
According to RateBeer founder/executive director Joe Tucker in a forum post when the feature was introduced, Brewer Rating is meant to be calculated as (percentile/2 + 50), where “percentile” refers to the percentile of scores into which the brewery’s full range of beers fall. Which is to say, if a brewery’s full lineup of beers put it in the 90th percentile, then its final Brewer Rating will be 95/100. It’s a pretty sensible system, even if by definition it means that the lowest rating a brewery can achieve would be 50—the score it would get if its beers were in the 0th percentile. If you’re wondering, there do apparently exist at least a few breweries out there with 50/100 scores, although that may be tied to having very few ratings.
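Based on Tucker’s forum description, the calculation can be sketched like this (a minimal illustration of the stated formula, not RateBeer’s actual code):

```python
# Sketch of the Brewer Rating formula Tucker described: percentile/2 + 50.

def brewer_rating(percentile):
    """Map a brewery's percentile (0-100) onto the 50-100 rating scale."""
    return percentile / 2 + 50

print(brewer_rating(90))   # 95.0 -- the 90th-percentile example from the text
print(brewer_rating(0))    # 50.0 -- the floor: a 0th-percentile brewery
print(brewer_rating(100))  # 100.0 -- the ceiling
```

This also makes clear why no brewery can score below 50: the formula compresses the full percentile range into the upper half of the scale.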
This left us with some burning questions. What happened since Oct. to raise the Brewer Ratings of AB-InBev and a few other breweries by such large amounts? And why do some companies such as AB-InBev have significantly higher Brewer Ratings than craft breweries whose entire lineups of beers have higher scores than AB-InBev’s product?
There only seemed to be three likely answers. Either:
1. Some kind of active collusion was going on, which seems pretty unlikely given the way such dramatic changes would be noticed by users and media—although to be fair, no one but us apparently brought this to RateBeer over the course of almost 9 months. Or:
2. The system is working as intended, and no one, including our friend with the quantitative methodology expertise, properly understands the math. Or:
3. Something to do with the Brewer Rating system is really, really badly broken, and it has been for quite a while at this point.
With an array of questions, I reached out to Joe Tucker at RateBeer for comment, requesting, among other things, a full example of the math that would help us understand how AB-InBev ends up with a 90/100 rating. And surprise: Turns out these numbers are all kinds of screwed up.
Here’s the reply I received from Joe Tucker, after bringing these issues to his attention and making it clear that Paste was preparing a story about the strange Brewer Ratings on RateBeer.
Thanks for the heads up. We hadn’t received a notice about this from anyone inside or outside the community. The current scores, as presented, are clearly in error. I checked the code and it appears that it refers to data that we no longer update and there are issues with the presentation of the score itself. It appears that it broke, like a few other things, during our transition to new servers, and an SEO update to make our scores compatible with Google data formats ranges, and refers to calculations made about the same time.
There is no code or calculation that specifically benefits AB InBev. All brewers were also positively affected by the same failure.
We’ll be making updates to fix brewer scores presentation in a coming code update.
You’ll have to allow me a reasonable baseline of incredulity here, because of all possible answers, the one I was least expecting to hear in response to “Is this shit broken?” was “Yes, it’s totally broken!” It’s a little difficult for me to believe that RateBeer’s employees, coders and tech people would never have noticed a universal, site-wide alteration of all the Brewer Ratings outside of anything they intended, one that changed the AB-InBev score, among others, from a “C” to a respectable “A-”. And keep in mind, this almost certainly isn’t something that happened in the last few weeks; all the references I can find suggest that these numbers have been wrong for many moons. It’s unfortunate that the Internet Archive’s Wayback Machine crawls these pages so infrequently, because with a more robust set of reference points, we would be able to see exactly when the changes happened.
It also seems weird that the site’s own userbase would not have noticed or brought up these same issues. I tried to look through RateBeer’s forums in the hopes of finding any threads about Brewer Rating, only to be shocked to find that there somehow isn’t a search function in the RateBeer forums (which seems impossible, but is somehow true). The top Google result for “search RateBeer forums,” by the way, is a 13-year-old post from 2004, with a user asking why there isn’t a search function in the forums. How one locates previously discussed topics, I have no idea.
At the same time, though, Tucker’s answer does perhaps explain one of the other aspects of Brewer Rating that doesn’t seem to make much sense: The scores tend to be overwhelmingly high, especially for breweries with a lot of ratings. Go look up some of your favorite breweries, and as long as they’re a decent brewery with a good number of ratings, chances are excellent that they’re currently sporting a 100/100 rating. The more you look and see scores of 98 to 100, the more you can’t help but wonder what function Brewer Rating serves for a user, because it becomes an “If everyone’s super, nobody will be” situation.
Still, at the very least, we can take one thing away from the response: Brewer Rating is currently broken on RateBeer by their own admission, and the site’s executive director says they were unaware that the problem existed. He also says it will be somehow fixed in an upcoming update. Does this mean that AB-InBev will be back to a 74/100? That will be interesting to see.
Tucker’s reply about the errors in Brewer Rating hinges on the following lines: “There is no code or calculation that specifically benefits AB InBev. All brewers were also positively affected by the same failure.”
This implies that the failures caused scores to rise across the board, but from my own examination of various Brewer Ratings past and present, I’m not seeing much clear evidence of this. In some cases, craft breweries still have lower Brewer Ratings than AB-InBev or MolsonCoors, despite the fact that they produce entire lineups of beers with better individual ratings. Here are a couple of examples.
Wynwood Brewing Co., Miami, FL
Wynwood’s current Brewer Rating is 87/100, which sounds pretty good right up to the point that you realize they’re still trailing AB-InBev by three points. And yet, their most popular beers, and their beers with the highest numbers of ratings, are all better rated than the most highly rated beer of significance from AB-InBev, which is the classic English pale ale, Bass.
Bass has a score of 3.11 on RateBeer. It’s the highest-rated beer of consequence on AB-InBev’s page, and presumably has by far the most important positive effect on their Brewer Rating. Wynwood, on the other hand, has 13 beers rated more highly than Bass, which includes all but one of the brewery’s most popular beers. In fact, almost 60% of all of Wynwood’s ratings come from beers with average scores higher than 3.11. Meanwhile, a whopping 95.4% of all AB-InBev ratings are for beers with averages below 3.11, usually far, far below. In fact, on RateBeer’s own list of the Worst Beers in the World, AB-InBev produces 15 out of the 50 entries, and they’re all SUPPOSED to be factoring into its score. They even hold the distinction of making the #1 and #2 worst beers in the world according to RateBeer’s data: Natural Ice and Natural Light. But they’re still beating Wynwood regardless, even with the data error supposedly affecting both breweries.
BlueTarp Brewing Co., Decatur, GA
Decatur, GA’s BlueTarp Brewing Co. is an even clearer illustration of this point. Their current Brewer Rating is 84/100, but almost every single beer they produce is rated higher than the AB-InBev product (Bass, once again) that is rated most highly. They may not have the highest pure scores of every brewery in the Atlanta area, but what they can say is this: 96.9% of all BlueTarp ratings are for beers with higher average scores than the highest-rated beer AB-InBev is producing today. Yet their Brewer Rating is lower, and in this one metric they’re being beaten out by the makers of Natural Light.
The highest-rated beer AB-InBev currently produces is rated at 3.11. Every one of these BlueTarp beers is better rated, but their Brewer Rating is still less than AB-InBev’s.
D.G. Yuengling & Son, Pottsville, PA
Yuengling’s current Brewer Rating is a pretty mopey 70/100. There are no available archived versions of the page since the introduction of Brewer Rating in the fall, but it would certainly appear that they received no bump (or a much smaller bump) in their score when RateBeer made its site changes/SEO updates. And although Yuengling Traditional Lager is hardly a critical ratings darling, its 2.78 rating is a far cry better than the two AB-InBev products with the biggest numbers of ratings, Budweiser and Bud Light. Those two beers stand at 1.47 and 1.22, respectively. But still, AB-InBev has a 20-point lead on Yuengling in Brewer Rating.
I was curious what other data points might be gleaned from finding as many before-and-after samples on the Wayback Machine as possible, so I began to search. Starting with the Brewers Association’s list of the 50 largest craft breweries of 2016, I entered about 100 brewery pages into the Wayback Machine, looking for any other significant jumps. Much of this search came up empty simply because most brewery pages don’t have archived versions available, but it’s worth noting: In all that searching, I couldn’t find any craft brewery with an archived page whose score changed by more than 2 points in the last 9 months. In fact, most of the ones currently sitting at 100s (and I remind you, a TON of breweries on RateBeer have 100/100 scores) had previously been at 100 already, at least for months and possibly since the very beginning of Brewer Rating’s existence. All of the archived pages referred to below are the closest available to Sept. 2016.
– Deschutes is rated 100/100, and was in Feb. as well
– Dogfish Head is rated 100/100, and was in Jan. as well
– Lagunitas is rated 100/100, and was in March as well
– Great Lakes is rated 100/100, and rose from 98/100 in Oct. of 2016
– Surly is rated 100/100, and was in Oct. of 2016 as well
– Full Sail Brewery is rated 100/100, and was 99/100 in Nov. of 2016
– Flying Dog is rated 100/100, and was in Oct. of 2016 as well
– Allagash is rated 100/100, and was in Nov. of 2016 as well
– Epic Brewing Co. is 100/100, and was in Feb. as well
– Bear Republic is 100/100, and was in Oct. of 2016 as well
Even among the breweries with archived pages from Oct. and Nov. of 2016, there were no significant increases, because their Brewer Ratings were sky high from the start. So, Tucker can say that “all Brewer Ratings on RateBeer were increased,” but it seems meaningless when most of them were already maxed out on the scale to begin with. Except, that is, for the breweries that saw the most dramatic and advantageous increases … Big Beer producers such as AB-InBev or MolsonCoors. As the ratings are currently calculated, they’re the ones benefiting the most, because it makes it appear to a casual observer as if their beer is “on the same level” as well-liked craft brewers, or so close as to make no difference. You see 90/100 or 93/100 and think “Well, that’s a pretty good score.” And this is presumably how AB-InBev would prefer for it to stay, if they had a say in the matter.
With some of these questions in mind, I again reached out to Tucker for more clarification. His reply is below.
In 2016, we reimplemented the display of the brewer rating to improve our traction with Google search. One aspect to the design was to make brewer scores compatible with the typical range of Google results. As I mentioned, all brewer ratings on RateBeer were increased as a result of this change.
We are planning on doing another analysis of the brewer scoring system in the future. We want to make sure the calculation continues to accurately represent the community’s opinion.
To answer your questions … It is hard to unpack individual cases given the number of changes, but to address the formula, a brewer score is not a simple average of total scores from reviews. It’s the result of weighted performances on a variety of criteria which include performance summaries of the last year and entire record, inclusion of products in the top ranked overall and by style, extra-regional validation, etc. As we have done at RateBeer for many years, we calculate and keep a beer’s real mean but rely on a Bayesian weighted means for most beer and brewer summaries. Therefore, a beer with more reviews will have a weighted average closer to its real average than one with fewer.
The brewer rating as shown on the brewer’s individual page is a Google-friendly translation of the brewer score based on its percentile rank.
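For readers unfamiliar with the term, a Bayesian weighted mean of the kind Tucker describes typically pulls a beer’s average toward a site-wide prior until enough reviews accumulate. The sketch below illustrates that general technique; the prior mean of 3.0 and prior weight of 25 are assumptions for demonstration, not RateBeer’s actual parameters:

```python
# Illustrative sketch of a Bayesian weighted mean. The prior_mean and
# prior_weight values here are assumptions for demonstration only.

def bayesian_mean(ratings, prior_mean=3.0, prior_weight=25):
    """Blend a beer's ratings with a site-wide prior average."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

few  = [4.5] * 5     # a 4.5-average beer with only 5 ratings
many = [4.5] * 500   # the same average with 500 ratings

print(round(bayesian_mean(few), 2))   # 3.25 -- pulled well toward the prior
print(round(bayesian_mean(many), 2))  # 4.43 -- close to the real 4.5 average
```

This matches Tucker’s statement that a beer with more reviews will have a weighted average closer to its real average.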
So wait … did he just more or less say that the real reason for the existence of Brewer Rating is simply to help the site’s SEO (Search Engine Optimization)? It’s starting to sound like the users weren’t really meant to even take notice that these scores existed at all, but they’re on every brewery page. If you’re a regular user, you can’t help but come to notice them. And if you look a little closer, you can’t help but say something to the tune of “Why is the company that makes 15 out of the 50 worst beers in the world a 90/100 on this scale?”
It appears that RateBeer is now aware of these problems (or at least aware that other people are aware of these problems, which may be functionally just as good). Tucker’s statement that “We are planning on doing another analysis of the brewer scoring system in the future,” and especially “We want to make sure the calculation continues to accurately represent the community’s opinion,” seems like a pretty clear admission that the calculations, as currently configured, do not represent the community’s opinion. It seems reasonable to expect that changes will be made in the near future.
Personally, I hope that once those changes are made, the result will be a series of Brewer Ratings that better reflect the full range of breweries, not just a reduction for Big Beer brewers such as AB-InBev. Yes, it’s absurd that they’re a 90/100, but it’s no less absurd how many breweries are all bunched up at 100/100. I see no conceivable way that the beer scores of Jester King or Russian River and the scores of Samuel Adams should BOTH result in the same 100/100 rating. No offense to Boston Beer Co., but we’re talking about different tiers here. If they’re a 100/100, then you need to invent a 110/100 for Jester King. Perhaps the new version could actually use the full range of the scale? We’re just spitballing here.
But any version of the scale where MillerCoors (a brewer we haven’t mentioned at all to this point) doesn’t look like this would probably be an improvement.
In case you were wondering, MillerCoors also holds a 90/100 Brewer Rating. They make 13 out of RateBeer’s 50 worst beers in the world, across the full Miller and Coors lines.
We’ll be keeping an eye on RateBeer to see where they go from here.
Jim Vorel is a Paste staff writer and craft beer lover. You can follow him on Twitter.