On Home Scoring, the Elite Kind

The official motto of the American gymternet is Illos numeros inter gentes numquam accipiet. (She’ll never receive those scores internationally.)

Over the years, we’ve all seen any number of hilarious scores showered upon US gymnasts at domestic competitions, and this storied history of ridiculing hyper-American judging has cultivated the widespread assumption that less biased international judges would never succumb to such silliness.

At times in the past, this has been the case, but in the last few years, the international judges have seemed willing to evaluate the execution of routines with what we would normally consider an American lens. Has the “she will never receive those scores internationally” argument become a knee-jerk response to perceived overscoring without a strong correlation to fact? Will she probably receive those scores internationally as well?

Those are my questions, at least. So, as a way of reintroducing myself to the elite world, which at this point has been [scene missing] ever since the Olympics, I compared the execution scores given to the US team members in 2011 and 2012 at Classic/Championships/Trials with the scores they would later receive at Worlds/Olympics. (I went back no further than 2011 because I don’t have the D/E breakdown for 2010 TF and AA and 2009 AA.) I threw out routines with falls and major mistakes, since they would skew the execution score in a misleading direction. This is about the evaluation of essentially equivalent routines, not comparing falls to hits. Certainly, there are differences in the actual quality of all routines (a wobble or two here, no wobbles there), but those issues should even out between the two sets of competitions, providing an overall reliable sense of how the judges are evaluating American performances.
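For anyone who wants to run the same comparison at home, here is a rough sketch of the arithmetic: drop the falls, average the hit-routine E scores in each setting, and subtract. The gymnast names, events, and scores below are made-up placeholders rather than real results, and the function name is my own invention for illustration.

```python
from statistics import mean

# Hypothetical placeholder records, NOT real scores:
# (gymnast, event, setting, e_score, had_fall_or_major_mistake)
routines = [
    ("Gymnast A", "UB", "domestic", 8.90, False),
    ("Gymnast A", "UB", "domestic", 8.10, True),   # fall: thrown out
    ("Gymnast A", "UB", "international", 8.60, False),
    ("Gymnast A", "UB", "international", 8.50, False),
    ("Gymnast B", "FX", "domestic", 9.00, False),
    ("Gymnast B", "FX", "international", 9.00, False),
]

def home_scoring_gap(records):
    """Map (gymnast, event) to domestic average minus international average,
    counting hit routines only."""
    buckets = {}
    for gymnast, event, setting, e_score, had_fall in records:
        if had_fall:
            continue  # falls would skew the execution average
        per_setting = buckets.setdefault(
            (gymnast, event), {"domestic": [], "international": []}
        )
        per_setting[setting].append(e_score)

    gaps = {}
    for key, per_setting in buckets.items():
        # Only compare when there is at least one hit routine on each side.
        if per_setting["domestic"] and per_setting["international"]:
            gaps[key] = round(
                mean(per_setting["domestic"]) - mean(per_setting["international"]), 3
            )
    return gaps

gaps = home_scoring_gap(routines)
for (gymnast, event), gap in sorted(gaps.items()):
    print(f"{gymnast} {event}: {gap:+.3f}")
```

A positive number means the domestic scores ran higher. Averaging across a team instead of per gymnast would just mean grouping by event alone.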

At Worlds in 2011, things were fairly regular and predictable over the four events, with the average US execution scores falling somewhere between a tenth and a tenth and a half lower than the scores received domestically for hit routines. It’s a significant but not overwhelming or decisive difference. At the Olympics, things got a little funkier. The execution scores on bars were significantly lower than what was received at Championships and Trials (three or four tenths), and the floor scores were somewhat lower as well (more in line with what we saw in 2011). However, the beam execution scores were quite constant throughout all competitions, and the vault scores were much higher at the Olympics than in the US. The outlying increase in numbers on vault in 2012 can, at least in part, be attributed to legitimate improvement in execution leading up to the Olympics from the likes of Douglas and Raisman, whose Y2.5s were far stronger by that point.

I would contend, though, that on a number of occasions the vault scoring at the Olympics went a little Florida @ Utah. 9.400? 

That’s my shallow overall impression of the numbers. There is enough of a difference between domestic and international scoring to remain significant, but it is not dramatic and can easily be overstated. In almost all cases recently (and we’ll get to that almost in a moment), the overall execution scores received in the US have stayed within a believable range of the international scores. If we go back to 2009, the Worlds infamous for harsher execution scoring, the difference would have been greater, but in the years since, the scores have adjusted to a more forgiving place.

While those are the larger trends, they are far from consistent from gymnast to gymnast, and that’s where things get interesting. The average difference in 2011 may have rested in that 1-2 tenth range, but some were consistently on the low end (or below the range) while others were always over the range. This in and of itself is not necessarily surprising, but I would have assumed that the difference would be greater for gymnasts with more questionable form issues, ones that might be ignored domestically and caught internationally. Translation: I thought it would be Aly Raisman.

Raisman received more “she’ll never get those scores internationally” comments than any other gymnast of the quad, and yet of everyone, she was the gymnast least susceptible to knock-down execution scores at Worlds/Olympics. She did get those scores internationally, and she got them every single time. The justification of those scores is another question, but they happened. Raisman’s execution scores were almost always within a tenth either way. Her worst differential was on vault in 2011, where her US execution average was +.125 over her Worlds average (if we throw out the Amanar attempt from Classic, which is not comparable to the execution scores for her DTYs at Worlds – if we put it back in, her Worlds average is better). On floor in 2011 and vault in 2012, her international averages were far stronger than her domestic ones. Even on her much-maligned bars routine, there was no appreciable difference between US and international evaluation. The international judges felt pretty good about her bars work.


While Aly Raisman was a steady little tugboat, Jordyn Wieber was quite the opposite. On both the 2011 and 2012 teams, she had by far the most disparate execution scores. If Raisman’s difference rarely reached above a tenth, Wieber’s difference rarely reached below two tenths and was usually greater than that. It was upwards of five tenths on both bars and floor in 2012 (where Raisman was closer to one and Douglas was closer to two). Even on vault in 2012, the scoring jackpot, her increase was the smallest on the team (along with Maroney’s, which I would attribute more to hitting a ceiling and having nowhere higher to go). Over two years of senior (major) international competition on bars, Wieber received consistent E scores between 8.700 and 9.000 at home and never broke 8.500 internationally. On floor, she was almost always 9.000 or greater at home yet only once reached that plateau on the big stage (for that rather strong 2012 TF floor routine, which still scored significantly lower in execution than all her domestic routines in the lead-up to the Olympics, none of which went below 9.150).


The only event where Wieber’s execution evaluation was fairly constant was beam, but I would certainly make the argument that home scoring did her the greatest disservice on beam in 2012 by pretending that her 6.3 composition existed in the real world. Home scoring with regard to D score is certainly an issue, but it comes into play less frequently. D score issues have not usually had the kind of significant impact they had with Wieber.

So, what do we make of this? As with all quantitative assessments, it should exist alongside qualitative assessments. An argument can certainly be made that Wieber peaked early and simply showed better gymnastics domestically, accounting for the major difference in scores. That is absolutely part of it, but does that account for the whole difference? In 2011, the bars difference was three tenths even when throwing out the routines with mistakes like the AA performance, and the evidence of 2012 FX seems to show that even strong international performances were never going to match the domestic scores. That just didn’t happen to the same degree with the rest of the team, particularly Raisman. It wasn’t a whole-country thing. It wasn’t even a just-the-famous-ones thing. 

These last few years, it appears that home scoring is alive and, while maybe a little bit more feeble than it used to be, still kicking. However, “she’ll never receive those scores internationally” needs to be tempered as a credo because it lately amounts to only a tenth or two of difference for most gymnasts rather than a dramatic break that blows up potential scoring and because it is far from consistent, even among the various chosen ones. It has been quite person-specific.

Two years is an admittedly small sample, but let’s keep an eye on it in 2013.

