In cricket, the main summary statistic is the mean. We see this in batting and bowling averages, economy rates, and strike rates. By looking at the median batting score of players, we can discover that while Zak Crawley’s Test average is fairly standard for a recent England men’s batter, his distribution of innings scores is unusual.
England’s opening woes
Zak Crawley is an English Test batter. He bats in the top three, and has most recently been an opener. He averages 28.6 so far from 38 innings, which is similar to other recent openers for England. (Cook is included for scale.)
Here and later on in this article, I am using career figures rather than figures in a specific batting position.
By batting average, all recent English openers are much of a muchness. We can also see that Sibley and Crawley have very similar records, although Crawley’s high score is much higher.
Let’s add in the median - which represents the score a player exceeds 50% of the time - and compare that to the mean (average) in absolute terms and as a ratio (average divided by median):
I am not sure if the ratio is a statistically useful or sound method, but I added it because the magnitudes of the median and the average are correlated: players who score more in general will have a higher median and a larger difference between the two. As Matt Allinson discovered in his post about the standard deviation for the top-scoring players, there is a similar issue with the standard deviation.
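As a minimal sketch of how these columns can be derived, the snippet below computes the average, median, difference and ratio from a list of innings. The `summarise` function and the `(runs, was_out)` tuple layout are my own illustrative choices, the scores are made up rather than taken from any real player, and I am assuming the median is taken over all innings, not outs included.

```python
from statistics import median


def summarise(innings):
    """Summarise a list of (runs, was_out) tuples, one per innings."""
    runs = [r for r, _ in innings]
    dismissals = sum(1 for _, was_out in innings if was_out)
    if dismissals == 0:
        raise ValueError("batting average is undefined with no dismissals")
    average = sum(runs) / dismissals  # conventional batting average
    med = median(runs)                # assumption: not outs count as scores
    return {
        "average": average,
        "median": med,
        "difference": average - med,
        "ratio": average / med if med else float("inf"),
    }


# Hypothetical scores, not a real player's record.
example = [(0, True), (4, True), (9, True), (12, True),
           (25, True), (53, True), (171, True), (30, False)]
stats = summarise(example)
print({k: round(v, 2) for k, v in stats.items()})
```

One big score drags the average well above the median here, which is the effect the ratio is meant to surface.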
Back to the table. A high ratio is not necessarily bad (Cook’s is higher than Denly’s, and I know who I’d rather have), but between two players with similar averages, it indicates that one may be less consistent than the other. Zak Crawley (median 10.5) and Haseeb Hameed (median 9) make very low scores in half of their innings, although Hameed hasn’t played that many.
Zak Crawley is somewhat unusual
Is Zak Crawley’s very low median, somewhat higher average, and corresponding high ratio unusual? He’s made just over a thousand Test runs, so let’s look at all batters to have done that and averaged over 20, then see which players have the highest ratios:
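| Player | Team | Debut | Runs | Average | Median | Difference | Ratio |
|---|---|---|---|---|---|---|---|
| S Madan Lal | India | 1974-06-06 | 1042 | 22.65 | 8.00 | 14.65 | 2.83 |
| MG Vandort | Sri Lanka | 2001-09-06 | 1144 | 36.90 | 14.00 | 22.90 | 2.64 |
| JG Bracewell | New Zealand | 1980-11-28 | 1001 | 20.43 | 8.00 | 12.43 | 2.55 |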
A lot of these players are openers, and quite a few are from the recent era. This list cuts off at a ratio of 2.5, so these are the 14 men in Test history meeting these criteria. It seems safe to say that Zak Crawley isn’t exactly normal, but that’s not necessarily bad.
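The selection described above (at least a thousand runs, an average over 20, then ranked by ratio with a cut-off at 2.5) can be sketched roughly as follows. The `highest_ratios` function and the dictionary layout are hypothetical, and I am assuming "just over a thousand" means a threshold of 1,000 runs. The first three records reuse figures from the table; the last two are invented to show players being filtered out.

```python
def highest_ratios(players, min_runs=1000, min_average=20.0, min_ratio=2.5):
    """Return (name, ratio) pairs for qualifying players, highest ratio first."""
    ranked = []
    for p in players:
        if p["runs"] < min_runs or p["average"] <= min_average:
            continue  # fails the qualification criteria
        if p["median"] == 0:
            continue  # ratio undefined; skipped in this sketch
        ratio = p["average"] / p["median"]
        if ratio >= min_ratio:
            ranked.append((p["name"], round(ratio, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


players = [
    {"name": "JG Bracewell", "runs": 1001, "average": 20.43, "median": 8.0},
    {"name": "S Madan Lal", "runs": 1042, "average": 22.65, "median": 8.0},
    {"name": "MG Vandort", "runs": 1144, "average": 36.90, "median": 14.0},
    # Invented records, for illustration only.
    {"name": "Hypothetical A", "runs": 1500, "average": 35.0, "median": 25.0},
    {"name": "Hypothetical B", "runs": 600, "average": 30.0, "median": 10.0},
]
result = highest_ratios(players)
print(result)
```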
To compare him to Sibley again, we can make a histogram showing the frequency of their innings scores:
This shows that Crawley gets out under 20 slightly more often than Sibley, while Sibley gets out comparatively more often between 20 and 40. That is, even when looking against a comparable player in terms of overall figures, Crawley does get out early in his innings quite often.
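For readers who want to reproduce this kind of comparison, here is a rough sketch of the binning behind such a histogram. The bin width of 20 is my assumption based on the ranges discussed above, `score_histogram` is an illustrative name, and the scores are invented rather than either player's real innings.

```python
from collections import Counter


def score_histogram(scores, bin_width=20):
    """Count innings scores in fixed-width bins, keyed by a 'low-high' label."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return {f"{low}-{low + bin_width - 1}": counts[low] for low in sorted(counts)}


# Hypothetical scores, not real innings.
scores = [0, 5, 19, 20, 39, 40, 267]
hist = score_histogram(scores)
for label, count in hist.items():
    print(f"{label:>8} {'#' * count}")
```

When comparing two players with different numbers of innings, dividing each count by the total number of innings makes the bars comparable.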
Another popular introductory statistics visualisation is the box plot. The box plot here shows, for each of the England openers mentioned above:
- The median as a horizontal line.
- A box for the interquartile range. The middle 50% of the player’s innings are inside this box.
- A vertical line for the ‘typical’ range of their scores.
- Dots for innings scores outside that range.
This shows more clearly that Crawley’s distribution is most similar to Hameed’s, but his high scores are much higher. We can also see that Joe Denly was a very consistent starter, but could not push that into larger scores, and that he was never out for a duck.
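The numbers behind a box plot can be sketched as below. I am assuming the common Tukey convention, where the 'typical' range extends to the furthest score within 1.5 times the interquartile range of the box; the plotting tool used for the chart may use a different rule. `box_plot_stats` is an illustrative name and the scores are invented.

```python
from statistics import quantiles


def box_plot_stats(scores, whisker=1.5):
    """Median, quartiles, whisker ends and outliers for a list of scores."""
    ordered = sorted(scores)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [s for s in ordered if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside[0],    # lowest score within the fences
        "whisker_high": inside[-1],  # highest score within the fences
        "outliers": [s for s in ordered if s < low_fence or s > high_fence],
    }


# Hypothetical scores, not real innings.
box = box_plot_stats([0, 3, 6, 10, 14, 22, 38, 51, 267])
print(box)
```

A single very large innings shows up as a dot far above the box, while leaving the median and the box itself untouched.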
This could be an early-career artifact
One other notable thing about the above list is that, Amiss and Ahmed aside, every player is between 1,000 and 2,000 runs. There are two main possible explanations here, which may feed into each other.
- Players with this kind of variability don’t last long.
- This is a statistical artifact due to Crawley being (presumably) early in his career.
(Of course, the second explanation could have been true for other players, who were then dropped because of the first.)
Let’s take a look at the first 38 innings for all players and sort by the median, rather than the ratio, looking for a median of 15 or under:
| Player | Team | Debut | Runs | Average | Median | Difference | Ratio |
|---|---|---|---|---|---|---|---|
| RS Madugalle | Sri Lanka | 1982-02-17 | 1006 | 30.48 | 12.00 | 18.48 | 2.54 |
| CH Gayle | West Indies | 2000-03-16 | 1025 | 29.29 | 13.00 | 16.29 | 2.25 |
| S Wettimuny | Sri Lanka | 1982-02-17 | 1098 | 30.50 | 13.00 | 17.50 | 2.35 |
| MG Vandort | Sri Lanka | 2001-09-06 | 1144 | 36.90 | 14.00 | 22.90 | 2.64 |
| MS Atapattu | Sri Lanka | 1990-11-23 | 1011 | 28.89 | 14.00 | 14.89 | 2.06 |
| KOA Powell | West Indies | 2011-07-06 | 1044 | 28.22 | 14.00 | 14.22 | 2.02 |
| HM Nicholls | New Zealand | 2016-02-12 | 1188 | 38.32 | 15.00 | 23.32 | 2.55 |
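Restricting each player to their first 38 innings amounts to cutting the list of scores short before summarising it. A sketch of that, plus a running median to show how the figure can drift as a career develops, is below. Both function names are my own, and the scores describe an invented slow starter, not any player in the table.

```python
from statistics import median


def first_n(scores, n=38):
    """Keep only the first n innings, in chronological order."""
    return scores[:n]


def running_median(scores):
    """Median after each innings, so early-career drift is visible."""
    return [median(scores[:i]) for i in range(1, len(scores) + 1)]


# Hypothetical chronological scores for a batter who starts badly.
scores = [0, 1, 0, 0, 26, 40, 108, 14, 75, 33]
early = median(first_n(scores, n=4))
progress = running_median(scores)
print(early, progress)
```

In this made-up case the median is 0 after four innings and 20 after ten, which is the kind of movement the early-career explanation relies on.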
There are some truly excellent players there, who went on to be greats for their country. Marvan Atapattu famously scored one run in his first six innings, and ended up with six double hundreds. Here’s a histogram of his innings scores, compared to Herschelle Gibbs, who averaged 41.95 from a similar number of innings, but with a median of 25:
Back to Michael Carberry
Arguably more interesting than Zak Crawley is the case of Michael Carberry. In the second table above we saw that his median score was higher than his average. Unlike Crawley’s low-but-not-unprecedented median score for a player of that average, it is incredibly rare for a player to average less than their median score.
Intuitively this makes sense: players are more likely to get out on lower scores. For a player to have a higher median than average, they would have to have very few not out innings, and a habit of making starts but not converting into large scores. The second of these is unsustainable in the long term; it’s very rare for a player not to be out early a decent proportion of the time. Jack Hobbs ended on single figures 15 times in 105 innings, which is as low as that ratio gets for genuine Test batters.
In total, only 11 men have a median score higher than their average while making over 200 Test runs, and at least one of those (Shreyas Iyer) is still a current player. They are:
| Player | Team | Debut | Runs | Average | Median | Difference | Ratio |
|---|---|---|---|---|---|---|---|
| HG Vivian | New Zealand | 1931-07-29 | 421 | 42.10 | 50.50 | -8.40 | 0.83 |
| HF Wade | South Africa | 1935-06-15 | 327 | 20.44 | 20.50 | -0.06 | 1.00 |
| TAM Siriwardana | Sri Lanka | 2015-10-14 | 298 | 33.11 | 35.00 | -1.89 | 0.95 |
| MJ Susskind | South Africa | 1924-06-14 | 268 | 33.50 | 37.00 | -3.50 | 0.91 |
| JAH Marshall | New Zealand | 2005-03-26 | 218 | 19.82 | 24.00 | -4.18 | 0.83 |
| RH Vance | New Zealand | 1988-03-03 | 207 | 29.57 | 31.00 | -1.43 | 0.95 |
Median isn’t magic
Why doesn’t cricket use the median score or other summary statistics? I don’t know for sure, but I suspect it’s a combination of:
- The mean handles not outs very elegantly, as we take runs scored divided by dismissals. Runs per innings is still a more niche measure. There is no obvious way to handle not outs with the median score, unless we consider median score between dismissals or similar.
- The pay-off isn’t worth the effort. The median score, or more generally the N-th quantile, or the standard deviation of a player’s scores, don’t tell us that much more than the current summary statistics, and would require a lot of explaining when they were added.
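To make the not-out point concrete, here is one possible reading of a 'median score between dismissals': runs are carried forward across not-out innings until the batter is next dismissed. This is only a sketch of that idea, not an established statistic; the function name is hypothetical, the innings are invented, and dropping any runs scored after the final dismissal is my own assumption.

```python
from statistics import mean, median


def runs_between_dismissals(innings):
    """Total runs accumulated between consecutive dismissals.

    innings is a list of (runs, was_out) tuples in chronological order.
    Runs scored after the last dismissal are left out (an assumption).
    """
    totals, running = [], 0
    for runs, was_out in innings:
        running += runs
        if was_out:
            totals.append(running)
            running = 0
    return totals


# Hypothetical innings: (runs, was_out).
innings = [(10, True), (40, False), (5, True), (0, True), (60, False), (20, True)]
spans = runs_between_dismissals(innings)

average = sum(r for r, _ in innings) / sum(out for _, out in innings)
runs_per_innings = mean(r for r, _ in innings)
print(spans, average, runs_per_innings, median(spans))
```

When the last innings ends in a dismissal, the mean of these spans equals the conventional average, so their median is at least measured on the same footing as the average.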
I do think there is something to be said for some of the basic visualisations here, showing the distribution of a player’s scores: we know a lot of this intuitively or via anecdote, but having a clear visual representation is useful.
The future of cricket statistics probably lies in much more advanced methods, based off ball-by-ball data rather than innings data as used here. Kartikeya Date, for instance, has done some wonderful work with ball-tracking data and control metrics.