Zak Crawley’s median innings

In cricket, the main summary statistic is the mean. We see this in batting and bowling averages, economy rates, and strike rates. By looking at the median batting score of players, we can discover that while Zak Crawley’s Test average is fairly standard for a recent England men’s batter, his distribution of innings scores is unusual.

England’s opening woes

Zak Crawley is an English Test batter. He bats in the top three, and has most recently been an opener. He averages 28.6 so far from 38 innings, which is similar to other recent openers for England. (Cook is included for scale.)

Player       Debut       Innings   Runs  Average  50s  100s  High score
AN Cook      2006-03-01      291  12472    45.35   57    33         294
MA Carberry  2010-03-12       12    345    28.75    1     0          60
H Hameed     2016-11-09       19    439    24.39    4     0          82
KK Jennings  2016-12-08       32    781    25.19    1     2         146
MD Stoneman  2017-08-17       20    526    27.68    5     0          60
RJ Burns     2018-11-06       59   1789    30.32   11     3         133
JL Denly     2019-01-31       28    827    29.54    6     0          94
DP Sibley    2019-11-21       39   1042    28.94    5     2         133
Z Crawley    2019-11-29       38   1087    28.61    5     2         267

Here and later on in this article, I am using career figures rather than figures in a specific batting position.

By batting average, all recent English openers are much of a muchness. We can also see that Sibley and Crawley have very similar records, although Crawley’s high score is much higher.

Let’s add in the median, the score a player reaches or passes in half of their innings, and compare it to the mean (average) both in absolute terms (average minus median) and as a ratio (average divided by median):

Player        Runs  Average  Median  Difference  Ratio
AN Cook      12472    45.35   24.00       21.35   1.89
MA Carberry    345    28.75   32.50       -3.75   0.88
H Hameed       439    24.39    9.00       15.39   2.71
KK Jennings    781    25.19   15.50        9.69   1.63
MD Stoneman    526    27.68   21.50        6.18   1.29
RJ Burns      1789    30.32   18.00       12.32   1.68
JL Denly       827    29.54   25.50        4.04   1.16
DP Sibley     1042    28.94   16.00       12.94   1.81
Z Crawley     1087    28.61   10.50       18.11   2.72
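As a rough illustration of how a row like these is derived, here is a minimal Python sketch. The `summarise` function and the innings in `example` are made up for illustration and are not any real player's scores. The only conventions taken from this article are that the average is runs divided by dismissals and the ratio is average divided by median; taking the median over every innings, not outs included, is my assumption here.

```python
from statistics import median


def summarise(innings):
    """Summarise a list of (runs, not_out) tuples, one per innings.

    Returns (average, median, difference, ratio). The average is runs per
    dismissal; the median is taken over every innings score, not outs
    included (an assumption made for this sketch).
    """
    scores = [runs for runs, _ in innings]
    dismissals = sum(1 for _, not_out in innings if not not_out)
    if dismissals == 0:
        raise ValueError("the average is undefined without a dismissal")
    average = sum(scores) / dismissals
    med = median(scores)
    # Guard against a median of zero, where the ratio has no finite value.
    ratio = average / med if med else float("inf")
    return average, med, average - med, ratio


# Made-up innings for illustration only: five dismissals and one not out.
example = [(0, False), (4, False), (9, False), (12, False), (25, True), (130, False)]
average, med, difference, ratio = summarise(example)
print(f"{average:.2f} {med:.2f} {difference:.2f} {ratio:.2f}")
```

One big score among several low ones is enough to pull the average well clear of the median, which is the pattern the ratio is meant to pick up.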

I am not sure if the ratio is a statistically useful or sound method, but I added it because the magnitude of the median and the average are correlated: players who score more in general will have a higher median and a higher difference. As Matt Allinson discovered in his post about the standard deviation for the top-scoring players, there is a similar issue with the standard deviation.

Back to the table. A high ratio is not necessarily bad (Cook’s is higher than Denly’s, and I know who I’d rather have), but between two players with similar averages, it indicates that one may be less consistent than the other. With medians of 10.5 and 9 respectively, Zak Crawley and Haseeb Hameed make very low scores in half of their innings, although Hameed hasn’t played that many.

Zak Crawley is somewhat unusual

Is Zak Crawley’s combination of a very low median, a somewhat higher average, and a correspondingly high ratio unusual? He’s made just over a thousand Test runs, so let’s look at all batters to have done that and averaged over 20, then see which players have the highest ratios:

Player        Team         Debut       Total  Average  Median  Difference  Ratio
MJ North      Australia    2009-02-26   1171    35.48   10.00       25.48   3.55
RT Robinson   England      1984-11-28   1601    36.39   12.00       24.39   3.03
DL Amiss      England      1966-08-18   3612    46.31   16.00       30.31   2.89
VG Kambli     India        1993-01-29   1084    54.20   19.00       35.20   2.85
S Madan Lal   India        1974-06-06   1042    22.65    8.00       14.65   2.83
Z Crawley     England      2019-11-29   1087    28.61   10.50       18.11   2.72
MG Vandort    Sri Lanka    2001-09-06   1144    36.90   14.00       22.90   2.64
JG Bracewell  New Zealand  1980-11-28   1001    20.43    8.00       12.43   2.55
JA Burns      Australia    2014-12-26   1442    36.97   14.50       22.47   2.55
R Edwards     Australia    1972-06-22   1171    40.38   16.00       24.38   2.52
DD Ebrahim    Zimbabwe     2001-04-19   1225    22.69    9.00       13.69   2.52
Ijaz Ahmed    Pakistan     1987-02-03   3315    37.67   15.00       22.67   2.51
JE Emburey    England      1978-08-24   1713    22.54    9.00       13.54   2.50
KD Karthik    India        2004-11-03   1025    25.00   10.00       15.00   2.50

A lot of these players are openers, and quite a few are from the recent era. This list cuts off at a ratio of 2.5, so these are the 14 men in Test history meeting these criteria. It seems safe to say that Zak Crawley isn’t exactly normal, but that’s not necessarily bad.
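The filtering and ranking behind a list like this can be sketched in a few lines of Python. The rows below are copied from the England openers table earlier in this article; the function name `high_ratio_batters`, its parameter names, and the choice of inclusive or exclusive thresholds are assumptions of mine rather than the exact query used.

```python
# (player, runs, average, median) rows copied from the England openers table.
OPENERS = [
    ("AN Cook", 12472, 45.35, 24.0),
    ("MA Carberry", 345, 28.75, 32.5),
    ("H Hameed", 439, 24.39, 9.0),
    ("KK Jennings", 781, 25.19, 15.5),
    ("MD Stoneman", 526, 27.68, 21.5),
    ("RJ Burns", 1789, 30.32, 18.0),
    ("JL Denly", 827, 29.54, 25.5),
    ("DP Sibley", 1042, 28.94, 16.0),
    ("Z Crawley", 1087, 28.61, 10.5),
]


def high_ratio_batters(rows, min_runs=1000, min_average=20.0, min_ratio=2.5):
    """Return (player, ratio) pairs that pass the filters, highest ratio first."""
    ranked = []
    for player, runs, average, med in rows:
        # Skip players below the run or average thresholds, and any median
        # of zero, for which the ratio is undefined.
        if runs < min_runs or average <= min_average or med <= 0:
            continue
        ratio = average / med
        if ratio >= min_ratio:
            ranked.append((player, round(ratio, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


result = high_ratio_batters(OPENERS)
print(result)
```

Among the recent England openers only Crawley passes all three filters. Dropping the run threshold with `min_runs=0` also lets Hameed through, just behind him.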

To compare him to Sibley again, we can make a histogram showing the frequency of their innings scores:

This shows that Crawley gets out under 20 slightly more often than Sibley, while Sibley gets out comparatively more often between 20 and 40. That is, even when looking against a comparable player in terms of overall figures, Crawley does get out early in his innings quite often.
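For anyone wanting to build a similar histogram from a list of innings scores, the counting step might look like the sketch below. The 20-run band width is an assumption (it matches the "under 20" and "between 20 and 40" bands discussed above), the function name `bucket_counts` is mine, and the scores are invented for illustration.

```python
from collections import Counter


def bucket_counts(scores, width=20):
    """Count innings per score band, keyed by the band's lower edge."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


# Invented scores for illustration only, not a real player's innings.
hypothetical = [0, 3, 7, 11, 19, 22, 35, 41, 53, 267]
bands = bucket_counts(hypothetical)

# Print a crude text histogram, one row per non-empty band.
for lower, count in bands.items():
    print(f"{lower:>3}-{lower + 19:<3} {'#' * count}")
```

Empty bands are simply absent from the result, so a plotting library would still need to fill in the gaps between the main cluster and any very large score.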

Another popular introductory statistics visualisation is the box plot. The box plot here shows, for each of the England openers mentioned above:

  1. The median as a horizontal line.
  2. A box for the interquartile range. The middle 50% of the player’s innings are inside this box.
  3. A vertical line for the ‘typical’ range of their scores.
  4. Dots for innings scores outside that range.

This shows more clearly that Crawley’s distribution is most similar to Hameed’s, but his high scores are much higher. We can also see that Joe Denly was a very consistent starter, but could not push that into larger scores, and that he was never out for a duck.
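The numbers behind a box plot can be computed directly. The sketch below assumes the common convention that the 'typical' range extends to the most extreme scores within 1.5 times the interquartile range of the box; I have not confirmed that this is the rule the plot above uses, and plotting libraries differ slightly in how they compute quartiles. The function name `box_plot_stats` and the scores are made up for illustration.

```python
from statistics import quantiles


def box_plot_stats(scores, whisker=1.5):
    """Return the median, quartiles, whisker ends, and outliers for scores."""
    q1, med, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers stop at the most extreme scores that are still inside the fences.
    inside = [s for s in scores if low_fence <= s <= high_fence]
    outliers = sorted(s for s in scores if s < low_fence or s > high_fence)
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Invented scores for illustration only.
stats = box_plot_stats([0, 2, 5, 8, 10, 14, 21, 30, 267])
print(stats)
```

Because scores cannot go below zero, the lower whisker usually sits at or near 0, and almost all of the dots appear above the box.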

This could be an early-career artifact

One other notable thing about the list of high-ratio players above is that, Amiss and Ijaz Ahmed aside, every player has between 1,000 and 2,000 runs. There are two main possible explanations, which may feed into each other.

  1. Players with this kind of variability don’t last long.
  2. This is a statistical artifact due to Crawley being (presumably) early in his career.

(Of course, 2 could have been true for other players who then suffered from being dropped due to 1.)

Let’s take a look at the first 38 innings for all players and sort by the median, rather than the ratio, looking for a median of 15 or lower:

Player        Team         Debut       Total  Average  Median  Difference  Ratio
MJ North      Australia    2009-02-26   1149    35.91   10.00       25.91   3.59
Z Crawley     England      2019-11-29   1087    28.61   10.50       18.11   2.72
BRM Taylor    Zimbabwe     2004-05-06   1260    35.00   11.50       23.50   3.04
MJK Smith     England      1958-06-05   1090    32.06   12.00       20.06   2.67
RS Madugalle  Sri Lanka    1982-02-17   1006    30.48   12.00       18.48   2.54
RT Robinson   England      1984-11-28   1358    38.80   12.00       26.80   3.23
W Jaffer      India        2000-02-24   1334    36.05   12.50       23.55   2.88
Yuvraj Singh  India        2003-10-16   1018    33.93   12.50       21.43   2.71
CH Gayle      West Indies  2000-03-16   1025    29.29   13.00       16.29   2.25
S Wettimuny   Sri Lanka    1982-02-17   1098    30.50   13.00       17.50   2.35
MA Atherton   England      1989-08-10   1307    35.32   13.50       21.82   2.62
Babar Azam    Pakistan     2016-10-13   1093    34.16   14.00       20.16   2.44
JW Burke      Australia    1951-02-02   1132    35.38   14.00       21.38   2.53
MAK Pataudi   India        1961-12-13   1329    40.27   14.00       26.27   2.88
MG Vandort    Sri Lanka    2001-09-06   1144    36.90   14.00       22.90   2.64
MH Mankad     India        1946-06-22   1067    29.64   14.00       15.64   2.12
MS Atapattu   Sri Lanka    1990-11-23   1011    28.89   14.00       14.89   2.06
KOA Powell    West Indies  2011-07-06   1044    28.22   14.00       14.22   2.02
NS Sidhu      India        1983-11-12   1050    30.88   14.50       16.38   2.13
T Taibu       Zimbabwe     2001-07-19   1026    29.31   15.00       14.31   1.95
TE Bailey     England      1949-06-11   1022    40.88   15.00       25.88   2.73
HM Nicholls   New Zealand  2016-02-12   1188    38.32   15.00       23.32   2.55
Imtiaz Ahmed  Pakistan     1952-10-16   1103    31.51   15.00       16.51   2.10
Zaheer Abbas  Pakistan     1969-10-24   1295    35.97   15.00       20.97   2.40
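Restricting every player to the same number of innings is just a matter of truncating a chronological list of scores before taking the median. A minimal sketch, with a hypothetical function name and made-up scores:

```python
from statistics import median


def first_n_median(scores, n=38):
    """Median of the first n innings; scores must be in chronological order."""
    first = scores[:n]
    if not first:
        raise ValueError("no innings to summarise")
    return median(first)


# Made-up career for illustration: a poor start followed by bigger scores.
career = [0, 0, 1, 40, 60, 80, 100]
print(first_n_median(career, n=3), first_n_median(career, n=7))
```

The same career gives a very different median depending on where it is cut off, which is why a low median this early on may not last.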

There are some truly excellent players in that table, who went on to be greats for their country. Marvan Atapattu famously scored one run in his first six innings, and ended up with six double hundreds. Here’s a histogram of his innings scores, compared to Herschelle Gibbs, who averaged 41.95 from a similar number of innings, but with a median of 25:

Back to Michael Carberry

Arguably more interesting than Zak Crawley is the case of Michael Carberry. In the second table above we saw that his median score was higher than his average. Unlike Crawley’s low-but-not-unprecedented median score for a player of that average, it is incredibly rare for a player to average less than their median score.

Intuitively this makes sense: players are more likely to get out on lower scores. For a player to have a higher median than average, they would have to have very few not out innings, and a habit of making starts but not converting them into large scores. The second of these is unsustainable in the long term; it’s very rare for a player not to be out early a decent proportion of the time. Jack Hobbs ended on single figures 15 times in 105 innings, which is as low as that proportion gets for genuine Test batters.

In total, only 11 men have a median score higher than their average while making over 200 Test runs, and at least one of those (Shreyas Iyer) is still a current player. They are:

Player           Team          Debut       Total  Average  Median  Difference  Ratio
DS Steele        England       1975-07-31    673    42.06   43.00       -0.94   0.98
HG Vivian        New Zealand   1931-07-29    421    42.10   50.50       -8.40   0.83
SS Iyer          India         2021-11-25    388    55.43   65.00       -9.57   0.85
MA Carberry      England       2010-03-12    345    28.75   32.50       -3.75   0.88
HF Wade          South Africa  1935-06-15    327    20.44   20.50       -0.06   1.00
TAM Siriwardana  Sri Lanka     2015-10-14    298    33.11   35.00       -1.89   0.95
AJ Finch         Australia     2018-10-07    278    27.80   28.00       -0.20   0.99
MJ Susskind      South Africa  1924-06-14    268    33.50   37.00       -3.50   0.91
Dilawar Hussain  India         1934-01-05    254    42.33   45.00       -2.67   0.94
JAH Marshall     New Zealand   2005-03-26    218    19.82   24.00       -4.18   0.83
RH Vance         New Zealand   1988-03-03    207    29.57   31.00       -1.43   0.95

Median isn’t magic

Why doesn’t cricket use the median score or other summary statistics? I don’t know for sure, but I suspect it’s a combination of:

  1. The mean handles not outs very elegantly, as we take runs scored divided by dismissals. Runs per innings is still a more niche measure. There is no obvious way to handle not outs with the median score, unless we consider median score between dismissals or similar.
  2. The pay-off isn’t worth the effort. The median score, or more generally the N-th quantile, or the standard deviation of a player’s scores, don’t tell us that much more than the current summary statistics, and would require a lot of explaining when they were added.
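To make the "median score between dismissals" idea from the first point concrete, here is one possible reading of it, sketched in Python: carry each not-out score forward into the next innings until the batter is dismissed, then take the median of those totals. This is my interpretation rather than an established statistic, the name `runs_between_dismissals` is hypothetical, and the innings are invented.

```python
from statistics import median


def runs_between_dismissals(innings):
    """Total the runs scored between one dismissal and the next.

    innings is a chronological list of (runs, not_out) tuples. Runs from
    not-out innings at the very end of the list are dropped, a
    simplification made for this sketch.
    """
    totals, carried = [], 0
    for runs, not_out in innings:
        carried += runs
        if not not_out:
            totals.append(carried)
            carried = 0
    return totals


# Invented innings for illustration: three not outs among seven innings.
example = [(10, False), (30, True), (5, False), (50, True), (20, True), (0, False), (8, False)]
totals = runs_between_dismissals(example)
print(totals, median(totals), sum(totals) / len(totals))
```

When the final innings ends in a dismissal, the mean of these totals is exactly the conventional batting average, so the median of the same totals is at least comparing like with like. Whether that is worth the extra explanation is another matter.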

I do think there is something to be said for some of the basic visualisations here, showing the distribution of a player’s scores: we know a lot of this intuitively or via anecdote, but having a clear visual representation is useful.

The future of cricket statistics probably lies in much more advanced methods, based on ball-by-ball data rather than the innings data used here. Kartikeya Date, for instance, has done some wonderful work with ball-tracking data and control metrics.