This is an interface to an SQLite database. The database contains a scrape of Cricinfo's Statsguru batting, bowling and team tabs. It allows running (read-only) SQL queries directly, and has some saved queries:
The scraped data comes from Owen Brasier's cricketstats project. The source for this project is also public.
Other sections on this page are functions, schema, annoyances, result formatting, and latest data.
In addition to the usual set of SQLite functions (see the links in
SQLite's
SQL documentation for
more information), there is one custom function
available: median
.
SQLite does not come with a median function, so median
is
provided as
an application-defined
function. It is an aggregate function on numeric values (similar
to sum
). The value will be displayed as described
in result formatting, but internally
is always a float.
This is a pretty direct interface to an SQLite database. The database contains
a scrape of Cricinfo's Statsguru batting, bowling and team tabs in the
following tables, where $gender
can be men
or women
and $format
can
be test
, odi
, or t20i
.
$gender_$format_batting_innings
$gender_$format_bowling_innings
$gender_$format_team_innings
As the table names are very long, and for many purposes we will want to separate the results by gender and format anyway (comparing a T20I average to a Test average is not often useful), these tables have shorter names available when choosing genders and formats with the checkboxes:
innings
- $gender_$format_batting_innings
bowling_innings
- $gender_$format_bowling_innings
team_innings
- $gender_$format_team_innings
This will allow a separate results table for each gender and format chosen. If
we use the full table names (for
instance, women_test_bowling_innings
) directly, then the results
shown for all genders and formats will be the same.
Column name | Type | Comment |
---|---|---|
i | integer | Internal ID, not particularly useful |
player | text | |
team | text | |
runs | integer | |
runs_txt | text | Score, suffixed with * if not out |
not_out | boolean | See boolean columns |
mins | numeric | |
bf | numeric | |
fours | numeric | |
sixes | numeric | |
sr | numeric | |
pos | integer | |
innings | integer | |
opposition | text | |
ground | text | |
start_date | date | See date columns |
player_id | text | See ID columns |
match_id | text | See ID columns |
Column name | Type | Comment |
---|---|---|
i | integer | Internal ID, not particularly useful |
player | text | |
team | text | |
overs | text | May contain a dot; for instance, 4.1 |
maidens | numeric | |
runs | numeric | |
wickets | numeric | |
bpo | integer | |
balls | numeric | |
economy | numeric | |
pos | integer | |
innings | integer | |
opposition | text | |
ground | text | |
start_date | date | See date columns |
player_id | text | See ID columns |
match_id | text | See ID columns |
Column name | Type | Comment |
---|---|---|
i | integer | Internal ID, not particularly useful |
team | text | |
score | text | |
runs | integer | |
overs | numeric | |
bpo | integer | |
rpo | numeric | |
lead | integer | |
all_out | boolean | See boolean columns |
declared | boolean | See boolean columns |
result | text | |
innings | integer | |
opposition | text | |
ground | text | |
start_date | date | See date columns |
match_id | text | See ID columns |
There are several limitations in this database, largely due to a loss of fidelity when scraping and the choice to use a loosely-typed database. These are the most annoying of those.
Boolean columns (like innings.not_out
and team_innings.all_out
) are not real booleans, as SQLite
doesn't support them. Instead, they contain the strings 'True'
and 'False'
, so compare directly against those values.
Similarly, the start_date
columns are not real dates. For
prettier formatting, they are displayed in a format like this: 2 June 2021.
However, internally they are represented as ISO date strings: 2021-06-02. To
find matches on that date, we can use start_date = '2021-06-02'
.
Hopefully this is less annoying, but I have chosen to limit the number of rows in each table to 100, no matter how many rows were returned. If this becomes a problem I can increase it.
Player and match ID columns have special behaviour. A player ID
is p
followed by digits (with no other characters in the string),
and a match ID is the same but with an m
prefix.
When a player or match ID is detected (either from the original value, or constructed if you really want to), it will be linked to the relevant Cricinfo player profile or scorecard page.
In addition to the automatic formatting for date columns and ID columns, numeric columns will be formatted with a thousands separator, and if they contain a non-integer, will be shown to two decimal places.
To force a result column to be displayed literally, prepend an apostrophe
- '
- this will be stripped from the output and the rest will be
presented verbatim. For example: strftime('%Y', start_date)
will
display as 2,001
, while "'" || strftime('%Y',
start_date)
will display as 2001
.
The database is updated daily, although there will be a delay when new teams play their first international match. The latest match for each format is:
gender | format | team | opposition | ground | start_date | match_id |
---|---|---|---|---|---|---|
men | test | India | New Zealand | Wankhede | 1 November 2024 | m1439898 |
women | test | India | South Africa | Chennai | 28 June 2024 | m1434290 |
men | odi | New Zealand | Sri Lanka | Pallekele | 19 November 2024 | m1456445 |
women | odi | India | New Zealand | Ahmedabad | 29 October 2024 | m1454393 |
men | t20i | U.S.A. | Nepal | Dallas | 17 October 2024 | m1453518 |
women | t20i | Mexico | Costa Rica | Naucalpan | 17 November 2024 | m1459720 |