Cricket query help

Back to cricket query

Basics

This is an interface to an SQLite database. The database contains a scrape of Cricinfo's Statsguru batting, bowling and team tabs. It allows running (read-only) SQL queries directly, and has some saved queries:

The scraped data comes from Owen Brasier's cricketstats project. The source for this project is also public.

Other sections on this page are functions, schema, annoyances, result formatting, and latest data.

Functions

In addition to the usual set of SQLite functions (see the links in SQLite's SQL documentation for more information), there is one custom function available: median.

Median

SQLite does not come with a median function, so median is provided as an application-defined function. It is an aggregate function on numeric values (similar to sum). The value will be displayed as described in result formatting, but internally is always a float.

Schema

This is a pretty direct interface to an SQLite database. The database contains a scrape of Cricinfo's Statsguru batting, bowling and team tabs in the following tables, where $gender can be men or women and $format can be test, odi, or t20i.

Table aliases

As the table names are very long, and for many purposes we will want to separate the results by gender and format anyway (comparing a T20I average to a Test average is not often useful), these tables have shorter names available when choosing genders and formats with the checkboxes:

This will allow a separate results table for each gender and format chosen. If we use the full table names (for instance, women_test_bowling_innings) directly, then the results shown for all genders and formats will be the same.

Batting tables

Column name Type Comment
i integer Internal ID, not particularly useful
player text
team text
runs integer
runs_txt text Score, suffixed with * if not out
not_out boolean See boolean columns
mins numeric
bf numeric
fours numeric
sixes numeric
sr numeric
pos integer
innings integer
opposition text
ground text
start_date date See date columns
player_id text See ID columns
match_id text See ID columns

Bowling tables

Column name Type Comment
i integer Internal ID, not particularly useful
player text
team text
overs text May contain a dot; for instance, 4.1
maidens numeric
runs numeric
wickets numeric
bpo integer
balls numeric
economy numeric
pos integer
innings integer
opposition text
ground text
start_date date See date columns
player_id text See ID columns
match_id text See ID columns

Team tables

Column name Type Comment
i integer Internal ID, not particularly useful
team text
score text
runs integer
overs numeric
bpo integer
rpo numeric
lead integer
all_out boolean See boolean columns
declared boolean See boolean columns
result text
innings integer
opposition text
ground text
start_date date See date columns
match_id text See ID columns

Annoyances

There are several limitations in this database, largely due to a loss of fidelity when scraping and the choice to use a loosely-typed database. These are the most annoying of those.

Boolean columns

Boolean columns (like innings.not_out and team_innings.all_out) are not real booleans, as SQLite doesn't support them. Instead, they contain the strings 'True' and 'False', so compare directly against those values.

Date columns

Similarly, the start_date columns are not real dates. For prettier formatting, they are displayed in a format like this: 2 June 2021.

However, internally they are represented as ISO date strings: 2021-06-02. To find matches on that date, we can use start_date = '2021-06-02'.

Results limit

Hopefully this is less annoying, but I have chosen to limit the number of rows in each table to 100, no matter how many rows were returned. If this becomes a problem I can increase it.

Result formatting

ID columns

Player and match ID columns have special behaviour. A player ID is p followed by digits (with no other characters in the string), and a match ID is the same but with an m prefix.

When a player or match ID is detected (either from the original value, or constructed if you really want to), it will be linked to the relevant Cricinfo player profile or scorecard page.

Other result formatting

In addition to the automatic formatting for date columns and ID columns, numeric columns will be formatted with a thousands separator, and if they contain a non-integer, will be shown to two decimal places.

To force a result column to be displayed literally, prepend an apostrophe - ' - this will be stripped from the output and the rest will be presented verbatim. For example: strftime('%Y', start_date) will display as 2,001, while "'" || strftime('%Y', start_date) will display as 2001.

Latest data

The database is updated daily, although there will be a delay when new teams play their first international match. The latest match for each format is:

gender format team opposition ground start_date match_id
men test India Australia Brisbane 14 December 2024 m1426557
women test South Africa England Bloemfontein 15 December 2024 m1432231
men odi Afghanistan Zimbabwe Harare 21 December 2024 m1457925
women odi New Zealand Australia Wellington 21 December 2024 m1443566
men t20i U.S.A. Nepal Dallas 17 October 2024 m1453518
women t20i Mexico Costa Rica Naucalpan 17 November 2024 m1459720

By Sean McGivern