Statistics can be tricky things. It is always good to remember the remark attributed to Mark Twain:
"There are three kinds of lies: lies, damned lies, and statistics."
Nonetheless, from time to time on this blog, I engage in them.
For example, if I said that The Database has 24,524 Jerome Gambit games that start 1.e4 e5 2.Nf3 Nc6 3.Bc4 Bc5 4.Bxf7+, and that White scored 50%, what would that actually mean?
Twenty-four and a half thousand games is a lot of games, but do they give a fair picture of the prospects for the Jerome Gambit?
Because players tend to submit their wins to newspapers, magazines and websites - while largely ignoring their losses - a collection of published games can be skewed when it comes to outcomes. No doubt The Database, which includes games from historical research and which receives games from readers of this blog, shows some of this bias.
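To make the effect of that bias concrete, here is a minimal sketch in Python. The win, draw, and loss counts are hypothetical, chosen only for illustration; they are not taken from The Database. The function names and the 50% reporting rate for losses are likewise assumptions. The sketch uses the usual chess scoring convention of 1 point for a win and half a point for a draw.

```python
def score_percent(wins, draws, losses):
    """White's score as a percentage: win = 1 point, draw = 0.5, loss = 0."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("cannot score an empty set of games")
    return 100 * (wins + 0.5 * draws) / games


def published_sample(wins, draws, losses, loss_report_rate):
    """Simulate a published collection: every win and draw is submitted,
    but only a fraction of the losses (an assumed, hypothetical rate)."""
    return wins, draws, round(losses * loss_report_rate)


# Hypothetical complete record of games, NOT real figures from The Database.
all_games = (450, 100, 450)

# Hypothetical published record: players submit only half of their losses.
published = published_sample(*all_games, loss_report_rate=0.5)

full_score = score_percent(*all_games)
published_score = score_percent(*published)

print(f"Complete sample:  {full_score:.1f}%")
print(f"Published sample: {published_score:.1f}%")
```

With these made-up numbers, the complete sample is an even score, while the published sample looks noticeably better for White, even though no game result has changed. Only the mix of games that reached the collection is different.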
Proponents of a particular line of play can also become successful with it, so that even a complete sample of their games can skew in the line's favor. Likewise, very strong players (recall the recent post "Jerome Gambit: Top Crunch Numbers") can affect the outcome.
However, several factors help stabilize The Database. When I collect games from the lichess.org website, for example, I gather wins, losses, and draws. The same is true of other sites, such as Chess.com, where I have searched for Jerome Gambit games.
More importantly, at least half of The Database is drawn from the FICS chess website database (from its inception through April 2022), and the games I gathered are, again, all the wins, losses and draws available.
So, The Database remains largely representative of online play at the club chess level. This allows me to give assessments of a line (or move) according to both computer (e.g. Stockfish 15) evaluation and results in related chess play.