October 17, 2011
For some reason rough statistics don’t bother me as much as rough probabilities.
Exhibit A: Surveys of Occupy Wall Street protesters. These are obviously subject to all sorts of sampling biases, and are unlikely to be representative of the movement as a whole. Yet not only do they fail to irk me, I’m happy they’re being collected.
Exhibit B: Nate Silver multiplies some dependent numbers:
The following is not mathematically rigorous, since the events of yesterday evening were contingent upon one another in various ways. But just for fun, let’s put all of them together in sequence:
The Red Sox had just a 0.3 percent chance of failing to make the playoffs on Sept. 3.
The Rays had just a 0.3 percent chance of coming back after trailing 7-0 with two innings to play.
The Red Sox had only about a 2 percent chance of losing their game against Baltimore, when the Orioles were down to their last strike.
The Rays had about a 2 percent chance of winning in the bottom of the 9th, with Johnson also down to his last strike.
Multiply those four probabilities together, and you get a combined probability of about one chance in 278 million of all these events coming together in quite this way.
I care about baseball less than I care about Occupy Wall Street, but this bugs me a lot. It’s disclaimered, but that doesn’t stop the number from being quoted as gospel.
I think this has something to do with “statistics” being closer to raw numbers, whereas probabilities require some level of abstraction. You can do statistics without a model in any meaningful sense, but a probability requires a reference class to be at least implied. If you’re not clear on the reference class, you might end up doing dumb things like multiplying numbers that have no business being multiplied: the product of several probabilities is the probability of all the events happening together only if the events are independent, and these, by Silver’s own admission, were not. Obviously I use rough probabilities all the time, but I try to make the model explicit if it’s for public consumption, and I try to have it very clear in my head in any case. Otherwise, stick with frequencies.
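To make the complaint concrete, here is a minimal Python sketch. The first part reproduces the quoted figure by multiplying the four numbers as if the events were independent. The second part is a toy two-event example showing how far off that product can be when events are linked. The names `naive_product` and `joint_probability` are my own, and the conditional probability `p_b_given_a` is invented for illustration; it is not an estimate of anything that happened that night.

```python
from fractions import Fraction

# The four figures from the quoted passage, kept as exact fractions.
quoted = [
    Fraction(3, 1000),  # Red Sox miss the playoffs (as of Sept. 3)
    Fraction(3, 1000),  # Rays come back from 7-0 down
    Fraction(2, 100),   # Red Sox lose to Baltimore
    Fraction(2, 100),   # Rays win in the bottom of the 9th
]


def naive_product(probs):
    """Multiply probabilities as if the events were independent."""
    result = Fraction(1)
    for p in probs:
        result *= p
    return result


def joint_probability(p_a, p_b_given_a):
    """Chain rule for two events: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


naive = naive_product(quoted)
print(f"naive product: about 1 in {round(1 / naive):,}")

# Toy example with made-up numbers: A and B each have a 2 percent
# marginal probability, but B becomes a coin flip once A has happened.
p_a = Fraction(2, 100)
p_b = Fraction(2, 100)
p_b_given_a = Fraction(1, 2)  # hypothetical, chosen only to show the effect

independent_guess = p_a * p_b
actual_joint = joint_probability(p_a, p_b_given_a)
print(f"assuming independence: {float(independent_guess):.4f}")
print(f"with the dependence:   {float(actual_joint):.4f}")
print(f"ratio: {actual_joint / independent_guess}")
```

With these invented numbers the independence assumption understates the joint probability by a factor of 25. The real dependence between the four baseball events could push the answer in either direction; the point is only that the product says nothing until the model linking the events is spelled out.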