## Because I’m giving a talk on it: Birth order, baseball, and Simpson’s paradox reversed

February 7, 2011

Note: Concatenates and revises four previous posts. It’s over 4000 words; you have better things to do than read this. Even if you’re interested, the eventual paper will be more useful and more correct.

I’m going to start by posing three questions. One has to do with baseball and personality; the other two with statistics and causation. Most people, though not me, find baseball and personality more interesting, so let’s pose that question first.

In Major League Baseball, do younger brothers typically steal at a higher or lower rate than their older brothers, or are they the same on average?


## No regress: Digit ratios and risky decisions

Suppose you wanted to know the relationship between risk-taking and digit ratio (index finger length divided by ring finger length). Suppose you had the pretty good idea of giving 152 Caucasian experimental subjects a choice of six lotteries, ordered by riskiness.

One approach would be to give everyone the same choice of lotteries. You could then draw two dot or scatter plots of lottery choice against digit ratio: one for women, one for men. If you really wanted a P-value, you could test, for each gender, whether the relationship between lottery choice and digit ratio was significantly different from what random assignment would produce.
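If you did want that P-value, one way to read "significantly different from random assignment" is as a permutation test: shuffle the lottery choices among subjects many times and see how often the shuffled association is at least as strong as the observed one. Here is a minimal sketch, not anything taken from the study. The function names, the use of a Pearson correlation on the 1-to-6 choice scores, and the number of shuffles are all my assumptions; a rank-based statistic would be just as defensible.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Plain Pearson correlation; assumes neither list is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(digit_ratios, choices, n_perm=5000, seed=0):
    """Two-sided permutation P-value for the digit-ratio/choice association.

    Hypothetical helper: shuffles the lottery choices across subjects,
    which is what "random assignment" would look like, and counts how
    often the shuffled |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(digit_ratios, choices))
    shuffled = list(choices)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(digit_ratios, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

You would call this once for the women and once for the men, passing each group's digit ratios and lottery choices (coded 1 through 6) as two equal-length lists.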

Another approach would be to use three different sets of lotteries (called 50-50, 75-25, and 25-75). Furthermore, devise three different “frames” (wordings) for the lotteries; for each subject, randomly assign one wording. Throw the result into a regression, with indicators for the set of lotteries and for the frame. You get something like this:

The problem with this second approach is that the regression model is wrong. This is because all models are wrong. In particular, these graphs:

make it very hard to believe that the difference between the lotteries is well-represented by a constant; the graphs aren’t even the same shape. (No, an ordered probit doesn’t solve this problem.) There isn’t any good reason to think frame and gender effects are constant, either.
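To make the worry concrete, here is a toy sketch with entirely made-up numbers (not the study's data). A regression with an indicator for each set of lotteries lets each set have its own intercept but forces one shared slope on digit ratio. If the sets actually have different shapes, that shared slope can describe none of them. The helper names and the data below are my own assumptions for illustration.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of y on x for a single group."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def common_slope(groups):
    """Slope on x in the model y ~ x + one indicator per group.

    With group indicators in the model, the fitted slope equals the
    pooled within-group slope: demean x and y inside each group, then
    regress the demeaned y on the demeaned x.
    """
    sxy = 0.0
    sxx = 0.0
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: choices rise with digit ratio in one set, fall in the other.
ratios = [0.92, 0.94, 0.96, 0.98, 1.00]
toy = {
    "50-50": (ratios, [1, 2, 3, 4, 5]),
    "75-25": (ratios, [5, 4, 3, 2, 1]),
}

for name, (xs, ys) in toy.items():
    print(name, "separate slope:", round(slope(xs, ys), 3))
print("slope with indicators:", round(common_slope(toy), 3))
```

In this contrived case the two sets have equal and opposite slopes, so the indicator model reports a slope of about zero: a number that looks like "no relationship" while each set on its own shows a strong one.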

The difference between the first approach and the second is the difference between “hey, this is cool” and “hey, this might be cool but I don’t trust any of the numbers”. (Also, the first approach is easier.) As it stands, there may well be something real going on, but it’s hard to say more than that.

## Research finds: If you ask vague questions, students might not give the answer you intended

Ars Technica headline: “College upperclassmen still fail at scientific reasoning”

From the study: “Questions… were sometimes ambiguously worded, allowing us not only to diagnose whether students had a correct or incorrect understanding of a carbon-related process but also to uncover their ways of reasoning about carbon-related processes.”

Sample question: “Once carbon enters a plant, it can be converted into energy for plant growth. True or false?”