Causal discovery is hard: Baseball wins, payroll, other stuff
May 18, 2011
(cross-posted to Phil’s comments)
Phil: “Suppose you’re a Martian who has just immigrated to North America, and you have no idea how baseball works. All you’ve got is a database full of statistics, and a black-box graph theory algorithm to try to figure out cause-and-effect relationships. What would you conclude?”
… while we found some evidence that winning affects payroll and payroll affects winning, the evidence suggests the effect of winning on payroll is the more direct, larger, and more lasting in magnitude one.
Suppose a second Martian came to North America to verify the results of the first one. Having some time on her hands, she takes the unprecedented step of actually watching a baseball game. When, in the first innings, Ichiro gets a hit, she thinks that it makes a lot more sense to say that ABs cause singles, rather than that singles cause ABs as the black box claimed. But as the innings continues, she sees that singles can cause ABs as well: by extending the innings. Somewhere around the bottom of the third, she realises that there is no way the black box can work out these sorts of causal relationships based on annual data: you would need gamelogs and a super-black-box.
Since even Martians have not yet developed super-black-boxes, she simplifies the problem. She throws out all the performance variables. Instead, for each team-year, she only considers two variables:
– team payroll (standardised in some way, perhaps fraction of total payroll);
– team wins.
She then builds two predictive models:
– year N wins as a function of year N payroll (if you wanted to be careful, you’d use preseason payroll)
– year (N+1) payroll as a function of year N wins and year N payroll.
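For concreteness, here is one way the two models could be sketched in Python. Nothing above commits to a functional form, so the linear least-squares fits are an assumption, as are the helper names (`payroll_share`, `fit_ols`, `fit_two_models`) and the team-by-year array layout. The numbers in the demo at the bottom are simulated purely to exercise the code; they are not real baseball data.

```python
import numpy as np

def payroll_share(payroll):
    """Standardise payroll as each team's fraction of that year's total.

    payroll: array of shape (n_teams, n_years).
    """
    return payroll / payroll.sum(axis=0, keepdims=True)

def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept.

    Returns coefficients as [intercept, coef_1, coef_2, ...].
    """
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_two_models(payroll, wins):
    """Fit both models on arrays of shape (n_teams, n_years).

    Columns are assumed to be consecutive seasons in order.
    """
    # Model 1: year N wins as a function of year N payroll.
    m1 = fit_ols(payroll.ravel(), wins.ravel())
    # Model 2: year N+1 payroll as a function of year N wins and year N payroll.
    X2 = np.column_stack([wins[:, :-1].ravel(), payroll[:, :-1].ravel()])
    m2 = fit_ols(X2, payroll[:, 1:].ravel())
    return m1, m2

# Demo on simulated (made-up) data: 30 teams, 10 seasons.
rng = np.random.default_rng(0)
raw_payroll = rng.uniform(40, 200, size=(30, 10))  # hypothetical payrolls
wins = rng.normal(81, 10, size=(30, 10))           # hypothetical win totals
m1, m2 = fit_two_models(payroll_share(raw_payroll), wins)
print("wins ~ payroll:", m1)
print("next payroll ~ wins + payroll:", m2)
```

In the second fit, the coefficient on year N wins is the "winning affects payroll" effect after accounting for the fact that payroll is persistent from year to year. That adjustment is the reason year N payroll is in the model at all.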
These are not definitive models. It could be that good managers can talk their bosses into high payrolls, and, over and above that, are good at player selection. Furthermore, there are different ways to change payroll. Increasing your payroll by acquiring free agents will have a different effect from giving your existing players a pay rise. Finally, all the box score variables we dropped might matter after all: perhaps wins due to “luck” affect payroll differently from wins due to improved performance. The black box, however, either ignores these problems or solves them in obviously wrong ways. It’s better to directly model the relationships you care about.