Towards a course on causal inference: What is causation, and what can statistics tell us about it?

May 17, 2011

Pearl, “The Art and Science of Cause and Effect”:

Hume’s two riddles: How do people ever acquire knowledge of causation? What difference would it make if I told you that a certain connection is or is not causal?

Making them concrete: How should a robot acquire causal information through interaction with its environment? How should a robot process causal information received from its creator-programmer?

Solutions: Treat causation as a summary of behaviour under interventions. Use equations and graphs as a mathematical language within which causal thoughts can be represented and manipulated. Treat an intervention as surgery on the equations: replace the equation for the manipulated variable and leave the rest untouched.

Examples: Can we find the effect of smoking on cancer, assuming that an intermediate measurement of tar deposits is available? Yes, but this doesn’t solve the smoking-cancer debate, because the answer holds only under the assumed graph. Simpson’s paradox: Should we, in salary discrimination cases, compare salaries of equally qualified men and women, or instead compare qualifications of equally paid men and women? Draw a graph and test whether a set of candidate measurements is sufficient.
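The smoking, tar and cancer example can be checked on a toy model. The sketch below is not from Pearl's lecture: the probabilities are made-up numbers, and the hidden confounder `U` (say, a genotype) is an assumption chosen so that the naive comparison is biased. It computes the effect of smoking on cancer three ways: naively from observational data, by surgery on the equations (which needs the hidden `U`), and by the front-door formula P(y | do(x)) = Σ_z P(z | x) Σ_x' P(y | x', z) P(x'), which uses only the observed smoking, tar and cancer variables.

```python
from itertools import product

# Hypothetical structural model: U -> X, X -> Z, Z -> Y, U -> Y.
# U = hidden confounder, X = smoking, Z = tar deposits, Y = cancer.
# All numbers are invented for illustration.
P_U = {0: 0.7, 1: 0.3}                       # P(U = u)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}              # P(X = 1 | U = u)
P_Z1_GIVEN_X = {0: 0.05, 1: 0.9}             # P(Z = 1 | X = x)
P_Y1_GIVEN_ZU = {(0, 0): 0.05, (0, 1): 0.3,  # P(Y = 1 | Z = z, U = u)
                 (1, 0): 0.25, (1, 1): 0.6}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def observational_joint():
    """P(x, z, y) with the hidden U summed out: what a data analyst sees."""
    joint = {}
    for u, x, z, y in product((0, 1), repeat=4):
        p = (P_U[u] * bern(P_X1_GIVEN_U[u], x)
             * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y))
        joint[(x, z, y)] = joint.get((x, z, y), 0.0) + p
    return joint


def prob(joint, x=None, z=None, y=None):
    """Marginal probability of the variables that are not None."""
    wanted = (x, z, y)
    return sum(p for key, p in joint.items()
               if all(w is None or k == w for k, w in zip(key, wanted)))


def naive(joint, x):
    """P(Y = 1 | X = x): conditioning, not intervening."""
    return prob(joint, x=x, y=1) / prob(joint, x=x)


def front_door(joint, x):
    """P(Y = 1 | do(X = x)) from observed variables only."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(joint, x=x, z=z) / prob(joint, x=x)
        inner = sum(prob(joint, x=xp, z=z, y=1) / prob(joint, x=xp, z=z)
                    * prob(joint, x=xp) for xp in (0, 1))
        total += p_z_given_x * inner
    return total


def surgery(x):
    """P(Y = 1 | do(X = x)) by replacing the equation for X with X := x."""
    return sum(P_U[u] * bern(P_Z1_GIVEN_X[x], z) * P_Y1_GIVEN_ZU[(z, u)]
               for u, z in product((0, 1), repeat=2))


joint = observational_joint()
for name, effect in [
    ("naive", naive(joint, 1) - naive(joint, 0)),
    ("front-door", front_door(joint, 1) - front_door(joint, 0)),
    ("surgery", surgery(1) - surgery(0)),
]:
    print(f"{name:>10}: {effect:.4f}")
```

In this toy model the front-door estimate agrees with the surgery answer while the naive difference overstates the effect. The agreement depends entirely on the assumed graph: give `U` a direct arrow into tar, or smoking a direct arrow into cancer, and the formula no longer applies.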

Freedman, “Statistical Models and Shoe Leather”:

Regression can work for causal inference, but usually doesn’t. Test the model against empirical reality. Does the model predict new phenomena? Does it predict the results of interventions? Are the predictions right?

Snow (cholera) vs Kanarek (asbestos in drinking water): Snow dealt with the ecological fallacy, found a natural experiment, and collected the data he needed.

Gibson: Was political intolerance during the McCarthy era driven by mass opinion or elite opinion? Clear question, good summary data, weak causal model; the conclusion does not follow from the data.

Modelling problems: Sharpness of research question. Quantifying concepts. Missing data. Choice of covariates. Functional form. Stochastic assumptions.

Technical fixes: Robust estimators, generalised least squares, specification tests. Can these deal with non-iid errors?

State of the literature: Observational causal inference requires strong assumptions or unusual phenomena. Can causal regression ever be reliable? There is an implicit view that all questions have answers. Do they?

Strength of a model: Assumptions are hard to prove. Complexity is not correctness. We should test against observable phenomena.
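Freedman's question "Does it predict the results of interventions?" can be made concrete with a small simulation. This is a sketch under invented assumptions, not an example from the paper: the data-generating model, the coefficient `TRUE_EFFECT`, and the unmeasured confounder `u` are all hypothetical. A regression of `y` on `x` is fitted once to observational data and once to data in which `x` is assigned at random, and the two slopes are compared.

```python
import random

TRUE_EFFECT = 1.0        # hypothetical causal effect of x on y
CONFOUNDER_EFFECT = 2.0  # hypothetical effect of the unmeasured u on y


def simulate(n, rng, intervene=False):
    """Draw n (x, y) pairs; with intervene=True, x is set independently of u."""
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                  # unmeasured confounder
        if intervene:
            x = rng.gauss(0, 1)              # experimenter assigns x at random
        else:
            x = u + rng.gauss(0, 1)          # x depends on u in the wild
        y = TRUE_EFFECT * x + CONFOUNDER_EFFECT * u + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def run_comparison(n=20000, seed=0):
    """Return (observational slope, slope under randomised x)."""
    rng = random.Random(seed)
    observational = ols_slope(*simulate(n, rng))
    experimental = ols_slope(*simulate(n, rng, intervene=True))
    return observational, experimental


obs_slope, rct_slope = run_comparison()
print(f"slope fitted to observational data: {obs_slope:.2f}")
print(f"slope when x is randomised:         {rct_slope:.2f}")
```

Under these assumptions the observational slope comes out near 2 while the effect of actually changing `x` is 1. The regression fits the observational data well and still mispredicts the intervention, which is the kind of failure that fit statistics alone would not reveal.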

"But it's under .05!"