Election predictions and lead time
November 12, 2011
In 2008, Nate Silver became Internet-famous for a model that predicted that year’s presidential election. After Obama’s victory, the model was adjudged to have performed well, calling 49 out of 50 states correctly. The predictions that came under scrutiny, though, were those made just before the election. Anyone who put reasonable weight on the polling data as of November 3rd would have called at least 45 out of 50 states correctly, which makes Silver’s success impressive but hardly inexplicable. On the other hand, Silver was still predicting an advantage for McCain three months before the election, and, again, anyone who looked at the polls would have concluded that the race was at least close at that point.
The obvious point is that it’s much harder to make predictions with a lead time. It follows that if you’re making predictions, you should make the lead times of your input variables consistent and clear. Silver’s latest model has the clarity but not the consistency. It seems strange to me to model votes using growth in election-year GDP and approval ratings from the preceding year. As a prediction, the GDP input arrives too late: election-year GDP isn’t determined until after the election. As a way of understanding causation, the approval input is too early: even if you approve of regression for this purpose, wouldn’t you rather use the most relevant data, like election-day approval ratings? Of course Silver has Times Magazine editors who want copy now, but the result is copy with some incoherence.
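To make the lead-time point concrete, here is a minimal sketch in Python. It is not Silver’s model. The variable names and the availability dates are my own illustrative assumptions (only the election date is real). It just records when each input would be known and checks which inputs a forecaster could actually use on a given day.

```python
from datetime import date

ELECTION_DAY = date(2012, 11, 6)

# Hypothetical availability dates, chosen only for illustration:
# the date on which each input variable would be fully known.
inputs = {
    "approval_year_before": date(2011, 12, 31),      # assumed: known at end of 2011
    "election_year_gdp_growth": date(2013, 1, 31),   # assumed: known only after 2012 ends
    "election_day_approval": ELECTION_DAY,           # known on election day itself
}


def lead_time_days(available_on, election_day=ELECTION_DAY):
    """Days between an input becoming known and the election.

    A negative value means the input is not known until after the election,
    so it cannot be part of a genuine prediction.
    """
    return (election_day - available_on).days


def usable_inputs(inputs, forecast_date):
    """Names of the inputs already known on the day the forecast is made."""
    return {name for name, known in inputs.items() if known <= forecast_date}


if __name__ == "__main__":
    for name, known in inputs.items():
        print(f"{name}: lead time {lead_time_days(known)} days")
    print("usable on 2012-01-01:", sorted(usable_inputs(inputs, date(2012, 1, 1))))
```

Under these assumed dates, the three inputs have a long positive lead time, a negative one, and zero respectively, which is the inconsistency in a nutshell: no single forecast date exists on which all three are available and equally fresh.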