Out-of-sample validation with 16 data points

November 21, 2011

It’s worth a try. Let me say that I think it’s a useful thing to do before I start nitpicking: the prediction interval widths are amusing. But:

  • If you only allow data prior to an election to be used when fitting a model, you end up using 1948 data to inform parameter estimates a lot, and 2008 data not at all. But if you’re trying to guess how well a model will do in 2012, isn’t this the opposite of what you want?
  • Preferring the model with the lowest prediction error isn’t necessarily the right thing to do: it rewards overfitting. All the models are designed after looking at past data. So even if the test only allows past data in the estimation, it’s not entirely prospective, because the variables have been chosen to give a good fit both before and after each test election. You could fit a large set of high-order polynomials, find one that gives almost no prediction error according to this test, and its prediction for 2012 would still be garbage. Parsimony still matters.
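Both nitpicks can be seen in a small simulation. What follows is only a sketch on synthetic data, not real election results and not the models under discussion. It assumes a rolling test in which each fit uses only earlier observations; the names `MIN_TRAIN`, `training_use_counts` and `rolling_origin_rmse` are made up for illustration. The outcome and all 200 candidate predictors are pure noise, so no candidate has any real predictive value.

```python
import numpy as np

MIN_TRAIN = 6  # hypothetical: fewest observations allowed in a fit


def training_use_counts(n, min_train=MIN_TRAIN):
    """Count how many of the rolling fits include each observation."""
    counts = np.zeros(n, dtype=int)
    for k in range(min_train, n):
        counts[:k] += 1  # the fit that predicts point k trains on points 0..k-1
    return counts


def rolling_origin_rmse(z, y, min_train=MIN_TRAIN):
    """Predict each y[k] from a straight-line fit to the earlier points only."""
    errors = []
    for k in range(min_train, len(y)):
        slope, intercept = np.polyfit(z[:k], y[:k], 1)
        errors.append(y[k] - (slope * z[k] + intercept))
    return float(np.sqrt(np.mean(np.square(errors))))


rng = np.random.default_rng(0)
n_elections = 16                                   # matches the 16 data points
y = rng.normal(size=n_elections)                   # synthetic outcome: noise
candidates = rng.normal(size=(200, n_elections))   # synthetic predictors: noise

counts = training_use_counts(n_elections)
scores = np.array([rolling_origin_rmse(z, y) for z in candidates])

print("fits that train on each observation:", counts.tolist())
print("best candidate RMSE:  ", scores.min())
print("median candidate RMSE:", np.median(scores))
```

The counts show the first bullet: the earliest observations feed every fit and the most recent one feeds none. The gap between the best and the median score shows the second: with enough candidates, one of them will do well on this test by luck, even though every candidate here is noise by construction.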

You are currently reading Out-of-sample validation with 16 data points at "But it's under .05!".

