Smell test failures: America’s greenest cities

April 26, 2011

As Nate Silver tweeted, if you’re declaring Las Vegas the second-greenest city in the U.S., maybe your formula is terrible?

The five variables considered were:

  • “percent of people in each city that put their green beliefs into action”
  • “percentage of people who are willing to admit to no concern or consciousness of environmental issues”
  • “percent of people who make a conscious effort to recycle”
  • “Average trips taken on public transport each weekday”
  • “percent of homes that use solar energy for heating”

These variables were said to be “weighted equally”, and that should set off alarm bells: how do you equally weight variables with different units? You could use ranks, but I bet that isn’t what they did.

Of the top 25 cities, Vegas was 21st in consciousness, 17th= best (i.e. lowest) in unconsciousness, last in recycling, and had no public transport data. But it was first in solar power! Meanwhile, San Francisco was first in consciousness, 4th= best in unconsciousness, first in recycling, 2nd in public transit per capita, and 11th in solar. So how did Vegas beat SF overall?

Well, Vegas has a lot of solar power: 0.43% of households, when second-placed Albuquerque was at 0.2% and the median for the top 25 was 0.06%. Whatever standardisation they did (perhaps changing to z-scores) left Vegas a huge outlier for solar. So though Vegas sucked in all the other categories, solar alone pushed it to second overall.

Lessons:

  • Changing to z-scores might not be sensible if your data aren’t normal.
  • Ranks can be good.
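
To see how a single outlier can do this, here is a minimal sketch. The six cities and both columns of numbers are made up for illustration and are not the survey's data, and the z-score step is only my guess at the kind of standardisation that was used. The "Outlier" city is last on recycling but far ahead on solar.

```python
from statistics import mean, pstdev

# Hypothetical toy data, NOT the survey's figures.
# Each city: [percent who recycle, percent of homes with solar heating]
scores = {
    "A": [60, 0.03],
    "B": [55, 0.05],
    "C": [50, 0.04],
    "D": [45, 0.02],
    "E": [40, 0.01],
    "Outlier": [35, 1.00],  # worst at recycling, extreme on solar
}


def z_scores(values):
    """Standardise a column to mean 0 and (population) sd 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ranks(values):
    """Rank a column, 1 = lowest value (the toy data have no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def composite(transform):
    """Equally weight the transformed columns; return cities best-first."""
    cities = list(scores)
    columns = [list(col) for col in zip(*scores.values())]
    transformed = [transform(col) for col in columns]
    totals = {c: sum(col[i] for col in transformed) for i, c in enumerate(cities)}
    return sorted(cities, key=lambda c: totals[c], reverse=True)


by_z = composite(z_scores)
by_rank = composite(ranks)

print(by_z)     # ['A', 'Outlier', 'B', 'C', 'D', 'E']
print(by_rank)  # ['B', 'A', 'C', 'Outlier', 'D', 'E']
```

With z-scores, the solar column hands the outlier a score of more than +2 while every other city sits just below zero, which is enough to cancel out last place in recycling and land it in second. With ranks, first place in solar is worth only one step more than second place, so the same city ends up fourth of six.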

If you’re trying to get people to take statistics seriously, this might not be the best sentence

April 18, 2011

And while 80 percent of league teams still haven’t wised up to the coming wave of data analysis, SportVU can boast three of this season’s six division winners as clients, as well as upstart squads like the Warriors.

A good reason to overfit

April 9, 2011

For comedic purposes:

B = 2n + 12(Z - R) + 2c + 2S + 3

B = Approximate onscreen body count
n = The number of the installment in the series
Z = Zombie factor (i.e., is the film directed by rock-‘n-schlock auteur Rob Zombie? 1 for yes, 0 for no)
R = Is the film part of a reboot? (1 for yes, 0 for no)
c = The number of colons in the title
S = Does the film take place in outer space? (1 for yes, 0 for no)
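
For anyone who wants to plug numbers in, here is the formula as a small function. The function and argument names are mine, and the inputs in the usage lines are hypothetical rather than taken from any particular film.

```python
def body_count(n, zombie, reboot, colons, space):
    """Approximate onscreen body count B = 2n + 12(Z - R) + 2c + 2S + 3.

    n      -- number of the installment in the series
    zombie -- True if directed by Rob Zombie (Z)
    reboot -- True if the film is part of a reboot (R)
    colons -- number of colons in the title (c)
    space  -- True if the film takes place in outer space (S)
    """
    z, r, s = int(zombie), int(reboot), int(space)
    return 2 * n + 12 * (z - r) + 2 * colons + 2 * s + 3


# Hypothetical inputs: a first installment with none of the extras.
print(body_count(1, False, False, 0, False))  # 5
# Hypothetical inputs: a tenth installment set in outer space.
print(body_count(10, False, False, 0, True))  # 25
```

Note that a reboot not directed by Rob Zombie subtracts 12, so early reboots can come out with a negative body count, which is about what you would expect from a formula fitted for laughs.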
