Learning Geospatial Analysis with Python

Geospatial analysis and our world

The morning of November 7, 2012, saw political experts in the United States scrambling to explain how incumbent Democratic President Barack Obama had pulled off such a decisive election victory. They scrambled because none of them had seen the win coming, or at least not a margin of 332 electoral college votes for Obama to Republican candidate Mitt Romney's anemic 206. In the weeks leading up to the election, the major political polling organizations had also unanimously declared that the race would be a photo finish.

Political experts offered broad explanations including "a better ground campaign" by Obama, "demographic shifts" that favored the Democrats, and even accusations of a weakened Republican Party brand. But these generalized theories fell far short of explaining the results in any satisfying detail. The following map shows the electoral votes received by each candidate:

The explanation for the political upset came instead from a 34-year-old blogger from Michigan named Nate Silver. Armed with only a laptop, he had predicted the exact outcome long before election day, and with startling precision.

Both election campaigns had calculated multiple winning scenarios, each built on carrying certain key battleground states. Battleground states are also known as swing states, because neither candidate had overwhelming support in them going into the election. These states included Colorado, Florida, Iowa, Nevada, New Hampshire, North Carolina, Ohio, Virginia, and Wisconsin. Yet Silver called these states as accurately as if the results had been known all along.

Silver's method for predicting the future can be summed up as geostatistical profiling. He used geographic analysis to fill in the gaps in polling data that had led other analysts to inaccurate predictions. Large polling organizations poll states on a rolling but irregular basis in the run-up to an election, and different organizations use different polling approaches. Silver first weighted each pollster according to its historical accuracy and calculated an error rate.
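To make the weighting idea concrete, here is a minimal sketch that turns each pollster's historical error into a weight and uses those weights to average the polls. It assumes inverse-variance weighting (weight = 1 / error²), a standard statistical technique swapped in for illustration; it is not Silver's actual FiveThirtyEight formula. The `Poll` class, the `weighted_average` function, and all pollster names and numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Poll:
    pollster: str      # hypothetical pollster name
    dem_share: float   # Democratic share of the two-party vote, in percent
    hist_error: float  # assumed historical average error of this pollster, in points


def weighted_average(polls):
    """Inverse-variance weighted mean of poll results and its standard error.

    This is an illustrative weighting scheme, not FiveThirtyEight's method.
    """
    # More accurate pollsters (smaller historical error) get larger weights.
    weights = [1.0 / (p.hist_error ** 2) for p in polls]
    total = sum(weights)
    mean = sum(w * p.dem_share for w, p in zip(weights, polls)) / total
    # Standard error of an inverse-variance weighted mean, assuming independent errors.
    std_err = sqrt(1.0 / total)
    return mean, std_err


# Hypothetical polls for a single state.
polls = [
    Poll("Pollster A", 51.0, 2.0),
    Poll("Pollster B", 48.0, 4.0),
    Poll("Pollster C", 50.0, 2.0),
]
mean, err = weighted_average(polls)
print(f"{mean:.2f} +/- {err:.2f}")  # prints "50.22 +/- 1.33"
```

Because Pollster B's historical error is twice as large, its weight is a quarter of the others', so its low reading of 48.0 pulls the average down only slightly.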

He could then average the polls together and account for potential error. His second innovation was to profile states based on historical voting trends and demographics, which let him classify similar states and even voting districts. Wherever polling data for a particular state was missing, he could find surrogate data from a similar state and extrapolate to complete his data set. The combination of careful weighting and extrapolation allowed Silver to run a more robust national voting model, and it paid off. Interestingly, Silver's political models use many of the same elements of probability theory as PECOTA, the baseball forecasting software he had developed earlier, but with a geospatial twist. The following plot compares the accuracy of researchers and political experts; the analysts using geospatial techniques led the pack by a wide margin.
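The surrogate-data step can be sketched as a nearest-neighbor lookup: describe each state by a small demographic profile, and when a state has no polling data, borrow the value from the polled state whose profile is closest. This is a simplified illustration under stated assumptions, not Silver's model. The state names, the three profile features, all values, and the helper names `most_similar` and `fill_missing` are hypothetical, and plain Euclidean distance on pre-scaled features is an assumed similarity measure.

```python
from math import dist

# Hypothetical state profiles: (urban population share, college graduate share,
# prior Democratic vote share). All values are made up and already on a 0-1 scale.
profiles = {
    "State A": (0.70, 0.30, 0.52),
    "State B": (0.72, 0.32, 0.53),
    "State C": (0.40, 0.20, 0.41),
    "State D": (0.45, 0.22, 0.43),
}

# Hypothetical latest polling averages (Democratic share, percent); None = no data.
polls = {"State A": 51.5, "State B": None, "State C": 44.0, "State D": None}


def most_similar(state, candidates):
    """Return the candidate state whose profile is closest (Euclidean) to `state`."""
    return min(candidates, key=lambda other: dist(profiles[state], profiles[other]))


def fill_missing(polls):
    """Copy each missing value from the most similar state that does have polls."""
    polled = [s for s, v in polls.items() if v is not None]
    filled = dict(polls)
    for state, value in polls.items():
        if value is None:
            filled[state] = polls[most_similar(state, polled)]
    return filled


filled = fill_missing(polls)
print(filled)
# {'State A': 51.5, 'State B': 51.5, 'State C': 44.0, 'State D': 44.0}
```

Copying a single neighbor's value keeps the sketch short; a more careful version might average several similar states or adjust the borrowed number for known differences between them, which is closer to the extrapolation the text describes.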

It would be one thing if Nate Silver had been the only one to make such an accurate prediction. But he was simply the most visible, thanks to his high-profile FiveThirtyEight blog at The New York Times and his articulate, detailed posts about his methods. He credited many other analysts, including Sam Wang of the Princeton Election Consortium and Drew Linzer of Emory University, who used similar geostatistical methods and achieved highly accurate results. Silver was on the crest of a wave of geospatial analysts who were bringing the field to national attention through detailed, objective, and corrective spatial and statistical modeling.

Tip

An economist and statistician named Skipper Seabold attempted to reverse-engineer the FiveThirtyEight model using Python. His efforts can be found at the following URL:

https://github.com/jseabold/538model

Beyond politics

Politics is one of the most recent and most visible applications of geospatial modeling. However, the use of geospatial analysis has been growing steadily over the last 15 years. In 2004, the US Department of Labor declared the geospatial industry one of 13 high-growth industries in the United States that were expected to create millions of jobs in the coming decades.

Geospatial analysis can be found in almost every industry, including real estate, oil and gas, agriculture, defense, disaster management, health, transportation, and oceanography. For a good overview of how geospatial analysis is used in dozens of different industries, visit http://www.esri.com/what-is-gis/who-uses-gis.