
Geography 486 - Lesson 5

Representing Volumes and Surfaces - Week 1

Kriging

Kriging is a statistical technique designed to predict a dependent variable from a number of independent variables, much like a regression equation. It is commonly used in geographic information systems to estimate an attribute of interest (e.g., height or crop yield) at an unsampled location from the spatial structure of the known values of that attribute.[1]

Inverse Distance Weighting (IDW) is another means of estimating an attribute of interest, but it is limited by the range of values in the data set used for the estimate. Kriging instead attempts to estimate the unknown attribute by minimizing the statistical error among the known values. The term for this kind of statistic is stochastic, from the Greek stokhastikos, diviner; from stokhazesthai, to guess at; from stokhos, aim or goal.[2] Essentially, Kriging is a statistical guess at what an attribute of interest will be, based upon the spatial characteristics of other known instances of that attribute. IDW is a deterministic model and will always arrive at a value between the minimum and maximum values of the attribute in the data set under study, whereas Kriging can actually predict a value outside that range.
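The bounded behavior of Inverse Distance Weighting can be seen in a minimal Python sketch. The function name idw, the power parameter, and the sample values are assumptions made for illustration and do not come from the course material. Because every weight is positive and the weights are normalized, the estimate is a weighted average and cannot leave the range of the sample values.

```python
import math


def idw(samples, x0, y0, power=2.0):
    """Estimate a value at (x0, y0) by Inverse Distance Weighting.

    samples is a list of (x, y, z) tuples; power controls how quickly
    a sample's influence falls off with distance (hypothetical default).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in samples:
        d = math.hypot(x - x0, y - y0)
        if d == 0.0:
            # The target coincides with a sample point: return its value.
            return z
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_total += w
    # Normalizing by the total weight makes this a weighted average.
    return weighted_sum / weight_total


# Made-up sample points: (x, y, attribute value)
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
estimate = idw(samples, 100.0, 100.0)
print(estimate)  # always stays between 10.0 and 30.0
```

However far the target location is from the samples, the result stays between the smallest and largest sample values, which is the limitation described above.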

The theory of Kriging was originally developed by Danie G. Krige, hence the name, and was further developed by the French mathematician Georges Matheron in the 1960s.[3] Kriging is also known as Gaussian process regression because it is assumed that the differences in variation between any two points are normally distributed.[4]

As used in Geographic Information Systems (GIS), Kriging assumes that the attribute of interest can be treated as a regionalized variable.[5] A regionalized variable is considered to be somewhere between a truly random variable and a completely deterministic variable. It changes in a continuous manner from one location to another, so that points near each other are spatially correlated whereas points that are widely separated are statistically independent.[6]

This relationship can be seen in the graphic below, called a semivariogram, from our Geography 486, Lesson 5 documentation.[7] The predicted values of the Kriging equation fall along the black line. The amount of error in the prediction (the 'nugget') is the semivariance value at the point where that line crosses the Y axis. Semivariance is a measure of the spatial dependence between sample points, and its magnitude depends upon the distance between the points.[8] A small distance between points will generally result in a small semivariance, and a larger distance between points will generally result in a larger semivariance.[9]
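As a rough illustration of how semivariance relates to distance, the Python sketch below computes an empirical semivariance for each distance class as half the average squared difference between the values of all sample pairs separated by about that distance. This is a common textbook estimator and is assumed here; the source does not state which estimator the lesson uses. The function name, the lag parameters, and the sample values are hypothetical.

```python
import math
from itertools import combinations


def empirical_semivariogram(samples, lag_width, n_lags):
    """Return {lag distance: semivariance} for (x, y, z) samples.

    Each pair of samples is assigned to the nearest multiple of
    lag_width. Lags with no pairs are left out of the result.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(samples, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        k = int(d / lag_width + 0.5)  # index of the nearest lag
        if k < n_lags:
            sums[k] += (z1 - z2) ** 2
            counts[k] += 1
    return {
        k * lag_width: sums[k] / (2 * counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    }


# Made-up samples along a line: (x, y, attribute value)
samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
gamma = empirical_semivariogram(samples, lag_width=1.0, n_lags=4)
for lag in sorted(gamma):
    print(lag, gamma[lag])
```

For this made-up data the semivariance grows as the lag distance grows, which is the pattern the semivariogram is meant to show.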

 



 

The predicted values will fall along the line representing the equation solution until, at some distance, the semivariance reaches a limit. This is where the line flattens out horizontally, and it is called the 'sill'. The distance from the point of interest to the point where the line flattens out is termed the 'range'. This is the distance over which the regionalized variable continues to be related to the surrounding known sample points.[10] The implication of this concept is that, beyond some distance, adding more sample points adds virtually nothing to the predictive value of the equation.
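The nugget, sill, and range can be made concrete with a short Python sketch of one commonly used curve, the spherical model. The choice of the spherical model is an assumption for illustration; the source does not say which model the lesson's semivariogram follows. The function and parameter names are hypothetical, and the sill here means the total height at which the curve flattens.

```python
def spherical_semivariance(h, nugget, sill, range_dist):
    """Spherical semivariogram model (assumed for illustration).

    h          -- separation distance between two points
    nugget     -- semivariance where the curve meets the Y axis
    sill       -- semivariance at which the curve flattens out
    range_dist -- distance at which the sill is reached
    """
    if h == 0.0:
        # By definition a point has zero semivariance with itself.
        return 0.0
    if h >= range_dist:
        # Beyond the range the curve is flat at the sill.
        return sill
    r = h / range_dist
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Made-up parameters: nugget 0.2, sill 1.0, range 10 distance units
for h in (0.001, 2.5, 5.0, 10.0, 25.0):
    print(h, spherical_semivariance(h, 0.2, 1.0, 10.0))
```

Just above zero distance the value is close to the nugget, it rises with distance, and from the range onward it stays at the sill, so sample points farther away than the range all look equally unrelated to the point of interest.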

This predictive model conforms to Tobler's first law of geography (that all places are related, but nearby places are more related than distant places[11]), except that Kriging assumes that at some point more distant places will have virtually no statistical relationship to the attributes of the spatial point in question.
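The way distant points lose their influence can be sketched with ordinary kriging, one common form of Kriging, applied along a single line. Everything in this Python sketch is an assumption made for illustration: the spherical semivariogram with zero nugget, its sill and range, the sample locations and values, and the small hand-written linear solver. It is not the method used by any particular GIS package.

```python
def spherical(h, sill=1.0, range_dist=5.0):
    """Spherical semivariogram with zero nugget (assumed parameters)."""
    if h >= range_dist:
        return sill
    r = h / range_dist
    return sill * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ordinary_kriging_weights(xs, x0, gamma):
    """Weights for sample locations xs when predicting at x0."""
    n = len(xs)
    # Semivariances between samples, bordered by the row and column of
    # ones that force the weights to sum to one.
    a = [[gamma(abs(xs[i] - xs[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(abs(x - x0)) for x in xs] + [1.0]
    return solve(a, b)[:n]  # drop the Lagrange multiplier


# Made-up data: two samples near the target and one far beyond the range
xs = [0.0, 1.0, 50.0]
zs = [10.0, 20.0, 100.0]
weights = ordinary_kriging_weights(xs, 0.5, spherical)
prediction = sum(w * z for w, z in zip(weights, zs))
print(weights, prediction)
```

With these made-up numbers the two nearby samples share almost all of the weight, and the sample far beyond the range receives a weight close to zero, so its large value barely moves the prediction.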

So, in conclusion, Kriging is a predictive statistical model (as opposed to a deterministic model) that assumes spatial data closer to the point of interest is more important than the spatial data farther away from the point of interest, and that at some statistically determined distance spatial data will essentially have no predictive value.


[1] https://www.e-eduction.psu.edu/courses/geog486/L05_compiled.html, Part VII: Kriging, accessed February 7, 2006.
[2] http://dictionary.reference.com/search?q=Stochastic, accessed February 11, 2006.
[3] http://en.wikipedia.org/wiki/Kriging, accessed February 11, 2006.
[4] Ibid.
[5] http://www.ems-i.com/gmshelp/Interpolation/Interpolation_Schemes/Kriging/Kriging.htm, accessed February 11, 2006.
[6] Ibid.
[7] https://cms.psu.edu/section/default.asp?WCU=CRSCNT&id=200506SPWD+++IGEOG+486+001, accessed February 12, 2006.
[8] http://ewr.cee.vt.edu/environmental/teach/smprimer/kriging.html#Semivariograms, accessed February 12, 2006.
[9] Ibid.
[10] Ibid.
[11] https://www.e-eduction.psu.edu/courses/geog486/L05_compiled.html, Part V: Interpolation, accessed February 7, 2006.
 

 

 

 

 

Capstone Project Initial Screen Shot - York County Current Numbers of Voter Registrations by Voting District

 

From the ESRI web site, I downloaded the Census TIGER/Line files for York County, PA.  With the Voting District shapefile as a base, I changed the projection of the shapefile using the Albers USA Contiguous Equidistant Conic projection.  I did this to preserve the relative angles and distances between features and because I would be using this information in a series of choropleth maps investigating voter turnout over a number of primary and general elections.

 

To this map base, I added the major Interstate and State roads in the county, more in order to orient the reader than to provide route guidance or mapping.

 

Finally, I was able to obtain a comma-delimited text file of current voter registrations and voter turnout for each primary and general election back to 1992 from the IT Department in the York County government offices.  I loaded this information into an Access database and performed an initial query to extract voter registration totals by voting district.

 

Unfortunately, the codes designating the voting districts were different in each dataset, so I could not immediately join this information to the data table for the ESRI voting district shapefile.  Instead, using the roads layer, I geocoded the polling places from the County registration data.  Using the geocoding results as a base, I then built a cross-reference file that told me which polling place, and therefore which county voting district code, fell into which polygon of the ESRI voting district shapefile.

 

Once the cross reference was built, I could then sum the registrations by voting district in the voter database and join that information to the voting district polygons on the map.  The result is the choropleth map showing the density of registered voters by voting district.
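The summing and joining step can be sketched in Python as follows.  This is only an illustration of the idea: the actual work was done with an Access query and a table join in the GIS, and the district codes, polygon identifiers, and counts below are invented for the example.

```python
from collections import defaultdict


def registrations_by_polygon(registrations, crosswalk):
    """Sum registration counts per shapefile polygon.

    registrations -- list of (county_district_code, count) records
    crosswalk     -- dict mapping county district code -> polygon id
    Returns (totals by polygon id, county codes with no match).
    """
    totals = defaultdict(int)
    unmatched = []
    for county_code, count in registrations:
        polygon_id = crosswalk.get(county_code)
        if polygon_id is None:
            # Keep track of codes the cross reference does not cover.
            unmatched.append(county_code)
        else:
            totals[polygon_id] += count
    return dict(totals), unmatched


# Hypothetical cross reference and registration records
crosswalk = {"Y-001": "P17", "Y-002": "P17", "Y-003": "P42"}
registrations = [("Y-001", 1200), ("Y-002", 800), ("Y-003", 950), ("Y-999", 40)]
totals, unmatched = registrations_by_polygon(registrations, crosswalk)
print(totals)
print(unmatched)
```

Listing the unmatched codes separately makes it easy to spot districts that the cross-reference file still needs to cover before the totals are joined to the polygons.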

 

Click on the icon below to see the full screen shot of the results achieved to date:

 

 

The next steps are to extract the votes by party by voting district for the general elections between 1992 and 2004 to create a series of choropleth maps that should show the relative turnout numbers.  The same approach can also be taken to look at primary data.  This should yield significantly different turnout rates unless there is a significant party issue or race at stake.