Geography 486 - Lesson 8
Multiple Classifications and Multiple Representations - Week 2
Crime Data Maps Using Custom Color Schemes
Figure 1: Equal Interval Burglary Patterns with Personal Color Ramp
Figure 2: Burglary Patterns Based Upon Variance Around the Mean (Standard Deviations) with Personal Color Ramp
Crime Data Maps Using Different Symbolization Methods
Figure 3: Six Year Burglary Totals with Graduated Symbols
Figure 4: Six Year Burglary Totals with Proportional Symbols
Figure 5: Six Year Burglary Totals with Pictographs
Figure 6: Six Year Burglary Totals with Pictographs (Close-up)
Figure 7: Dot Map of Burglaries in 2003
Figure 8: Multivariate Dot Map for Annual Burglary Counts 1998-2003
Introduction
As was pointed out early in Lesson 7, classification is, at its heart, an exercise in categorization.1 We put things into categories in an attempt to distill information from data, and in doing so we deliberately simplify the phenomena under study.2 This distillation process groups like features together, and the simplified information is then displayed within the spatial model of the real world created in our Geographic Information System (GIS) software application. How the data are distilled and categorized depends on the structure (or distribution) of the data, the categories that we use to aggregate and display them, and the purpose or use of the resulting view of the data.
All this data collection and categorization would be for naught if it didn’t conform to some mental model that we have of reality. The GIS application that we use is, in effect, a model of the geography, the physical region, under study. Because it is a model of reality, we are able to eliminate many of the details that might otherwise distract us from understanding or recognizing the distinguishing features or the phenomena that we have chosen to study.3
Through our efforts to categorize and display our data, we directly influence how others will perceive our results. To this end, we need to present information that both elucidates and informs the reader, or viewer, as to its significance. Incorrectly presented data can result in the transfer of incorrect perceptions, or worse, incorrect conclusions about the phenomena in question. At its best, the use of GIS software to present information in a spatial context can be the key to understanding what is physically happening in an area or region based upon the attributes of the phenomena under study.
The crime data maps created as part of this lesson (Lesson 8) illustrate these points.
Comparison of Philadelphia Burglaries for the Year 1998
The first two maps, Figures 1 and 2, look at the same crime information in two different ways. The phenomena under study are burglaries in the city of Philadelphia for the year 1998.
Figure 1 categorizes the census tracts for the city of Philadelphia by each tract’s share of the city’s total burglaries in 1998, expressed as a percentage. The data are divided into five categories, each an equal interval in size (each interval spanning roughly 0.29% of the total number of burglaries for the year). The distribution of the data is heavily skewed toward the lower intervals, so large areas of the city appear in green (the colors representing the lower intervals). Tracts that fall in the higher intervals are shown in blue. Because of this skew, relatively few areas of the city fall in the higher intervals, which represent the tracts with the higher percentages of the city’s total burglaries.
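As a rough illustration of the equal interval method described above, the following Python sketch cuts the range of a variable into five classes of equal width and counts how many tracts land in each. The tract shares are invented values, chosen only so that the class width comes out near 0.29; they are not the Philadelphia data, and the function names are hypothetical.

```python
def equal_interval_breaks(values, n_classes=5):
    """Return the upper bound of each class when the data range is cut into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]


def classify(value, breaks):
    """Return the 0-based index of the first class whose upper bound covers the value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # guard against floating-point rounding at the maximum


# Invented tract shares (percent of citywide burglaries), skewed toward low values
shares = [0.0, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30, 0.75, 1.45]
breaks = equal_interval_breaks(shares)

# Count the tracts in each class; most of them pile up in the lowest interval
counts = [0] * len(breaks)
for s in shares:
    counts[classify(s, breaks)] += 1
```

With skewed input like this, most tracts fall into the first class and one class is empty, which is the same effect that leaves so much of Figure 1 in the lowest colors.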
Figure 2 uses the same data but presents it in a different way. A mean and standard deviation are calculated, and the categories represent the tracts of the city that are above or below the mean and by how much. With this categorization method, large areas of the city are still shown to be below the average burglary rate, but a number of tracts are shown to be well above (>1.5 standard deviations) the mean. This analysis highlights a much larger number of areas where burglaries would appear to be a problem, specifically those tracts in the central city, as identified by the purple color ramp. The brown areas indicate those tracts where the burglary rate is more than half a standard deviation below the mean, and the white tracts represent those areas within half a standard deviation (plus or minus) of the mean. By using a diverging color scheme, the user is better able to identify the tracts where burglaries are higher or lower than the mean.
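The standard deviation approach can be sketched in a few lines of Python. This is only an approximation of the idea: the class breaks at 0.5 and 1.5 standard deviations follow the description above, but the exact breaks used by the GIS software may differ, the rates are invented, and the function name is hypothetical.

```python
import statistics


def std_dev_class(value, mean, sd):
    """Label a value by its distance from the mean, measured in standard deviations."""
    z = (value - mean) / sd
    if z < -0.5:
        return "below mean (< -0.5 SD)"
    if z <= 0.5:
        return "near mean (within 0.5 SD)"
    if z <= 1.5:
        return "above mean (0.5 to 1.5 SD)"
    return "well above mean (> 1.5 SD)"


# Invented burglary rates for ten tracts, with one extreme tract
rates = [2, 3, 3, 4, 4, 5, 5, 6, 8, 20]
mean = statistics.mean(rates)
sd = statistics.pstdev(rates)  # population standard deviation (an assumption)
labels = [std_dev_class(r, mean, sd) for r in rates]
```

Each label would then be mapped to one color of a diverging scheme, with a neutral color for the class that straddles the mean.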
In both of these examples, a custom color ramp was used. These two color ramps attempt to minimize the chance of misinterpretation by someone with color blindness.
Finally, the aggregation of data by census tract could lead to incorrect conclusions if, for example, a high burglary area happens to be split between two or more census tracts. This situation is known as the modifiable areal unit problem (MAUP).4 Different data aggregations can occur depending upon how the boundaries for the enumerated areas are drawn. If there are different ways to categorize an area, for example using police precinct boundaries, an analysis can use multiple maps using different area parameters to provide a more complete view of the phenomena in question.
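A deliberately simplified, one-dimensional Python sketch can show the boundary effect described above. The incident locations and both sets of zone boundaries are invented for illustration, and the function name is hypothetical; real tracts and precincts are, of course, two-dimensional polygons.

```python
from bisect import bisect_right


def count_by_zone(locations, boundaries):
    """Count points per zone; zone i runs from boundaries[i] up to (not including) boundaries[i + 1]."""
    counts = [0] * (len(boundaries) - 1)
    for x in locations:
        zone = bisect_right(boundaries, x) - 1
        if 0 <= zone < len(counts):  # ignore points outside the study area
            counts[zone] += 1
    return counts


# Invented incident positions along a street, with a cluster near position 10
incidents = [1, 4, 9, 9.5, 10.5, 11, 16, 19]

# Zoning A places a boundary at 10, cutting the cluster in half
zoning_a = count_by_zone(incidents, [0, 10, 20])

# Zoning B keeps the cluster inside a single middle zone
zoning_b = count_by_zone(incidents, [0, 5, 15, 20])
```

Under the first zoning the two zones look identical, while under the second the middle zone stands out, even though the underlying incidents have not changed.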
The next set of maps, Figures 3 through 8, represents other ways to categorize and display data. In these cases, the data set represents the total number of burglaries for the years 1998-2003 displayed by census tract for the city of Philadelphia.
Comparison of Total Burglaries for a Six Year Period
The discussion of the next set of maps is based upon the total number of burglaries for the city of Philadelphia for the six year period 1998-2003.
Figure 3 uses graduated symbols, in this case circles, to represent the different categories of the data. The data are divided into five equal interval categories, not unlike Figure 1. However, instead of coloring the entire tract, a graduated symbol is placed in each tract. As shown, the larger the symbol, the higher the count of burglaries in the category. In my analysis, I chose to highlight the higher levels of burglaries by also color coding the symbols. The greens represent the lower count categories, with yellow and red representing the higher count categories. This symbolization conforms to the well-known red, yellow, green convention, with green indicating relative safety, yellow indicating caution, and red symbolizing danger. The larger, more saturated red and yellow circles represent the higher burglary count areas that, from an allocation of police resources view, would be the logical tracts to assign more police patrols and resources.
In this particular example, a number of tracts fell into the highest interval. However, to identify the top 20% of the tracts, a quantile, or percentile, categorization method should be used. In this type of classification, the observations are divided into categories containing equal, or roughly equal, numbers of tracts. With a city-wide total of 381 tracts, approximately 76 tracts would be in the top 20% (assuming five categories), and there would be about 38 tracts per category if one used ten categories to identify the top 10% of problem tracts.
Figure 4 characterizes this same data but uses proportional symbols. Each symbol is in proportion to the count of burglaries in each tract. Confirming the conclusions drawn from the figures above, the larger symbols are concentrated around the center city area and shrink as one moves toward the suburbs. I chose to use more highly saturated blue symbols on a light yellow background to provide contrast and to reduce the difficulty of interpreting the map for anyone who might be color blind. I also tried to have the proportional symbols overlap only slightly, but at the scale of the map this proved to be somewhat problematic. Additionally, because the proportional symbols are so highly saturated, the dark colors tend to run together once the map is posted to the Internet. This is also the situation in the next figure.
Figure 5 uses a pictograph instead of a colored circle to represent the same data. In this case, a ski mask symbol is used to represent burglaries. Because the ski mask symbol is nearly round, the proportional sizes are virtually the same as the circles used in Figure 4. Unfortunately, because the ski mask is nearly all black, it is impossible to differentiate the different tracts when viewing the full extent of the city map. However, when one zooms in to view a section of the city in more detail, as in Figure 6, the symbols are much more easily read and understood, and the problem tracts are more easily identifiable.
The final two maps use a dot to represent the occurrence of a specific number of burglaries. Figure 7 is based upon the number of burglaries reported in 2003, with one dot representing 3 burglaries. These dots are distributed within the tracts, representing the count of burglaries in each tract. In looking at this map at its full extent, the densely packed areas of dots represent the high burglary count areas. In Figure 7 these areas include the central city area, but also seem to be spreading northwest along the Interstate and westward along a major US Highway. Saturated brown dots are used on a white background, again to minimize any issues for those who might be color blind.
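The arithmetic in the preceding paragraph can be checked with a short Python sketch of a quantile classification. Only the number of tracts (381) comes from the text; the burglary counts are stand-in values generated for the example, and the function name is hypothetical.

```python
def quantile_classes(values, n_classes=5):
    """Assign each value a class from 0 to n_classes - 1 so that classes hold nearly equal counts."""
    # Rank the observations from lowest to highest; tied values keep their input order
    order = sorted(range(len(values)), key=lambda i: values[i])
    classes = [0] * len(values)
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // len(values)
    return classes


# Stand-in six-year burglary counts for 381 tracts (invented, all distinct)
tract_counts = [(i * 37) % 500 for i in range(381)]

quintiles = quantile_classes(tract_counts, 5)
deciles = quantile_classes(tract_counts, 10)

top_20_percent = quintiles.count(4)  # tracts in the highest of five classes
top_10_percent = deciles.count(9)    # tracts in the highest of ten classes
```

Because 381 does not divide evenly by five or ten, the classes cannot all be the same size; in this sketch the extra tract ends up in the lowest class.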
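For readers curious how proportional symbols are typically sized, here is a minimal Python sketch. It assumes the common convention of scaling the circle's area, rather than its radius, to the data value; the text does not say which scaling the GIS software applied, and the counts, maximum radius, and function name are hypothetical.

```python
import math


def proportional_radius(value, max_value, max_radius=20.0):
    """Return a radius such that the circle's AREA is proportional to the value."""
    if max_value <= 0:
        return 0.0
    # Area grows with radius squared, so the radius grows with the square root of the value
    return max_radius * math.sqrt(value / max_value)


# Invented tract counts; the largest count receives the maximum radius
radii = {count: proportional_radius(count, 400) for count in (25, 100, 400)}
```

Under this convention a tract with four times the burglaries gets a circle with twice the radius, which helps limit the overlap between neighboring symbols.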
In addition, the dots represent specific count densities even though these counts are then segregated into individual tracts. This works well enough when viewing the city at large as there are enough individual tracts to have the dots coalesce around high burglary areas. However, this kind of map would not be appropriate for viewing at the individual tract level as the dots are distributed randomly in the area of enumeration.
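The random placement described above can be sketched in Python as follows. This is an assumption about how a dot map might be generated, not a description of the GIS software's actual algorithm: dots are drawn at random inside the tract's bounding box and kept only if they fall inside the tract polygon. The polygon, the burglary count, and the function names are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon, given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, count, dot_value=3, seed=0):
    """Place one dot per dot_value incidents at random positions inside the polygon."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    n_dots = count // dot_value  # any remainder is dropped (a simplifying assumption)
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


# Invented tract outline and burglary count; one dot stands for 3 burglaries
tract = [(0, 0), (4, 0), (4, 3), (2, 5), (0, 3)]
dots = random_dots(tract, count=30)
```

Nothing in the sketch ties a dot to the address of an actual burglary, which is why the dot positions carry no meaning within a single tract.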
Finally, Figure 8 also uses the same dot per occurrence concept. In this case the dimension of time is introduced, with each of the six years of data represented by a different color. Though the data from year to year are obviously concentrated in the central city area, it would appear that there are more red and orange dots in the more northerly areas of the city. This could indicate that while central city burglaries are still a major issue, a rising number of burglaries have been occurring farther out towards the suburbs in more recent years.
For this figure, I again used a white background to allow the full impact of the color ramp to be viewed. I found that with a colored background, no matter how light, some dots just seemed to fade into the background. Overall, I find this type of analysis to be too busy. It would seem that there are other ways to categorize this time dimension, such as calculating and mapping the percentage of change in burglaries from one year to another. This kind of analysis could suggest that while the central city continues to experience burglaries at a steady but high rate, outlying tracts may be experiencing high percentage increases in the number of burglaries from a previous year or years.
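The percentage-of-change idea suggested above amounts to a simple calculation per tract, sketched here in Python. The tract names and counts are invented to illustrate the contrast between a high, steady tract and a low but fast-growing one; the function name is hypothetical.

```python
def percent_change(earlier, later):
    """Return the percentage change from earlier to later, or None if earlier is zero."""
    if earlier == 0:
        return None  # change from zero is undefined and needs separate handling on a map
    return (later - earlier) / earlier * 100.0


# Invented (earlier year, later year) burglary counts for two tracts
tracts = {
    "central tract": (200, 210),
    "outlying tract": (20, 35),
}
changes = {name: percent_change(a, b) for name, (a, b) in tracts.items()}
```

Here the central tract still has far more burglaries, yet the outlying tract shows the much larger percentage increase, which is the pattern a change map would bring out.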
Conclusion
This review of the various ways a set of burglary data can be categorized shows that different categorization methods can answer, or at least address, different questions. Whether the issue is the allocation of police resources or emerging trends over time, the GIS analyst needs to understand not just the structure of the data with which they are working, but also the needs of the user of that data, both physical (such as vision impairments) and intellectual (the use and purpose of the analysis).
__________
Notes:
1 https://www.e-education.psu.edu/courses/geog486/L07_compiled.html Accessed February 17, 2006.
2 Ibid.
3 Ibid.
4 Ibid.