|     Home     |     Portfolio     |     Resume     |     Photos     |


Lesson 4: Design/Build GIS (Week 1)

Jim Kompanek

Introduction:

Lesson 4 is the first part of a multi-part lesson that involves designing and implementing GIS databases. Specifically, this lesson deals with the conversion of voter registration paperwork for the fictional City of Fitchburg to a personal geodatabase. The goals of this lesson are to learn the basics of rational database design, which include (Sloan 2007):

Proposed Design:

The provided voter registration paperwork contains four fields: VoterID, PIN, Ethnicity, and Party. The goal of the geodatabase is to contain all of the information available on the paper log in an efficient manner that provides for efficient queries. The database I designed contains six fields (versus the four on the paper log) which allow for queries based on location (Table 1). Each field is described below.

Table 1. Field specifications for the City of Fitchburg voter registration log.

Field Name

Data Type

Alias

Allow NULL values

Default Value

Domain

Length

VoterID

text

Voter ID

No

<blank>

<blank>

10

PIN_Map

text

Tax Parcel: Map

No

<blank>

<blank>

4

PIN_Block

text

Tax Parcel: Block

No

<blank>

<blank>

4

PIN_Lot

text

Tax Parcel: Lot

No

<blank>

<blank>

4

EthnicID

text

Ethnicity

No

<blank>

Coded

1

PartyID

text

Political Party

Yes

<blank>

Coded

1

Voter ID

The VoterID field consists of a unique number to identify each individual voter. On the supplied example of the voter registration log, the numbers range between three and five digits in length. Although this field is a number, the number acts as an identification code and is used in a manner similar to a Social Security or Driver's License number. It will not be used for any calculation. The text field is most appropriate for this scenario because it also allows for leading zeros and the potential to add letters in the future. An alias is not entirely necessary for this field because it is self-explanatory but "Voter ID" was chosen regardless. Null values are not allowed for this field because each voter is assigned a unique ID number as part of the registration process. Because each VoterID is unique, there will never be duplicate numbers assigned, and therefore, neither a Default Value or Domain will be useful. Even though the largest VoterID number assigned on the provided paperwork was five digits in length, ten was chosen as the appropriate field length. This allows the database to handle a significant population growth in the city. Because a unique number is assigned for each newly registered voter, the total number of Voter IDs will constantly rise to represent the total number of individuals that have ever been registered to vote, not just those who are currently registered.

PIN_Map, PIN_Block, PIN_Lot

For the purpose of this database, the PIN field of the paper voter registration log is broken into three separate fields: PIN_Map, PIN_Block, and PIN_Lot. The PIN field is the tax parcel ID number of the voter's residence and is organized in a trinomial manner, divided by Pin-Block-Lot (XXX-XX-X). My first inclination when designing this field was to make a single text field with enough length to handle all of the potential Pin-Block-Lot combinations. Separating the PIN into three fields allows for simpler geographic queries, such as ethnic distribution by block, or political party by map. If a single field is used for the PIN, these queries would not be possible. Text was chosen as the field type because the number is for identification purposes, not for calculations, as well as the future addition of letters in any of the fields. An appropriately descriptive alias was chosen for each of the fields. Null values are not allowed for this field, as a residence with the city is (in most cases) required to register to vote. In the case of individuals that do not live somewhere with a tax parcel ID (such as the homeless or members of the military stationed overseas), a unique code (e.g. 00-000-0) indicating different scenarios is more useful than leaving the field blank. Because there is such variation in the PIN codes, no default values are used. This also reduces the potential for data entry mistakes. Each field has a length of four to allow for future growth within the city.

EthnicID

Each voter in the city is required to report his or her ethnicity, with an option of "other" for those who are of mixed ancestry, do not fit into any of the other categories, or choose not to report their race. There are six options for ethnicity: American Indian, Asian, Black, Hispanic, White, and Other. This information is contained within the EthnicID field. To avoid redundancy and minimize use of disk storage space, a code for each ethnicity was developed. This code uses the first letter of each ethnicity (i.e., Asian = A, Black = B, etc.) and in the case of American Indian, an I was used. This system is less error prone than assigning arbitrary numbers or letters, such as A, B, C, or 1, 2, 3 (Table 2). This field is a text field, one character in length. Because of the diversity of registered voters and to minimize the risk of data entry errors, no default value are used with this field. There are two options to associate the codes (I, A, B, etc.) to their respective ethnicities; either a lookup table or a coded domain. I chose to use a coded domain for simplicity and in order to keep all of information in a single table. A downside to the coded domain is that it is not supported outside of ArcGIS (i.e. Microsoft Access) and fields cannot be hopped between with the tab button.

Table 2. Codes and Descriptions for the EthnicID and PartyID Fields.

Code Description   Code Description
I American Indian   E Extremist
A Asian   F Federalist
B Black   M Mugwumps
H Hispanic   P Purple
W White   W Whigs
O Other   O Other

PartyID

In many regards, the PartyID field (used to document which party the voter is registered to) is very similar to the EthnicID field. Including Other, there are six options for political parties (Extremist, Federalist, Mugwumps, Purple, Whigs, and Other) (see Table 2). As with the EthnicID field, coded domains are used to aid in the data entry of the political party. Again, the codes are simply the first letter of the political party within a text field one character in length. Because of the variety of options no default field is used to represent the political party. The only difference between this field and EthnicID, is that null values are allowed. This allows voters with no party affiliation to remain without a political party.

Table Design

There are four fundamentals of good database design that were used to in the process of setting up the Voter Registration database and included:

Minimal storage space:

For each field, an appropriate length was established. For the purpose of this database, the length for the VoterID was established to allow for significant, yet reasonable population growth within the city. For the various PIN fields, the field length was chosen to allow for reasonable growth in terms of maps, lots, and blocks. As for EthnicID and PartyID, a text field with a length of one was chosen. For a city with thousands of registered voters, the redundancy of a text field that contained the complete party or ethnicity would waste large amounts of disk space, as well as slow processing and loading time for the files.

Ease of data entry:

Coded domains were utilized for the EthnicID and PartyID to simplify the task of data entry. With coded domains, a pull down window with all of the party options is available for each ethnicity and political party. The use of coded domains also allowed for all of the tabular information to be contained in a single table versus multiple lookup tables.

Decrease data entry mistakes:

For all of the data fields (except PartyID), null values were not allowed. This decreases the likelihood of accidentally skipping over required fields. Default values were also not allowed in any of the fields. Because each field has no default value and null values were not allowed in the majority of fields, it would be impossible to accidentally skip over any field. As mentioned above, coded domains were used for the EthnicID and PartyID fields. This decreases the likelihood of typos, such as typing Extremits in the PartyID field, causing problems with statistical analysis of voter records.

Easy and efficient querying in ArcMap:

The PIN field was broken into three separate fields to allow for easy querying within ArcMap. This was done in order to allow querying by map, block, or lot. This task could not be easily conducted if the PIN was made up entirely of one field.

 


References Cited:

Sloan, Jim

2007 Lesson4: Design/Build GIS (Week 1).  The Pennsylvania State University World Campus Certificate Program in GIS. Accessed 6 February 2007.


This document is published in fulfillment of an assignment by a student enrolled in an educational offering of The Pennsylvania State University. The student, named above, retains all rights to the document and responsibility for its accuracy and originality.