
Jim Kompanek
Digitizing:
The process of digitizing each Sanborn map consisted of several precise, time-consuming procedures. The accuracy of each step depended on the accuracy of the previous one.
Orthophoto Preparation:
Prior to georeferencing and digitizing the Sanborns, it was necessary to define the projection of the DOQ. Based on the similarity of the coordinates between the DOQ and the DRG, the DOQ appeared to be already projected in UTM; the projection was simply not defined. When placed in the same ArcMap data frame as the DRG, the DOQ was approximately 200 m off mark. This offset was accounted for by the difference between NAD 27 and NAD 83. When the projection of the DOQ was defined as UTM Zone 17N NAD 83, the two layers aligned perfectly. The orthophoto was used as the basis of all georeferencing, as it was more accurate and of higher resolution than the DRG. Individual structures could be viewed in much greater detail on the DOQ.
Georeferencing Steps:
The first step involved defining the projection of each Sanborn tile as UTM Zone 17N NAD 83. This was performed via ArcToolbox in the Spatial Reference properties tab. NAD 83 was chosen as the datum because it matched that of the DOQ and because it is inherently more accurate than NAD 27.
The DOQ and DRG were both added to a new ArcMap session. The data frame properties were examined to ensure that the map projection was based on the DOQ (UTM Zone 17N NAD 83). On each Sanborn tile, road intersections near the corners of the map were zoomed in on, with the aid of the magnifier window, until they became pixelated. The Add Control Points tool was used to place a control point on the intersection.
The DRG was then brought into view using the Zoom to Layer option in the Table of Contents. The road intersection identified above was located on the DRG and zoomed in on at approximately 1:2000 scale. The DOQ was then activated and magnified with the magnifier window at 200 percent. With the Add Control Points tool, the intersection was reselected.
This procedure was followed for the remaining corners of each Sanborn. The quality of each georeferenced Sanborn was visually checked against the orthophoto in the background. The streets generally lined up along the border of the Sanborns. The next step involved setting the transparency of the Sanborn layers to 35 percent. With each Sanborn layer placed on top of the DOQ, it was checked to verify that internal streets correctly lined up against the aerial photograph. The final step involved updating the georeferencing. An affine transformation was used to minimize the amount of distortion on each Sanborn map. Following the georeferencing of each map, individual structures were traced using the Editor toolbar.
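The affine fit described above can be illustrated with a short sketch. It is not ArcMap's implementation; it simply shows how a first-order (affine) transformation can be estimated from corner control points by least squares and how the RMS residual can be checked. The control point coordinates, the names `SRC` and `DST`, and all function names are hypothetical and chosen only for illustration.

```python
from math import sqrt

def solve3(m, v):
    """Solve the 3x3 system m * p = v by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            # A zero pivot means the control points are degenerate (e.g. collinear).
            raise ValueError("degenerate control points")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(src, dst):
    """Least-squares affine fit from src (col, row) to dst (easting, northing)."""
    n = len(src)
    if n < 3:
        raise ValueError("an affine fit needs at least three control points")
    # Center the source points so the normal equations stay well conditioned.
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    rows = [(x - mx, y - my, 1.0) for x, y in src]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):  # k = 0 fits easting, k = 1 fits northing
        mk = sum(d[k] for d in dst) / n
        atb = [sum(r[i] * (d[k] - mk) for r, d in zip(rows, dst)) for i in range(3)]
        a, b, c = solve3(ata, atb)
        # Undo the centering so the parameters apply to raw source coordinates.
        params.append((a, b, c + mk - a * mx - b * my))
    return params

def apply_affine(params, pt):
    """Map one source point to (easting, northing)."""
    x, y = pt
    return tuple(a * x + b * y + c for a, b, c in params)

def rms_error(params, src, dst):
    """Root-mean-square distance between transformed source points and their targets."""
    total = 0.0
    for s, (de, dn) in zip(src, dst):
        e, n = apply_affine(params, s)
        total += (e - de) ** 2 + (n - dn) ** 2
    return sqrt(total / len(src))

# Hypothetical control points: scan pixel corners and matching UTM coordinates (meters).
SRC = [(0, 0), (1000, 0), (1000, 800), (0, 800)]
DST = [(500000, 4400000), (500500, 4400000), (500500, 4399600), (500000, 4399600)]

params = fit_affine(SRC, DST)
print("affine parameters:", params)
print("RMS error (m):", rms_error(params, SRC, DST))
```

With four control points the fit is overdetermined, so a nonzero RMS error indicates how far the scanned sheet departs from a pure affine mapping. That is one way to flag tiles whose control points deserve a second look.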
Data Export:
For the purpose of backward compatibility, the client requested shapefiles, in geographic coordinates, of each layer created. The first step in creating a usable file for older versions of ArcView was to export the feature classes. To do this within ArcMap, each class was exported by selecting Data ---> Export and then Output ---> Shapefile. Because the data was in UTM NAD 83, it needed to be reprojected to geographic coordinates. Within ArcToolbox, it was necessary to select Data Management Tools ---> Projections and Transformations ---> Feature Tools ---> Project. The final step was to select a Geographic output coordinate system with the same datum (NAD 83).
Prevention of Data Entry Error:
The most important step in preventing data entry error, on all of the layers, was selecting Categories under the Symbology tab of each layer. This allowed each feature to update itself as the data was entered into each table. Spatial data was generally created first and then attributes were filled in. As a unique value was selected for each attribute (whether it be street name, building use, etc.), the symbol changed as the cell in the table changed from <null> to its proper name. As with the symbology, labels also played an important role. As information was entered into each table (such as Struc_ID), the features updated themselves on screen, which simplified the process of data entry.
With each feature class, each attribute table field was sorted to visually check for outliers. The most common mistake I found involved spaces before and after entries, such as “ Main” or “Main ” instead of “Main”. In certain fields, Coded Domains were used to limit the number of options and to decrease the likelihood of mistakes. Although this was successful in cutting down on mistakes, the inability to “tab” through fields in the attribute tables made the data entry process tedious and frustrating. Snapping greatly aided the digitizing of the roads and structure layers. When digitizing the roads, snapping was set to “end”, which allowed each road segment to be accurately connected and prevented undershoots and overshoots. In the structures layer, snapping was generally set to “end” and “vertex”, which prevented overlapping polygons. This was especially necessary along the commercial stretch of Main St.
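The two attribute checks mentioned above, stray spaces around entries and values restricted to a fixed list, can also be scripted once a table is exported. The following is a minimal sketch rather than part of the original workflow. It assumes the rows are available as plain Python dictionaries, and the field names `St_Name` and `Land_Use`, the sample rows, and the allowed land-use list are hypothetical (only `Struc_ID` appears in the project).

```python
def find_padded_values(records, field):
    """Return (row index, raw value) pairs whose text has leading or trailing whitespace."""
    hits = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        if isinstance(value, str) and value != value.strip():
            hits.append((i, value))
    return hits

def find_domain_violations(records, field, allowed):
    """Return (row index, value) pairs whose value is outside the allowed set.

    Null (None) entries are skipped here; they are a separate kind of problem.
    """
    hits = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is not None and value not in allowed:
            hits.append((i, value))
    return hits

# Hypothetical sample rows standing in for an exported attribute table.
rows = [
    {"Struc_ID": 1, "St_Name": "Main", "Land_Use": "Commercial"},
    {"Struc_ID": 2, "St_Name": " Main", "Land_Use": "Dwelling"},
    {"Struc_ID": 3, "St_Name": "Main ", "Land_Use": "Comercial"},
    {"Struc_ID": 4, "St_Name": "Elm", "Land_Use": None},
]
LAND_USE_DOMAIN = {"Commercial", "Dwelling", "Vacant"}  # assumed list, for illustration

padded = find_padded_values(rows, "St_Name")
violations = find_domain_violations(rows, "Land_Use", LAND_USE_DOMAIN)
print(padded)
print(violations)
```

A coded domain prevents the second kind of error at entry time; a script like this only detects it afterward, which is still useful for free-text fields that have no domain.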
Database Design Challenges and Corrections:
Several unforeseen challenges were encountered during the digitizing process. Most notably, many structures were not a consistent height throughout. Many structures also indicated multiple stories but did not indicate a division line. Some buildings were also listed in ½ story increments. Especially among commercial buildings, individual structures often had a range of addresses, with no clear internal divisions. Some structures were assigned an address, with a secondary address in parentheses. There were many outbuildings along the alleys, and it was often unclear which structure they were associated with. Several buildings also shared common staircases. In terms of land use, many buildings were listed as vacant or did not have a use listed. Common sense had to be used to classify each feature when it did not fit into the framework of the designed database.
Many of the procedures I used to prevent data entry errors were also used to check the integrity of the data. Appropriately adjusting Symbology Categories in the layer properties proved to be most useful. For the roads layer, the geography (North East/North/North West, etc.) was symbolized to see if it was consistent across the data frame. The same process was used with road names to ensure there were no mistakes. In the structure polygon layer, the same process was used to look for outliers. Buildings of like land use were generally clustered together; when a structure stuck out, it was verified against the Sanborn. After data was entered, tables were also sorted to look for outliers and <null> values.
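Sorting a table to spot outliers and <null> values amounts to counting how often each value occurs and looking at the rare ones. A minimal sketch of that idea follows, assuming the attribute rows are available as Python dictionaries. The field name `St_Name`, the sample values, and the function names are hypothetical.

```python
from collections import Counter

def value_counts(records, field):
    """Count how many rows carry each value of the field (None stands for <null>)."""
    return Counter(rec.get(field) for rec in records)

def rare_values(records, field, max_count=1):
    """Return non-null values that occur at most max_count times, sorted alphabetically."""
    counts = value_counts(records, field)
    return sorted(v for v, c in counts.items() if v is not None and c <= max_count)

def null_rows(records, field):
    """Return the indexes of rows where the field is still <null>."""
    return [i for i, rec in enumerate(records) if rec.get(field) is None]

# Hypothetical road-name column with one misspelling and one unfilled row.
roads = [{"St_Name": name} for name in
         ["Main", "Main", "Main", "Mian", "Elm", "Elm", None]]

print(value_counts(roads, "St_Name"))
print(rare_values(roads, "St_Name"))
print(null_rows(roads, "St_Name"))
```

A value that appears only once is not necessarily wrong, since a short street may have a single segment, so each flagged value still needs to be verified against the Sanborn.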
After georeferencing the Sanborns, the edges of each map were checked against each other, as well as against the DOQQ. Each Sanborn was also set to 35% transparency and overlaid on the DOQQ. After each structure was added to the polygon layer, it was also overlaid on the DOQQ. The building outlines generally fit over the aerial photograph. The roads layer was also overlaid on the DOQQ to verify that the digitized roads corresponded to the intact roads. The accuracy of the western portion of the project area was a greater cause for concern, as many of the streets indicated on the Sanborn either no longer exist or were significantly altered. The magnifier window was also used to maximize the accuracy of georeferencing and digitizing features. After a certain point, zooming in resulted in diminishing returns as the raster DOQQ became very pixelated.
The most important tool for the digitizing process was the snapping tool. As all of the structures were rectangular, it was important to select “Vertex” and “Edge” in the snapping options for the Sanborn polygon feature. Snapping was also important to ensure that the line segments correctly aligned in the geocoding road polyline feature. This was accomplished by selecting “End” in the snapping options. On the polygon layer, several structures were located in close proximity to each other (but not touching), and it was important to turn snapping off when this scenario was encountered. One potential problem noticed while georeferencing the Sanborn was that not all of the structures had the same number of stories throughout.
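The snapping behavior described above, including why it must be switched off for structures that are close but not touching, can be illustrated with a small sketch. This is not how ArcMap implements snapping. It only shows the basic idea of moving a digitized point onto the nearest existing vertex when it falls within a tolerance. The coordinates, tolerance values, and function name are hypothetical.

```python
from math import hypot

def snap_to_vertex(point, vertices, tolerance):
    """Return the nearest existing vertex if it lies within tolerance, else the point itself."""
    if not vertices:
        return point
    nearest = min(vertices, key=lambda v: hypot(v[0] - point[0], v[1] - point[1]))
    distance = hypot(nearest[0] - point[0], nearest[1] - point[1])
    return nearest if distance <= tolerance else point

# Hypothetical corner vertices of an existing building footprint (map units).
existing = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)]

# A click close to a shared corner is pulled onto it, so adjoining polygons
# share the vertex instead of overlapping or leaving a sliver.
print(snap_to_vertex((9.7, 0.2), existing, tolerance=0.5))

# A click far from any vertex is left where it was placed.
print(snap_to_vertex((5.0, 2.0), existing, tolerance=0.5))

# With a zero tolerance (snapping effectively off) a nearby but separate
# structure keeps its own corner.
print(snap_to_vertex((9.7, 0.2), existing, tolerance=0.0))
```

The third call shows the trade-off noted in the text: a tolerance wide enough to join adjoining buildings will also merge corners of buildings that merely sit close together.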
Inherent Database Error:
A degree of acceptable database error resulted from the methodology used to digitize the Sanborn maps. When the Sanborns were initially georeferenced, some tiles had fewer acceptable control points, due to significant changes in the urban environment. The quality of the georeferencing was diminished in these areas, as control points could not be verified. Higher-resolution orthophotos, as well as field verification of extant structures, might have provided a higher degree of precision. When the individual structures were traced, a degree of subjectivity was required for ambiguous portions of lines. With regard to attribute data, much of the data on the Sanborns was difficult to read and did not fit tightly into the framework designed at the beginning. It is recommended that the problems identified above be resolved prior to implementation of the geodatabase on a large scale. It is also recommended that all attribute information available on the Sanborn be recorded in the database, as it will be quicker and more effective to deal with the additional fields at this stage than to go back through every decade. Given the purpose of this project, the level of error in this database is acceptable.
Time Estimates:
Each Sanborn tile took approximately 3 hours to digitize. This included georeferencing each map, digitizing each feature class, and populating the attribute fields. Approximately 50 percent of the structures present on each tile were digitized. Assuming that much of the work to digitize the remaining structures would be redundant, it would likely take 5 hours to complete each tile. Assuming that the city did not grow and that there is a consistent number of buildings on each tile, this time estimate may be applied to all decades. There were approximately 30 tiles in the 1920 Sanborn map, and there are ten decades to cover. As a result, the final product would require 300 tiles to be georeferenced and digitized, for a total time requirement of 1500 person hours. It is likely that the city did not remain a consistent size during the past century. The time estimate may be significantly off mark if the city grew considerably since the 1920s.
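The arithmetic behind this estimate, and its sensitivity to city growth, can be written out as a short calculation. The baseline figures (30 tiles, ten decades, 5 hours per tile) come from the text above. The growth scenario of two additional tiles per decade is purely hypothetical and is included only to show how quickly the total moves.

```python
def estimate_person_hours(tiles_per_decade, hours_per_tile):
    """Total hours for a list giving the number of tiles in each decade."""
    return sum(tiles_per_decade) * hours_per_tile

HOURS_PER_TILE = 5   # estimated time to fully complete one tile
DECADES = 10

# Baseline from the text: a constant 30 tiles in every decade.
baseline_tiles = [30] * DECADES
baseline_hours = estimate_person_hours(baseline_tiles, HOURS_PER_TILE)

# Hypothetical growth scenario: two more tiles each decade (30, 32, ..., 48).
growth_tiles = [30 + 2 * i for i in range(DECADES)]
growth_hours = estimate_person_hours(growth_tiles, HOURS_PER_TILE)

print(sum(baseline_tiles), baseline_hours)  # 300 tiles, 1500 hours
print(sum(growth_tiles), growth_hours)      # 390 tiles, 1950 hours
```

Even this modest assumed growth raises the total by 450 person hours, which is why actual tile counts for each decade would be needed before committing to the figure.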
This document is published in fulfillment of an assignment by a student enrolled in an educational offering of The Pennsylvania State University. The student, named above, retains all rights to the document and responsibility for its accuracy and originality.