Placename Ambiguity Resolution
Geographical data is often of key interest because people care about things related to the places where they live, work, or travel. Interests as disparate as local news, regional wars, and cultural festivals are all shaped by location. In fact, it has been estimated that at least 70% of textual documents contain references to geographic locations (Hill's Georeferencing Book). Moreover, the voices of people in particular places reflect local values and opinions; some modern applications even use maps to show the mood of people around the world (e.g. Twitter mood). However, unlike geo-tagged multimedia resources (e.g. photos and videos), geographical information mentioned in textual documents is often ambiguous because of the flexibility of natural language.
The problem addressed here falls within the domain of GIS. Specifically, given a word in a document (e.g. Washington), we want to know whether it refers to a person (e.g. President George Washington) or to a place, and if it is a place, which one (e.g. Washington State or the Washington, D.C. area). To solve this problem, I developed SPOT (Sensing Places Out of Text), a hybrid geocoding algorithm that integrates different techniques from earlier work.
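The two decisions described above (person or place, and then which place) can be illustrated with a deliberately tiny heuristic. This is not SPOT, whose combination of techniques is not detailed here; it is a minimal sketch assuming a hypothetical two-entry gazetteer, hypothetical person cue words, and a simple rule: person cues near the mention win, otherwise the place whose typical co-occurring terms overlap most with the context is chosen, with population as the tie-breaker.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    lat: float
    lon: float
    population: int              # approximate, used only as a tie-breaker
    context_terms: frozenset     # hypothetical words that tend to co-occur with this place


# Hypothetical toy gazetteer; a real system would use a full gazetteer (e.g. GeoNames).
GAZETTEER = {
    "washington": [
        Place("Washington State", 47.4, -120.5, 7_700_000,
              frozenset({"seattle", "spokane", "olympia", "state"})),
        Place("Washington, D.C.", 38.9, -77.04, 690_000,
              frozenset({"capitol", "congress", "potomac", "dc"})),
    ],
}

# Hypothetical cue words suggesting the mention refers to a person.
PERSON_CUES = {"president", "mr", "mrs", "general", "george", "said"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def resolve(mention, text, window=5):
    """Return "PERSON", a Place, or None if the mention is not in the gazetteer."""
    target = mention.lower()
    candidates = GAZETTEER.get(target)
    if not candidates:
        return None
    tokens = tokenize(text)
    # Collect the words within `window` tokens on either side of each occurrence.
    context = set()
    for i, tok in enumerate(tokens):
        if tok == target:
            context.update(tokens[max(0, i - window):i])
            context.update(tokens[i + 1:i + 1 + window])
    # Step 1: person vs. place.
    if context & PERSON_CUES:
        return "PERSON"
    # Step 2: pick the place with the most context overlap, then the larger population.
    return max(candidates, key=lambda p: (len(context & p.context_terms), p.population))


print(resolve("Washington", "President Washington crossed the Delaware"))
print(resolve("Washington", "We toured the Capitol in Washington").name)
```

A context window plus a population prior is a common baseline in toponym resolution; hybrid systems typically add further evidence (co-occurring places, spatial proximity, document-level consistency) on top of rules like these.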
Evaluating such algorithms is difficult because few human geo-annotated collections are available. I therefore developed a platform for human geo-annotation and conducted an evaluation on more than 1,000 travel blogs annotated by 191 human annotators.
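The evaluation procedure itself is not spelled out here. As one plausible sketch, assuming each annotation is a hypothetical `(start, end, place_id)` triple and that some documents were annotated by more than one person, the snippet below merges annotators by majority vote and scores system output with precision, recall, and F1. The triple format, the `majority_gold` and `prf` helpers, and the voting rule are illustrative assumptions, not a description of the actual study.

```python
from collections import Counter


def majority_gold(annotations, min_votes=None):
    """Merge several annotators' sets of (start, end, place_id) triples by majority vote."""
    if min_votes is None:
        min_votes = len(annotations) // 2 + 1   # strict majority of annotators
    votes = Counter(a for ann in annotations for a in set(ann))
    return {a for a, n in votes.items() if n >= min_votes}


def prf(predicted, gold):
    """Exact-match precision, recall, and F1 of predicted triples against gold triples."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations of one blog post by three annotators.
a1 = {(10, 20, "WA"), (40, 45, "Seattle")}
a2 = {(10, 20, "WA")}
a3 = {(10, 20, "WA"), (40, 45, "Seattle"), (60, 66, "Boise")}
gold = majority_gold([a1, a2, a3])
predicted = {(10, 20, "WA"), (60, 66, "Boise")}
print(prf(predicted, gold))
```

Exact matching on both span and place is strict; a looser variant might accept overlapping spans or count a resolved place as correct if it lies within some distance of the gold location.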
A document management system that adopts SPOT can visualize either a single document or a whole document set (e.g. all documents that mention places in a nearby region).
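Selecting "all documents that mention places in a nearby region" amounts to a spatial filter over resolved place mentions. A minimal sketch, assuming each document has already been reduced by a geocoder to a list of `(lat, lon)` coordinates (the `doc_places` structure and `documents_near` helper are hypothetical), using the haversine great-circle distance on a spherical Earth:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def documents_near(doc_places, lat, lon, radius_km):
    """Return sorted ids of documents mentioning at least one place within radius_km."""
    return sorted(
        doc_id for doc_id, places in doc_places.items()
        if any(haversine_km(lat, lon, p_lat, p_lon) <= radius_km for p_lat, p_lon in places)
    )


# Hypothetical resolved mentions per travel blog.
doc_places = {
    "blog-1": [(47.61, -122.33)],                     # Seattle
    "blog-2": [(38.90, -77.04)],                      # Washington, D.C.
    "blog-3": [(47.25, -122.44), (45.52, -122.68)],   # Tacoma, Portland
}
print(documents_near(doc_places, 47.61, -122.33, radius_km=100))
```

A linear scan is fine for a small collection; a larger document set would typically index coordinates spatially (e.g. an R-tree or geohash buckets) before applying the exact distance test.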
Patent
System and Method for Mapping Text Phrases to Geographical Locations. Docket No. 20101692-US-NP, Attorney Docket No. 022.1114.US.UTL