EDINA Geo-Reference Enrichment
The Project
Much digitised content is rich in geographical information (names of places, regions and countries, as well as features such as rivers and mountains), whether this information is embedded in the metadata or within the digitised texts themselves.
Traditionally, it has been difficult to exploit the richness of this geographical information. However, recent developments in natural language processing, together with a developing infrastructure for delivering such information via the web, have made it possible to identify this information automatically.
The project saw the JISC Data Centre at EDINA, in association with the University of Edinburgh's Language Technology Group (LTG), enrich the geographical information held by three JISC digitisation projects, and extend the process to identify people's names. Together, these projects contain millions of words of text and implicit references to geography via placenames.
The demonstrators built as part of this project are now available:
Project Discoveries
For Content Publishers
- The process of georeferencing is capable of enriching digital resources and improving the user experience of resource discovery via geographical terms.
- However, the automated process of georeferencing only goes so far. Human intervention is still required to provide good results for end users.
- The type of content enriched strongly affects the success of georeferencing, e.g. the end user's ability to locate names and places.
- Some collections are not suitable for such enrichment at all. In the future, georeferencing should be mandatory for those that are.
- Use of certain gazetteers is restricted by intellectual property rights (IPR). This means a balance must be struck between the richness of the georeferencing and open access.
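The points above about automated georeferencing and the continuing need for human intervention can be illustrated with a minimal sketch. It is not the pipeline used by EDINA or LTG; it assumes a tiny, hypothetical in-memory gazetteer (`GAZETTEER`) with approximate coordinates, a hypothetical function name (`georeference`), and a naive rule that treats any capitalised word as a candidate placename. Names that match more than one gazetteer entry are flagged for a human reviewer rather than resolved automatically.

```python
import re

# Hypothetical toy gazetteer (an assumption for illustration only):
# placename -> list of candidate (latitude, longitude) pairs, approximate values.
GAZETTEER = {
    "Edinburgh": [(55.9533, -3.1883)],
    "Southampton": [(50.9097, -1.4044)],
    # Ambiguous name: Newport in Wales and Newport on the Isle of Wight.
    "Newport": [(51.5842, -2.9977), (50.7010, -1.2883)],
}


def georeference(text):
    """Return gazetteer matches found in text, flagging ambiguous names."""
    results = []
    # Naive candidate detection: any capitalised word. A real system would
    # use natural language processing rather than this simple pattern.
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        name = match.group(0)
        candidates = GAZETTEER.get(name)
        if candidates is None:
            continue  # not a known placename in this toy gazetteer
        results.append({
            "name": name,
            "offset": match.start(),
            "candidates": candidates,
            # More than one candidate location means a person should decide.
            "needs_review": len(candidates) > 1,
        })
    return results


if __name__ == "__main__":
    sample = "The ship sailed from Southampton to Newport."
    for hit in georeference(sample):
        print(hit["name"], hit["candidates"], "review" if hit["needs_review"] else "ok")
```

In this sketch the richness of the results depends entirely on the gazetteer supplied, which is where the IPR restrictions noted above would come into play.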
For End Users
- Geotagging can highlight previously undiscovered connections between collections.
- Users can find locations even if they do not know precise place names.
- Users need to be made aware of the limits of geotagging, so that they do not hold unrealistic expectations (e.g. that an interface will recall and identify 100% of the place names in a collection).
- Users can search not just by placename but also across different collections via different sorts of geographies (e.g. postcodes, counties, co-ordinates).
- Certain types of users react very positively to map-based interfaces (e.g. 6,000 hits on maps at the Archival Sound Recordings project).
More information is available in the project's final report. This report includes the annotation guidelines and a detailed evaluation of the georeferencing of the three projects mentioned above, as well as the georeferencing of the Stormont Parliamentary Papers resource. It is also possible to download the original project plan.
Lead site: EDINA at the University of Edinburgh
Project Partners: University of Edinburgh's Language Technology Group (LTG); History Data Service (HistPop); BOPCRIS at the University of Southampton; British Library