As part of its £22m Digitisation Programme, JISC has funded the creation of a range of well-used cultural and scholarly content for research, learning and teaching. The purpose of this project is to build on that investment by using geography as a means to cross-reference and unify those resources. Geographical referencing can thus provide an entry point into a wealth of other JISC digitised content, in much the same way that Google uses geography as an ‘organising principle’ for its resources.

EDINA Geo-Reference Enrichment

The Project

Much digitised content is rich in geographical information (names of places, regions and countries, plus features such as rivers and mountains), whether this information is embedded in the metadata or within the digitised text itself.

Traditionally, it has been difficult to exploit the richness of this geographical information. However, recent developments in natural language processing, together with a developing infrastructure for delivering such information via the web, have made it possible to identify it automatically.
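
As a rough illustration of what automated identification of place names involves, the sketch below matches text against a tiny, made-up gazetteer. It is not the pipeline used in the project: the `GAZETTEER` entries, the approximate coordinates and the `find_places` helper are all assumptions for illustration, and a real system would also need to disambiguate names in context.

```python
import re

# Hypothetical miniature gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Edinburgh": (55.95, -3.19),
    "Southampton": (50.90, -1.40),
    "Belfast": (54.60, -5.93),
}


def find_places(text, gazetteer=GAZETTEER):
    """Return (name, start offset, coordinates) for each gazetteer name found in text."""
    # Try longer names first so they win over shorter names they contain.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(1), m.start(), gazetteer[m.group(1)]) for m in pattern.finditer(text)]


sample = "Papers from Belfast were digitised in Southampton."
for name, offset, coords in find_places(sample):
    print(name, offset, coords)
```

Exact string matching of this kind misses spelling variants and cannot tell a place from a person with the same name, which is one reason human intervention remains necessary.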

The project saw the JISC Data Centre at EDINA, in association with the University of Edinburgh’s Language Technology Group (LTG), enrich the geographical information held by three JISC digitisation projects, and extend the process to identify people's names. These projects jointly contain millions of words of text and many implicit references to geography via placenames.

The demonstrators built as part of this project are now available.

Project Discoveries

For Content Publishers

  • The process of georeferencing is capable of enriching digital resources and improving the user experience of resource discovery via geographical terms.
  • However, the automated process of georeferencing only goes so far. Human intervention is still required to provide good results for end users.
  • The type of content enriched strongly affects the success of georeferencing, e.g. the end user's ability to locate names and places.
  • Some collections are not suitable at all for such enrichment, but in the future georeferencing should be mandatory for those that are.
  • Use of certain gazetteers is restricted by IPR. This means there is a trade-off between the richness of the georeferencing and open access.

For End Users

  • Geotagging can highlight previously undiscovered connections between collections
  • Users can find locations even if they do not know precise place names
  • Users need to be educated to avoid unrealistic expectations about the success of geotagging (e.g. the ability of an interface to recall and identify 100% of place names in a collection)
  • Users can search not just by placename but also across different collections via different sorts of geographies (e.g. postcodes, counties, co-ordinates)
  • Certain types of users react very positively to map-based interfaces (e.g. 6,000 hits on maps at the Archival Sound Recordings project)

More information is available in the project's final report. This report includes the annotation guidelines and a detailed evaluation of the georeferencing of the three projects mentioned above, as well as the georeferencing of the Stormont Parliamentary Papers resource. It is also possible to download the original project plan.

Lead site: EDINA at the University of Edinburgh

Project Partners: University of Edinburgh's Language Technology Group (LTG); History Data Service (HistPop); BOPCRIS at the University of Southampton; British Library

Summary

Start date: 1 January 2007

End date: 1 March 2009

Funding programme: Digitisation and Content

Committees: JISC Content Services committee