In this chapter we will examine the various sources that can generate metadata.
Producing metadata in-house
Metadata produced in-house by human beings is probably what most people have in mind when they think of a metadata source. This is metadata created manually by a specialist cataloguer, subject expert or volunteer.
This metadata need not always be created from scratch, as it may build upon legacy metadata (even if it is just a scrawled inscription on the back of a photograph, film can or audio cassette). Nonetheless, it is the most difficult, time-consuming and therefore expensive metadata to create. It is also, usually, the most important, especially to the end user.
Crowdsourcing is a form of outsourcing, whereby, in the context of metadata, organisations ask their audiences to give up some free time to help to describe and comment on the resources they use. The commonest example of crowdsourced metadata takes the form of tags and keywords.
However, users can also be asked for more detailed context and back-stories related to their use and appreciation of resources. Metadata that comes from the crowd is usually added by users via comparatively straightforward web forms and tools, but more recently facets of game play have been introduced.
Game play, also known as 'gamification', introduces elements of competition and reward for participants. Used effectively, it has been shown to increase participation in metadata crowdsourcing and to sustain the motivation of participants.
Crowdsourced keywords and tags
Flickr Commons is probably the best-known example of tagging (sometimes also referred to as 'social tagging') carried out by the users of digital resources. It was launched in 2008 and enables cultural and heritage organisations to submit digital image collections to a centralised platform to be shared with Flickr's user community. Its usage statistics are impressive.
Since its launch, Flickr Commons users have added over 165,000 comments and over two million image tags.
Games can be used not only for tagging and adding classifications, but also to facilitate corrections to existing metadata, to add stories and personal experience and to provide more context to a resource. A well-known example of game play for metadata is the metadatagames.org website, which provides a platform for cultural and heritage institutions to upload collections of image and time-based resources to be used in various gaming scenarios.
Benefits of crowdsourcing metadata
There are many benefits for organisations that use crowdsourced metadata. Firstly, they are able to substantially increase the number of resources that can be described, and therefore shared and accessed across the web. Organisations are also able to reach more people and broaden their user base, because items are catalogued more extensively and in ways that reflect more user needs.
Moreover, on the back of better engagement with users, organisations are able to build new communities and foster a shared responsibility for educational resources and cultural heritage. These benefits not only help organisations achieve their goals, but also have a positive impact on society at large by increasing access to culture and heritage and creating new resources for research, learning and teaching.
How best to use metadata sourced from the crowd
The benefits of using crowdsourced metadata are unarguable. However, there are caveats that metadata implementers should bear in mind. With the loss of professional curatorship over metadata terminology, some form of checking and quality assurance will need to be applied. This is likely to be automated and could take the form of rejecting terms that are banned, too common or over-used. There are also examples of participants in crowdsourcing activities manually checking and improving on other participants' work.
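The automated check described above can be sketched in a few lines of Python. This is a minimal illustration and not a description of how any particular platform works: the function name filter_tags, the banned-term list, the usage counts and the 50% "over-used" threshold are all assumptions made for the example.

```python
from collections import Counter


def filter_tags(submitted, banned, usage_counts, total_items, max_share=0.5):
    """Return the submitted tags that pass some simple automated checks.

    banned       -- set of terms that are always refused (hypothetical list)
    usage_counts -- mapping of term -> number of items already carrying it
    total_items  -- number of items in the collection
    max_share    -- assumed threshold: a term already on more than this
                    share of items is treated as over-used
    """
    accepted = []
    for tag in submitted:
        # Normalise case and whitespace so that "Harbour " and "harbour" match.
        term = " ".join(tag.lower().split())
        if not term or term in banned or term in accepted:
            continue
        if total_items and usage_counts.get(term, 0) / total_items > max_share:
            continue
        accepted.append(term)
    return accepted


# Illustrative data only.
banned_terms = {"spam"}
existing_usage = Counter({"photo": 900, "harbour": 12})
clean = filter_tags(
    ["Photo", " Harbour ", "spam", "harbour", "Fishing  boat"],
    banned_terms,
    existing_usage,
    total_items=1000,
)
print(clean)
```

A filter of this kind only removes obviously unhelpful terms. It does not judge whether a tag is accurate, which is where manual review by other participants or by staff still has a role.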
There can also be concerns over the long-term viability and sustainability of sending collections to third-party platforms such as Flickr or metadatagames.org. This is particularly true if development ceases or takes a direction that the resource owner is not completely happy with. Organisations should therefore weigh the benefits of using a commercial service against in-house development. However, it is clear that the benefits of using the crowd far outweigh any potential negatives. Crowdsourcing is certainly a great way to add extra metadata to existing resources, or to add keywords in bulk to digital collections so large that professional in-house staff would never have time to describe them.
Deriving metadata from files and machines
File formats typically store information within the header (the first section) of the digital file. For example, if an image has been created by a digital camera, it is likely that the camera has also written a certain amount of information about the digital capture into the file header, such as the camera make and model, its settings, and the date the photograph was taken (this makes use of the EXIF standard).
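To make the idea of header metadata concrete, the following Python sketch locates the EXIF block in a JPEG file and reads a few text fields from it. It is a simplified illustration that uses only the standard library: it reads only ASCII fields from the first directory of the EXIF block, and the function names are invented for this example. In practice an established imaging library or metadata tool would normally be used instead.

```python
import struct

# Tag numbers for a few text fields in the first image file directory (IFD0).
TAGS = {0x010F: "Make", 0x0110: "Model", 0x0132: "DateTime"}


def find_exif_payload(jpeg_bytes):
    """Return the TIFF-structured EXIF payload of a JPEG, or None if absent."""
    if jpeg_bytes[:2] != b"\xff\xd8":  # a JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        marker, length = struct.unpack(">HH", jpeg_bytes[pos:pos + 4])
        # Stop at the start of the image data, or if this is not a marker.
        if marker == 0xFFDA or marker >> 8 != 0xFF:
            return None
        # The stored length counts its own two bytes but not the marker.
        segment = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xFFE1 and segment.startswith(b"Exif\x00\x00"):
            return segment[6:]
        pos += 2 + length
    return None


def read_ascii_tags(tiff):
    """Read the ASCII fields listed in TAGS from the first directory."""
    order = "<" if tiff[:2] == b"II" else ">"  # "II" little-endian, "MM" big-endian
    (ifd_offset,) = struct.unpack(order + "I", tiff[4:8])
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each directory entry is 12 bytes
        tag, field_type, n = struct.unpack(order + "HHI", tiff[start:start + 8])
        if tag in TAGS and field_type == 2:  # type 2 means ASCII text
            if n <= 4:  # short values are stored inside the entry itself
                raw = tiff[start + 8:start + 8 + n]
            else:  # longer values are stored elsewhere, at the given offset
                (offset,) = struct.unpack(order + "I", tiff[start + 8:start + 12])
                raw = tiff[offset:offset + n]
            found[TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return found


def camera_metadata(path):
    """Open a file (path is supplied by the caller) and return any fields found."""
    with open(path, "rb") as handle:
        payload = find_exif_payload(handle.read())
    return read_ascii_tags(payload) if payload else {}
```

Calling camera_metadata on a photograph taken with a digital camera would typically return a small dictionary of values such as the make and model, although what is present depends entirely on what the device chose to write.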
Usually this metadata is technical in nature and is generally of more use to those administering the collection than to those using it. However, software packages have begun to make more use of this metadata, and it is increasingly common to be able to augment the metadata stored within digital files with simple user-added descriptions such as titles, captions and keywords.
Metadata about the use of a particular file can also be automatically sourced from the web server it resides on. Web servers keep logs of user activity, and the server and its logs can tell us, for example, when a file was created, its file size or when it has been accessed. Web logs constitute the raw data from which web analytics and statistics-gathering packages derive their information. These data are indispensable for understanding the way in which resources are used, such as: how often a file has been viewed or downloaded; where it was viewed or downloaded from geographically; or the web site address the user came from to arrive at the file.
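As a rough illustration of how usage metadata can be derived from web logs, the Python sketch below counts successful requests per file and records the referring addresses. It assumes the log is in the widely used "combined" format written by servers such as Apache. The sample lines, addresses and file paths are invented for the example, and geographic lookup is left out because it needs a separate database that maps IP addresses to locations.

```python
import re
from collections import Counter

# One line of a "combined" format access log (assumed format for this sketch).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarise(lines):
    """Count successful requests per file and per (file, referrer) pair."""
    views, referrers = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip lines that do not parse and requests that did not succeed.
        if not match or match["status"] != "200":
            continue
        views[match["path"]] += 1
        if match["referrer"] not in ("", "-"):  # "-" means no referrer was sent
            referrers[(match["path"], match["referrer"])] += 1
    return views, referrers


# Invented sample data for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /images/harbour.jpg HTTP/1.1" '
    '200 2326 "https://example.org/gallery" "ExampleBrowser/1.0"',
    '198.51.100.7 - - [11/Oct/2023:09:12:01 +0000] "GET /images/harbour.jpg HTTP/1.1" '
    '200 2326 "-" "ExampleBrowser/1.0"',
    '198.51.100.7 - - [11/Oct/2023:09:12:05 +0000] "GET /images/missing.jpg HTTP/1.1" '
    '404 512 "-" "ExampleBrowser/1.0"',
    "not a log line",
]

views, referrers = summarise(SAMPLE)
for path, count in views.most_common():
    print(path, count)
```

Analytics packages perform the same kind of aggregation at much larger scale, and usually add filtering for automated traffic such as search engine crawlers, which this sketch does not attempt.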