Introduction
Digital audiovisual material is an important form of research data. It is raw material for observational and experimental analysis, for practice-based learning, and for research communication.
Large-scale digitisation has opened up historical collections for academic use, alongside a massive increase in born-digital material. Media technology evolves rapidly, and today’s research projects have to plan to retain material for future repurposing, including potential re-analysis by yet-to-be-developed tools.
Digital media is technically complex and physically vulnerable to loss, and there is a further risk that the media or file type becomes obsolete. Even when video content is converted to digital form and stored on a hard disk, new weaknesses can emerge, such as visual information that is inadequate for analysis.
Curation of media research data should be a team effort, bringing together researchers, data or information managers and media technology specialists. Research projects making substantial use of media may find that different communities want to access and use the data throughout and beyond the project.
A particular challenge is managing the material’s transition throughout its life as research data: at first managed by the individual researchers who create or collect it, followed by collaborative use. Parts may be made available online, and possibly annotated, commented on or downloaded and re-edited by others.
Publishing media collections is best led by the researchers who created them. They will benefit from coordinated support from institutional and national data repositories in such key areas as storage management, format migration/transcoding, metadata implementation, ethics and IPR.
Who is this for?
This guide is applicable across disciplinary boundaries, for anyone using digital media as an integral part of carrying out research.
A geneticist analysing, describing and publishing digital video faces challenges similar to those of a social scientist using digital video to record an interview.
This guide identifies common challenges, regardless of discipline, and where possible directs the reader to more detailed guidance.
Putting data into a research data lifecycle
Three main phases of the research process affect the development of audiovisual research data:
- A ‘planning and piloting’ phase at the outset with development of an operational data management plan and decisions on what data will be created/collected, on what media, file types etc. and considerations of ethics
- The main ‘project curation’ phase alongside the actual creation and collection of research data; may involve reassessing the material selected for analysis and how colleagues and peers may be involved eg, in annotating and reviewing any audiovisual material
- The ‘long-term curation’ phase to re-assess all the research data used and created, and prepare it for retention, storage and wider access, with the involvement of specialists from a data centre or repository
The research team should consider data curation at transition points between phases. Data management specialists who support the research will become progressively more involved as data is created.
With a media-based research project the research team and research data management specialists need to focus on the risk mitigation steps each party can take to make sure the media are adequately preserved for future use. There may be infrastructure available alongside the repository to help.
Tip: It is good practice to avoid the use of third-party media sharing websites, as these are usually designed for short-term social networking rather than for safeguarding research data.
How is audiovisual research data different?
As with any data, management is required. Many forms of information can be represented by digital media, and media can serve diverse research purposes. For example, a digital video recording might be a record of some real-world activity, or a rendering of computational activity from a simulation model or virtual environment.
Video may be used in observational, practice-based or experimental methods, and may serve as a record of the research process, a means of communicating within the project team, or a vehicle for disseminating results.
The challenge for researchers is to make best use of the available research materials. In principle any reuse of curated media might involve resampling or re-editing, or adding further layers of annotation to any that have already been made.
For data specialists the challenge is to understand the reuse case; for example, does the research benefit from being able to track versions across projects, and if so how can dynamic data best be accommodated?
Our guide to managing research data links to other resources.
Take an iterative approach
Researchers working with media may find that their data management plans need regular review. Compared with textual data, digital media carries more uncertainty around the choice of file formats and standards.
It can be a richer medium; arguably it conveys more information, is open to wider interpretation, and may be re-analysed in more ways. However, this potential can be difficult to assess before the material is gathered, and the copyright or privacy implications may be complex or uncertain.
Researchers may have to take a more active role in curation during their projects. Digital curation overlaps with research tasks that involve classifying and describing the data and documenting the research process.
Traditionally, data description for archiving purposes has taken place at the ‘ingest’ point, when a repository or archive negotiates with the data creator to agree on the objects to be accepted, in line with their selection policy. The nature of the files they comprise, and descriptions of their purpose and context of use are in effect ‘locked down’ for archival purposes.
Be conscious of change
Many changes will be made to multimedia data before the archive ingest phase, which may come long after the material has been created.
Audiovisual data may go through a great deal of reduction, as the research team creates clips of different levels of quality and resolution, and adds labelling, metadata and links to descriptive research data such as transcripts.
Many changes can be made to the content, the related data, and probably also the format of the data to be archived. Any of these changes may impact on the future reusability of the multimedia data.
Step one: planning and piloting
At this stage the research team should define their requirements for ‘project curation’, as the digital media and related material are acquired.
The operational version of a data management plan will typically identify:
- Quality criteria for selecting data for immediate use
- Short-term storage locations and media formats
- A repository or archive service(s) for long-term curation
- How the repository selection criteria and metadata requirements will be met
- Roles and responsibilities for managing the data
- Risks for the data at each stage (see below)
What are significant properties?
Digital media has certain significant properties – essential characteristics that need to be maintained for it to have any future use. Planning and piloting will involve defining which significant properties needed to preserve the media are likely to be most important for the research.
Generally, the most important technical factors for the moving image are: the size and shape of individual images, image detail recording, the speed at which images follow each other in sequence when presented, any accompanying audio, and the overall length of the sequence.
For audio the most important technical factors are: the number of channels, playback configuration of the channels (mono, stereo, quadraphonic), frequency range of the recording, dynamic range, and running time.
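The factors listed above can be written down as a simple record and compared between a master file and any derivative. The following is a minimal sketch, not a standard schema: the class and field names are hypothetical, bit depth is assumed as a stand-in for "image detail", and sample rate and bit depth are assumed as proxies for frequency range and dynamic range.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VideoProperties:
    # Hypothetical field names chosen for illustration.
    width_px: int        # size and shape of individual images
    height_px: int
    bit_depth: int       # assumed proxy for image detail
    frame_rate: float    # speed at which images follow each other
    has_audio: bool      # any accompanying audio
    duration_s: float    # overall length of the sequence


@dataclass(frozen=True)
class AudioProperties:
    channels: int        # number of channels
    channel_layout: str  # e.g. "mono", "stereo", "quadraphonic"
    sample_rate_hz: int  # assumed proxy for frequency range
    bit_depth: int       # assumed proxy for dynamic range
    duration_s: float    # running time


def changed_properties(original, derivative):
    """Return the names of properties that differ between two records."""
    if type(original) is not type(derivative):
        raise TypeError("can only compare records of the same kind")
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(derivative, f.name)
    ]


master = VideoProperties(1920, 1080, 10, 25.0, True, 600.0)
access_copy = VideoProperties(1280, 720, 8, 25.0, True, 600.0)
print(changed_properties(master, access_copy))
# ['width_px', 'height_px', 'bit_depth']
```

A list like this gives the research team and the repository a concrete starting point for deciding which changes are acceptable in a derivative and which would undermine future reuse.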
Significant properties should be discussed with specialists from whichever repository will be actively involved in long-term curation. For example, the UK Data Archive has published extensive advice on suitable file formats.
The repository manager will need to assess what properties of the data will be most significant. The research team will then be better placed to decide on acquisition hardware, methods and file formats to ensure best data capture.
The requirements for recording contextual information, when and by whom, are defined once data collection starts.
Step two: deal with project curation
This phase involves reflecting on how the data management plan is working in practice.
Risks to the researchers’ short-term need to work with the data may need to be reassessed. For example there may be incompatibilities between the native capture formats and the software tools used for analysis.
New developments in digital media add to the decline of previous formats, so knowing which file format and encoder are best suited for a project with specific technical, preservation and dissemination objectives can be complicated.
Storage needs should also be clarified, for the project lifetime and for long-term curation. If users of the data need to collaboratively edit the metadata or view each other’s annotations, the software used and the network infrastructure on which the data is hosted must be adequate to support this.
Multimedia information can also be stored on a medium in an uncompressed format, that is, a format in which there is a one-to-one mapping between the image and sound information and the data stored.
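Because an uncompressed format stores every sample directly, its storage requirement can be estimated with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it ignores container overhead, padding and any chroma subsampling, all of which change the figure for real files.

```python
def uncompressed_video_bytes(width_px, height_px, bits_per_pixel,
                             frame_rate, duration_s):
    """Estimate the size of uncompressed video picture data in bytes."""
    frames = round(frame_rate * duration_s)
    total_bits = width_px * height_px * bits_per_pixel * frames
    return total_bits // 8


def uncompressed_audio_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Estimate the size of uncompressed (linear PCM) audio data in bytes."""
    samples = round(sample_rate_hz * duration_s)
    return samples * bit_depth * channels // 8


# One minute of 1920x1080 video at 24 bits per pixel and 25 frames per second
print(uncompressed_video_bytes(1920, 1080, 24, 25, 60))   # 9331200000

# One minute of 48 kHz, 24-bit stereo audio
print(uncompressed_audio_bytes(48000, 24, 2, 60))         # 17280000
```

Under these assumptions one minute of video needs roughly 9.3 GB, against about 17 MB for the audio, which is why storage needs should be clarified early in the project.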
Metadata for audiovisual data
Metadata relevant to the curation of multimedia data can be categorised as:
- Structural metadata – describing the metadata record and its relationship to the digital file
- Descriptive metadata – descriptive information surrounding the content of the resource
- Administrative metadata – information about the resource outside of the content such as rights information and details of the analogue resource
- Technical metadata – the technical properties of the digital file
- Provenance metadata – information describing the digitisation process including any restorative action taken
Our guide to metadata offers a more detailed understanding of what metadata can do to help to administer a digital resource and make it accessible and sustainable in future.
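As an illustration of how the five categories above might be kept together for one media file, here is a minimal sketch. The class name, the keys and the example values are all hypothetical; a real project would follow the metadata schema required by its chosen repository.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class MediaMetadata:
    # One dictionary per category; all keys used below are illustrative.
    structural: dict = field(default_factory=dict)
    descriptive: dict = field(default_factory=dict)
    administrative: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)

    def missing_categories(self):
        """Return the categories that have no entries yet."""
        return [name for name, value in asdict(self).items() if not value]


record = MediaMetadata(
    descriptive={"title": "Interview 03", "language": "en"},
    technical={"container": "WAV", "sample_rate_hz": 48000},
)
print(record.missing_categories())
# ['structural', 'administrative', 'provenance']
```

A check of this kind can be run during project curation, so that gaps are found while the people who can fill them are still involved.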
Step three: consider long-term curation
This phase ideally begins with ingestion of the media data to a repository. This may be an institutional repository, a specialist data centre, or a national archive, or some combination of these.
The case for the material having long-term value to the research community – or to non-academic research users – needs to be set out before data can be ‘ingested’, and the data must comply with repository policies on selection and appraisal.
Whether or not the funding organisation has a policy of obliging researchers to offer their data for deposit, repositories are not obliged to accept it, and since multimedia material is typically costly to archive they are likely to be highly selective.
In it for the long term…
The ingest process will address the need to augment any metadata collected and look at future reuse needs eg, to allow online editing or annotation, and treat the result as a new derivative object to be defined and ingested.
Throughout long-term curation, the origin of the object and any trail of derivative use must remain intact. A preservation standard such as PREMIS might be used to maintain a developing provenance record.
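One way to picture a developing provenance record is as an append-only list of events, each tying a derivative back to its source by checksum. The sketch below is only loosely inspired by the idea of recording events against objects: it does not follow the PREMIS data dictionary, and every class, field and function name in it is hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str               # e.g. "ingest", "transcode", "annotate"
    timestamp: str                # ISO 8601, UTC
    agent: str                    # person or software responsible
    source_sha256: Optional[str]  # None for the original object
    result_sha256: str


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_event(trail: List[ProvenanceEvent], event_type: str, agent: str,
                 result: bytes, source: Optional[bytes] = None):
    """Return a new trail with one more event; earlier entries are untouched."""
    event = ProvenanceEvent(
        event_type=event_type,
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent=agent,
        source_sha256=sha256_of(source) if source is not None else None,
        result_sha256=sha256_of(result),
    )
    return trail + [event]


def trail_is_unbroken(trail: List[ProvenanceEvent]) -> bool:
    """Check that every derivative points back to an object already recorded."""
    seen = set()
    for event in trail:
        if event.source_sha256 is not None and event.source_sha256 not in seen:
            return False
        seen.add(event.result_sha256)
    return True


# Byte strings stand in for real media files.
master = b"master recording"
clip = b"edited clip"

trail = record_event([], "ingest", "repository", master)
trail = record_event(trail, "transcode", "editing software", clip, source=master)
print(trail_is_unbroken(trail))  # True
```

If a derivative were recorded without its source ever having been ingested, `trail_is_unbroken` would return `False`, flagging a gap in the trail of derivative use.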
A data preservation plan must be responsive and flexible, and must therefore keep pace with the designated user communities’ service requirements and with developments in standards and policies. Typically, a file format that is intended for long-term preservation must retain as much useful information as possible from the original material.
Managing assets and providing access
The main aim in preserving digital media is to allow future access and reuse. For audiovisual data this may mean reaching a compromise between file size and quality. If significant properties are preserved, then it should be possible to generate new delivery formats from archival data.
Digital media will often be stored for future broadcast, or online delivery (through download, media streaming or video podcasting). A range of delivery options are often offered together, and can help to repurpose content.
Looking forward
Clipper - emerging from our research data spring project - has developed a free open source software toolset to enhance and extend the use of online time-based media by researchers. This has created new opportunities for data use, reuse and collaboration in a wide range of research scenarios.
You can explore all our guides around research and research data, stay up to date on Twitter by following #jiscrdm, or email our team directly at researchteam.futures@jisc.ac.uk.