The Digging into Data Challenge is an international grant competition sponsored by four leading research agencies: JISC from the United Kingdom, the National Endowment for the Humanities (NEH) from the United States, the National Science Foundation (NSF) from the United States, and the Social Sciences and Humanities Research Council (SSHRC) from Canada. What is the "challenge" we speak of? The idea behind the Digging into Data Challenge is to answer the question "what do you do with a million books?" Or a million pages of newspapers? Or a million photographs of artworks? That is, how does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data -- far more than they could read in a lifetime -- what does that mean for research? Applicants will form international teams from at least two of the participating countries. Winning teams will receive grants from two or more of the funding agencies and, one year later, will be invited to present their work at a special conference. Our hope is that these projects will serve as exemplars to the field.

Digging into data challenge

The Digging into Data Challenge is an international grant competition to explore the possibilities of using large-scale digitised collections in social science and humanities research. Phase 1 (2010-11) is complete and Phase 2 (2012-14) is underway.

What is the 'challenge' we speak of? The idea behind the Digging into Data Challenge is to answer the question "what do you do with a million books?" Or a million pages of newspapers? Or a million photographs of artworks? That is, how does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitised data - far more than they could read in a lifetime - what does that mean for research?

Phase 2 of The Digging into Data Challenge is sponsored by eight leading research agencies:

  • JISC, UK
  • Arts and Humanities Research Council (AHRC), UK
  • Economic and Social Research Council (ESRC), UK
  • Netherlands Organisation for Scientific Research (NWO), Netherlands 
  • National Endowment for the Humanities (NEH), USA
  • National Science Foundation (NSF), USA
  • Institute of Museum and Library Services (IMLS), USA
  • Social Sciences and Humanities Research Council (SSHRC), Canada.

Eight projects have been selected for the first round, and a further 14 for the second round.

Projects

First round

Structural analysis of large amounts of music information

University of Illinois, Urbana-Champaign, University of Southampton, McGill University
This project will gather around 23,000 hours of digitised music with a breathtaking range of styles, regions and time periods, from a cappella to zydeco, Appalachia to Zambia, and medieval to post-modern. The project will develop tools to tag and analyse the structures that underpin global music.

Digging into the Enlightenment: Mapping the Republic of Letters

University of Oklahoma, University of Oxford, Stanford University
This project will focus on a collection of 53,000 18th-century letters, and will extract and interpret details relating to people, places, times, and subjects, and identify new ways of visualising and annotating these relationships.

Data mining with criminal intent 

George Mason University, University of Alberta, University of Hertfordshire
This project will create an intellectual exemplar for the role of data mining in an important historical discipline – the history of crime – and illustrate how the tools of digital humanities can be used to wrest new knowledge from one of the largest humanities data sets currently available: the Old Bailey Online.

Towards dynamic variorum editions

Mount Allison University, Imperial College London, Tufts University
This project will develop a range of tools that allow for dynamic comparison, generation of lexica, identification of topics and extraction of quotations for over 10,000 Greek and Roman texts, helping to further develop a fundamental resource for classical studies.

Digging into image data to answer authorship-related questions

Michigan State University, University of Illinois, Urbana-Champaign, University of Sheffield
This project will take three specific resources (manuscripts, maps and quilts) and develop tools to analyse and identify authorship of visual images.

Harvesting speech datasets for linguistic research on the web

McGill University, Cornell University
This project will harvest audio and transcribed data from podcasts, news broadcasts, public and educational lectures and other sources to create a massive corpus of speech. Tools will then be developed to analyse the different uses of prosody (rhythm, stress and intonation) within spoken communication. (Note: this project does not involve a UK partner, and therefore there is no JISC funding.)

Railroads and the making of modern America: tools for spatio-temporal correlation, analysis, and visualization

University of Portsmouth, University of Nebraska-Lincoln
This project will integrate a vast collection of textual, geographical and numerical data to allow for the visual presentation of the railroads over time, concentrating initially on the Great Plains and North East USA.

Mining a year of speech

University of Oxford, University of Pennsylvania
This project will create mechanisms to allow rapid and flexible access to over 9,000 hours of spoken audio files, drawn from some of the leading British and American spoken word collections.

Second round

Cascades, islands, or streams? Time, topic, and scholarly activities in humanities and social science research
University of Wolverhampton, Indiana University Bloomington, Université de Montréal (funders: NSF, SSHRC, AHRC, ESRC)

ChartEx
University of York, University of Washington, Leiden University, University of Toronto

Electronic Locator of Vertical Interval Successions (ELVIS): The first large data-driven research project on musical style
University of Aberdeen, MIT, McGill University

Imagery Lenses for Visualising Text Corpora
University of Oxford - Oxford e-Research Centre, University of Utah

Integrating Data Mining and Data Management Technologies for Scholarly Inquiry
University of Liverpool, University of California

Integrated Social History Environment for Research (ISHER) - Digging into Social Unrest
University of Manchester, University of Illinois, Tilburg University

Mining Microdata: Economic Opportunity and Spatial Mobility in the UK, Canada and the USA 1850-1911
University of Leicester, University of Minnesota, University of Guelph

Digging into Metadata: Enhancing Social Science and Humanities Research
University of Manchester, College of Information Science and Technology, Drexel University

Digging into Connected Repositories (DiggiCORE)

Open University, The European Library
Throughout the project, user research will inform the development process by looking into search behaviour and preferences. A prototype search interface will be developed, allowing the user to retrieve a list of resources from the different collections that have been assigned similar or related Dewey Decimal classes. Visualisation techniques will be used to display the results in ways that enhance the research process.

Digging by Debating: Linking Massive Datasets to Specific Arguments
University of East London, Indiana University

Trading Consequences
University of Edinburgh, York University (Canada)

Summary

Start date: 1 January 2010
End date: 31 March 2011
Funding programme: Digitisation and Content
Topic: Strategic Themes