
The diversity of data

What value might a social science or humanities researcher find in a million poems or a million newspaper pages? And how might society benefit? What potential do huge repositories of digitised data hold for this kind of research? These are the questions at the heart of the Digging into Data challenge, and they are leading to some unusual and exciting collaborations.

Take the Oxford-Utah/poets-technologists project where poets and computer scientists came together... It doesn't sound like the most likely or fruitful partnership but, while there were certainly some communication issues along the way, the project has produced a new tool that's delighting poets and technologists alike.

The Poem Viewer is a web-based tool that enables users to explore and analyse poetry through a visual representation of different elements of a text – the words of the poem, the phonetic sounds and the location of each sound. On the site you can examine a somewhat eclectic selection of pre-visualised texts (from ‘Mary Had a Little Lamb’ to a Nature article on ‘myocardial infarction in mice’) or upload your own choice of poem via a simple interface. The program automatically transcribes the poem into its phonetic sounds and then produces a visualisation that shows, at a glance, where in the mouth each sound is produced. It helps poetry scholars to compare poems and see how sound operates differently, for instance between a lyrical poem that is very much based on sound and one that places more emphasis on the power of its images. Researchers are also interested in understanding whether sound behaves differently in poetry than in novels or scientific writing.
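For readers curious about what "transcribing a poem into sounds and locating them in the mouth" might involve, here is a deliberately tiny sketch. It is not Poem Viewer's code or data: the mini-lexicon, the `PLACE` table and the function name `places_for_line` are all hypothetical, hand-written for illustration, and cover only the few words shown.

```python
# Toy illustration only: NOT Poem Viewer's actual code, data or method.
# Hypothetical hand-written lexicon: word -> ARPAbet-style phoneme symbols.
LEXICON = {
    "had": ["HH", "AE", "D"],
    "a": ["AH"],
    "little": ["L", "IH", "T", "AH", "L"],
    "lamb": ["L", "AE", "M"],
}

# Place of articulation for the consonants used above (assumed table,
# limited to the sounds in this example).
PLACE = {
    "HH": "glottal",    # produced at the glottis
    "D": "alveolar",    # tongue tip at the ridge behind the upper teeth
    "T": "alveolar",
    "L": "alveolar",
    "M": "bilabial",    # both lips
}


def places_for_line(line):
    """Return (word, phoneme, place) triples for each sound in the line.

    Words missing from the toy lexicon are skipped; phonemes missing
    from PLACE (the vowels here) are labelled "vowel or unlisted".
    """
    result = []
    for word in line.lower().split():
        for phoneme in LEXICON.get(word, []):
            place = PLACE.get(phoneme, "vowel or unlisted")
            result.append((word, phoneme, place))
    return result


# Print one row per sound: word, phoneme symbol, place in the mouth.
for word, phoneme, place in places_for_line("had a little lamb"):
    print(f"{word:8}{phoneme:4}{place}")
```

A real tool would need a full pronunciation dictionary and a way to handle words it has never seen; the point here is only that each word becomes a sequence of sounds, and each sound can be tagged with a position that a visualisation can then draw.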

Katharine Coles, poet and professor at the University of Utah, is convinced that the project has come up with something that enhances poetry scholars' essential skill of close reading of a text: “This is something really different. It's growing out of the poets' and poetry scholars' fundamental interest in text and interacting with the text in a very intimate kind of way.”

“We discovered early on that we and the computer scientists have very different approaches to the material. They are much more systematic than we are and they really want to be able to pin things down - and it's not the instinct of the poet to want to pin things down. We want to open things up.”
Katharine Coles
Poet and professor at the University of Utah
 

As well as different approaches to the material, the team discovered differences in their use of language. Computer scientists need each word to mean one thing whereas poets rely on the fact that words can have a multiplicity of meanings.

“What do poets mean by 'time'…?” asks Min Chen, professor of scientific visualisation and the project manager. “I'm a computer scientist and when the poets talked about their language…well, for quite a while we had trouble understanding what they meant by things like 'time'.”

Ultimately, however, the project has created a tool of value not just to poets but, potentially, across the humanities. Martin Wynne, linguist and self-confessed ‘interpreter’ (between the poets and the scientists) of the project, explains: “I've often thought of visualisations as a way of disseminating the results of a project – a nice picture to put on a poster as part of a presentation. But the tools that we've developed are much more embedded as part of the research process, part of the ways of exploring poems, of teaching poems and I think that's a possibility that people haven't thought about very much. I think a long way beyond poetry, across the humanities, people might be able to have a look at this and see how visualisation might help them in their research and teaching.” It might, for example, help to make the structure of a poem, or even another kind of text such as a political manifesto, easier to understand, and so make it easier to compare different texts at a glance.

Digging into Data challenge

This collaboration between Oxford University's e-Research Centre and the University of Utah is just one of a number of projects exploring new research techniques and uses of big data under the Digging into Data challenge. Now in its third round, the challenge is funded by ten leading international research agencies from the UK, USA, Canada and the Netherlands: NEH, NSF, SSHRC, Jisc, IMLS, AHRC, ESRC, NWO, CFI and NSERC. The projects are not only using huge datasets and complex subjects to explore new frontiers in research, but they are also forging international partnerships and new relationships between traditional scholarship and cutting-edge computer science. There are fourteen projects running as part of the current phase of the programme.

Among these:

Digging by Debating is examining the data provided by hundreds of thousands (and eventually millions) of digitised full-text books, bibliographic databases of journal articles and comprehensive reference works to uncover and represent their argumentative structure. Researchers can then analyse how they relate to each other, and how the ideas in fiction, philosophy, history, and the sciences may have an impact on developments in other fields. Eventually, by digging into texts on a massive scale, it will be possible to map the hotspots - when and where, for example, science and philosophy have particularly influenced one another.

ChartEx is exploring the full-text content of medieval charters from the 12th to the 16th centuries and creating a 'virtual workbench' that allows historians to access the extracted information and add further information and comments.

Trading Consequences is using text mining software to explore thousands of pages of historical documents relating to trade in the British Empire during the 19th century to help answer a key question: what were the economic and environmental consequences of commodity trading between 1800 and 1914?

ELVIS is investigating European polyphonic music from 1300 to 1900 to understand changes in style over that period.

Add to that ISHER, which provides social historians and social scientists with the means to detect and associate events, trends, people, organisations and other entities of interest related to social unrest, and Mining Microdata, which uses data-mining technology to exploit one of the largest population databases in the world (a vast collection of 19th and early 20th century census microdata from Britain, Canada and the United States) to track social mobility over two generations in three countries – and the scale and ambition of the projects become clear.

“Dealing with big digital data is not just a humanities issue and it is not just an American issue or British issue or European issue. It's worldwide and we need to work at the large scale,” argues Brett Bobley, CIO of the National Endowment for the Humanities (NEH), one of the Digging into Data funders.

 

As the world becomes increasingly digital and more and more big data sources become available, new techniques are needed to search, analyse and understand these materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st-Century scholarship.

“We’re discovering research questions that we didn’t have when we started off,” says Peter Ainsworth of the University of Sheffield, who led one of the teams on the Digging into Image Data to Answer Authorship-Related Questions project, which compares manuscripts, maps and quilts across four centuries.

One Culture

While there are clearly benefits for the research and humanities communities involved, the impact and benefits of this kind of experimental work reach much further. As the authors of the CLIR report, One Culture: Computationally intensive research in the humanities and social sciences, put it, the sort of computer-intensive research in the humanities and social sciences we are now seeing is taking us into a new era: one with the promise of revelatory explorations of our cultural heritage that will lead us to new insights and knowledge, and to a more nuanced and expansive understanding of the human condition.

Find out more about Poem Viewer

More info…

Read our press release on the launch of the Digging into Data Challenge

Contact Stuart Dempster, director of the Strategic Content Alliance, with any questions regarding the Digging into Data Challenge

 

 

 

 