Digging by Debating: Linking massive datasets to specific arguments
Summary
How can we more closely and meaningfully link digitised texts to argumentative reasoning and inquiry about these texts?
Huge numbers of digitized documents are now accessible via the World Wide Web. Computerized analysis of this material helps to reveal large-scale trends in collaboration, co-citation, and word usage, which has enabled information scientists to identify emerging research fields as well as changing expertise profiles at the level of individuals, institutions, and entire populations of researchers. Scholars and students in the humanities often find these analyses intriguing, but available tools still do not directly support the detailed interpretative work and critical engagement with texts that are the hallmark of traditional humanities scholarship. Furthermore, end users of humanities scholarship have few means of browsing, navigating and interacting with these datasets in ways that make sense to them.
High-level maps and taxonomies of ideas suggest connections among people, concepts, and documents, but current techniques do little to identify the nature of those connections. For example, does author A cite author B and mention concept C because A is critical or supportive of B, and because A accepts or rejects C? Current techniques for large-scale data analysis do not answer such questions.
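To make the distinction concrete, here is a minimal illustrative sketch. It is not the InterDebates data model, and the class, field, and function names (`Stance`, `Link`, `stance_of`) are hypothetical. It contrasts the untyped links that citation and co-occurrence analysis yields with the same links annotated for argumentative stance, which is the kind of information the question above asks for.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Stance(Enum):
    """Hypothetical stance labels for an argumentative link."""
    SUPPORTS = "supports"
    CRITICIZES = "criticizes"
    ACCEPTS = "accepts"
    REJECTS = "rejects"


@dataclass(frozen=True)
class Link:
    """A directed link from one author to an author or concept."""
    source: str
    target: str
    # None means the link is untyped: we only know that A cites B or
    # mentions C, not why.
    stance: Optional[Stance] = None


# What large-scale citation/co-occurrence analysis gives us: A cites B, A mentions C.
untyped = [Link("A", "B"), Link("A", "C")]

# The same links with (hypothetical) argumentative annotations added.
typed = [Link("A", "B", Stance.CRITICIZES), Link("A", "C", Stance.ACCEPTS)]


def stance_of(links: List[Link], source: str, target: str) -> Optional[Stance]:
    """Return the stance of the first link from source to target, or None if
    there is no such link or the link carries no stance."""
    for link in links:
        if link.source == source and link.target == target:
            return link.stance
    return None
```

Querying `untyped` for the A-to-B link returns `None`, because the connection exists but its nature is unknown. Querying `typed` returns `Stance.CRITICIZES`.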
We will develop and implement a multi-scale workbench, called "InterDebates", with the goal of digging into data provided by hundreds of thousands (and eventually millions) of digitized full-text books, bibliographic databases of journal articles, and comprehensive reference works written by experts. Our hypotheses are that detailed and identifiable arguments drive many aspects of research in the sciences and the humanities; that argumentative structures can be extracted from large datasets using a mixture of automated and social computing techniques; and that the availability of such analyses will enable innovative interdisciplinary research and may also help support better-informed critical debate among students and the general public.
The research and development is being conducted through an international collaboration between UK and US teams (listed under Project Staff below).
Objectives
A key challenge tackled by this project is to uncover and represent the argumentative structure of digitized documents. Users will be able to view the semantic landscape of books and articles, zoom into specific topic areas, and apply cutting-edge interpretive techniques to perform linguistic analyses of the raw text. Arguments and debates expressed in these texts can be connected to, and serve to anchor, online discussions that form part of the Argument Web, an emerging environment supporting millions of concurrent arguments and billions of argument resources.
Anticipated Outputs and Outcomes
These are currently being clarified, but will include:
- The InterDebates prototype tool
- Evaluation of InterDebates in the exemplar domain of the History and Philosophy of Science, using the HathiTrust dataset.
Project Staff
Project Manager
Prof Andrew Ravenscroft
University of East London, CASS School of Education and Communities
a.ravenscroft@uel.ac.uk
Project Team
Prof Chris Reed
University of Dundee, School of Computing
chris@computing.dundee.ac.uk
John Lawrence
University of Dundee, School of Computing
johnlawrence@computing.dundee.ac.uk
Dr David Bourget
University of London, Institute of Philosophy, School of Advanced Study
root@dbourget.com
US Team:
Prof Colin Allen
Indiana University
colallen@indiana.edu
Prof Katy Borner
Indiana University, School of Library and Information Studies (SLIS)
katy@indiana.edu