The Web is a place where someone is always watching what you do. I understand that… but there again, the Web is such a giant metropolis, how and why would anyone notice what one individual like me is looking at and which links I’m clicking on?
Then up pops Tom Barnett, the MD of a technology company that specialises in digital publishing, to tell me that ‘Google has a file the size of an encyclopaedia on everyone in this room.’ Hmmm… I start to feel a vague sense of paranoia. Then I think… pull yourself together, Neil! Google really doesn’t care who you are. They just want to put things in your line of sight that are likely to get you to part with your wages!
These were the thoughts that occurred to me during an event called ‘Observing the Web’ organised by the Web Science Trust. The meeting included academics, industry players, technologists, funders, charities and a lawyer. It highlighted the fact that a global network of Web Observatories is emerging that will help drive new research into how people use the internet. The point of this ‘observing’ is not to take account of every little bit of information, but to understand how trends, fashions and changes of behaviour in relation to the internet might illuminate aspects of our society and culture.
This is of great interest to me and is highly relevant to some ongoing work that I am managing in my role as digital preservation programme manager at Jisc. Last year, working with the British Library and the Internet Archive, we created a large digital collection made up of snapshots of UK websites from 1996 to 2010. This includes all the UK websites that the Internet Archive managed to collect during that period and represents the world’s best (and in some cases the only) historic record of material that was once freely available online. It is, therefore, a valuable resource in its own right, but it also has a role to play in the global network of Web Observatories.
I am currently working with the British Library, the Oxford Internet Institute and the Institute of Historical Research to explore how this resource can be used for social science research. There is no shortage of ideas about what research might be carried out using the resource. One proposal suggests a study of the recent history of public health in local government; another suggests a study of changes in the debate around Euro-scepticism. There are also new opportunities for using analytical methods across the archive: links between websites can reveal how online entities, such as governments, interact with other entities, such as the public. But there are also challenges: internet archives are necessarily only periodic snapshots of the web, so significant gaps in the records could affect their usefulness for social science research.
It is early days for working out how we might most effectively use internet archives for research, but it certainly fits with the current trend for using big data to support decision-making and research and development. Even less clear is how we can effectively exploit academic analysis of both the historic and contemporary internet using the open, transparent and universally accessible tools and methods proposed by the Web Science Trust. Such methods contrast with the well-resourced, sophisticated and highly developed (but opaque) methods employed by corporate observers such as Facebook, Amazon, Google, Microsoft and Yahoo, all of which have partially or entirely built global-scale businesses on the back of gathering intelligence (at gigantic scale) about how we all use the internet.
To my mind, the development of a global academic network of Web Observatories raises the following questions:
- How do we enable different observatories to work together (interoperability)?
- How do we get access to data: apart from Twitter, which of the big corporate organisations will let us use their data?
- What about privacy – will people feel spied upon?
- How do we sustain web observatories for the long term?
This is a big and fascinating topic and I can’t wait to see what comes together. I would also be interested to hear other people’s concerns and viewpoints on this subject.
There will be more discussion at the ACM Web Science Meeting in Paris in early May 2013.
If you would like to find out more you may be interested in some previous work that Jisc funded – Researcher Engagement with Web Archives.
Follow Neil on Twitter: @neilgrindley