Using open citation data to benefit research
When academics publish research, it usually builds upon work that preceded it, and these links need to be recorded. This is done through citations, which show the links between related documents: who wrote what, where they wrote it and when.
Until recently, if you mentioned citations many people would immediately think about lists of names and publications at the end of a document and short snippets embedded within the text indicating where the connection applied. But a shift has taken place and citations have now gone far beyond being a simple link between two pieces of paper. Processes have been added, techniques refined and something new has been formed.
I’m part of our digital infrastructure team and I’m going to tell you about the effects of that citation shift and how you can use them to your advantage.
So how have they evolved and why does this matter?
‘In the beginning was the link…’ - That’s what a citation is, a relationship between two publications.
However, if we add something about the nature and the ‘direction’ of the relationship, for example that document two builds upon and cites document one, we immediately have something more useful. Link this to the bibliographic information for each document and we yet again increase the potential worth of the citation.
Now, if we gather many citations together we have the beginnings of a valuable data set which can itself be studied. Let’s make this data digital and organise it so that it can be processed (searched, ordered, linked) using a computer. Finally, let’s make the aggregated data open for anyone to use. Voilà, you’ve created an open citation data set.
You may also be aware of one of the inherent difficulties with citations: that of identity. Just which ‘Anne Elk (Miss)’ was it that wrote ‘Theory on Brontosauruses’, or which ‘Dossier’ was it that generated the 45-minute claim? Fortunately there is a solution for that: unique identifiers, in the form of Digital Object Identifiers (DOIs) for publications and ORCID identifiers for people. Linking our citation data with these services adds another level of utility.
There’s that ‘link’ word again. What we’re describing here is a classic Linked Open Data (LOD) scenario. In fact a citation is very similar to a simple linked data triple (subject – relationship – object). Some of the ‘new’ uses for citation data involve the exposure of the data as LOD (the Jisc-funded Open Citation project being one of them). There is also a distinct resemblance to nanopublications.
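To make the subject – relationship – object idea concrete, here is a minimal Python sketch of citations held as simple triples. It is an illustration only, not how any particular LOD service stores its data: the `Triple` type, the `cited_by` helper and the DOIs are all made up for the example.

```python
from collections import namedtuple

# A citation as a (subject, relationship, object) triple.
# The type name and field names are assumptions made for this sketch.
Triple = namedtuple("Triple", ["subject", "relationship", "object"])

# Hypothetical DOIs, invented purely for illustration.
citations = [
    Triple("doi:10.0000/example.2", "cites", "doi:10.0000/example.1"),
    Triple("doi:10.0000/example.3", "cites", "doi:10.0000/example.1"),
    Triple("doi:10.0000/example.3", "cites", "doi:10.0000/example.2"),
]


def cited_by(target, triples):
    """Return the identifiers of documents that cite `target`, sorted."""
    return sorted(
        t.subject
        for t in triples
        if t.relationship == "cites" and t.object == target
    )


# Follow the links 'backwards' to find everything that cites document one.
print(cited_by("doi:10.0000/example.1", citations))
```

Because every document is named by a unique identifier rather than a free-text title, the same lookup keeps working as more triples from other sources are added to the list.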
Where does the citation data come from?
Generally the data comes from the original authors via the publishers through aggregating services such as Scopus and Web of Knowledge. The data used to be available on a subscription-only basis (it does after all take resources to gather, sort and republish and has value for the recipients). However, publishers are increasingly providing the data for nothing on the grounds that it increases the discoverability of their paid-for publications. Throw open publications into the mix (with their automatically available citation data) and all of a sudden you have a whole new opportunity.
OK, you’ve got your open citation data. What’s next?
We asked Curtis & Cartwright, an independent research and strategy consultancy, to think about this (and other open citations related matters) as part of the Jisc Digital Infrastructure Directions series of reports. Here are some of their findings:
For a start, you can analyse it
From being the by-product of research, citation data has become the object of research. Not only can you analyse it but you can now do so in ways that were labour intensive, if not impossible, before. For instance, you can track the inter-relationships between publications across disciplines, which can reveal unrealised links.
Then there’s researcher performance management and the evaluation of university-wide research
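As a rough sketch of what that kind of cross-discipline analysis might look like, the Python below counts citations that cross a discipline boundary. The paper names, the discipline labels and the `cross_discipline_links` helper are all hypothetical; a real data set would need a reliable source for the subject of each publication.

```python
from collections import Counter

# Hypothetical lookup of publication -> discipline.
discipline = {
    "paper-a": "biology",
    "paper-b": "physics",
    "paper-c": "computer science",
    "paper-d": "biology",
}

# Hypothetical citations as (citing publication, cited publication) pairs.
citations = [
    ("paper-b", "paper-a"),
    ("paper-c", "paper-a"),
    ("paper-d", "paper-a"),
    ("paper-c", "paper-b"),
]


def cross_discipline_links(citations, discipline):
    """Count citations whose citing and cited papers are in different disciplines."""
    counts = Counter()
    for citing, cited in citations:
        source, target = discipline[citing], discipline[cited]
        if source != target:
            counts[(source, target)] += 1
    return counts


for (source, target), n in sorted(cross_discipline_links(citations, discipline).items()):
    print(f"{source} -> {target}: {n}")
```

Citations within a single discipline (here, `paper-d` citing `paper-a`) are deliberately left out, so what remains is the set of links that span subject areas.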
Citations have become the key currency of academic reputation, with indices such as the h-index and Journal Impact Factor becoming increasingly important in many aspects of academic life. However, these numbers don’t tell the whole story. Theoretically, it is possible to have a high h-index through publishing a great many flawed papers that are refuted a great many times. Clearly such a person is having an influence on the community, albeit a negative one, but their h-index doesn’t show that. In addition, early career researchers are automatically discriminated against because they simply haven’t had time to publish or be cited.
Simplistic measurements such as these can be applied inappropriately, especially if the source of the underlying data is lacking in transparency. One example is comparison across disciplines, where there are many different patterns of publication. With the open citation dataset linked to other datasets it is possible to overcome these flaws in the system and create metrics from the raw data. It has been suggested that citations are a better measure of the validity of a paper than simple peer review can provide.
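For readers who haven’t met it, a researcher’s h-index is the largest number h such that h of their papers have each been cited at least h times. The sketch below is one straightforward way to compute it in Python from a list of per-paper citation counts; the function name and the sample figures are invented for illustration.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts.
established = [10, 8, 5, 4, 3]
early_career = [50]  # a single paper, however heavily cited

print(h_index(established))
print(h_index(early_career))
```

Note that the calculation only sees counts. It has no way of knowing whether a citation endorses or refutes the paper, and a researcher with a single paper cannot score above one however often it is cited, which are exactly the weaknesses described above.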
Theoretically, an open dataset would allow you to build your own up/down vote or ‘like’ system as yet another alternative. Eat your heart out Facebook and Reddit.
Where do we go from here?
One of the more obvious problems with the current system relates to just what is or is not citable.
If citations are all about academic influence then it’s arguable that all media should be citable, especially in this digital age where blogs, videos and web pages have all become part of the mix. This, however, is perhaps an argument for another day. On the other hand, there are other things that can be done right now.
One of the more labour intensive aspects of citations is the initial creation and checking. There are tools that can help, but in order to be truly useful they would need to be fully integrated with existing systems. For this to work we would first need to standardise the ways that citations are embedded in a document, and how they are codified and attached to a document’s metadata. This would allow systems to talk to one another.
The more citations there are in an open database, the more useful it becomes. But as yet, not all publishers have ‘seen the light’. If all research information is eventually to become openly published, however, that issue at once disappears.