Usage Statistics Review
For collecting usage data about publications, two basic approaches are possible: one based on web server log file analysis and one based on link resolver logs. Client-side approaches such as web bugs or pixel tags, common in web page statistics, are not sufficient for distributed publication networks, where documents may exist in different versions and formats and be split across multiple files. Logs from repositories and journal sites, as well as from link resolvers, can be made accessible with standard technologies, using the OAI Protocol for Metadata Harvesting (OAI-PMH) and OpenURL ContextObjects. The basic architecture has been proposed by Bollen and Van de Sompel and can be extended with data from publisher sites, which is available in a different XML format exchanged via the SUSHI protocol. At the moment, however, COUNTER-conformant statistical data from publisher sites is only available at the journal title level, not at the article level, so aggregating data from these sources results in a much coarser granularity.
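Since OAI-PMH is a plain HTTP GET protocol, the harvesting side of such an architecture can be illustrated by assembling request URLs. The following sketch is illustrative only: the base URL is hypothetical, and the `metadataPrefix` value for a ContextObject-based usage format is an assumption, not a fixed standard name.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; real OAI-PMH base URLs differ per repository.
BASE_URL = "https://repository.example.org/oai"

def build_oai_request(verb, **params):
    """Assemble an OAI-PMH request URL (the protocol is plain HTTP GET)."""
    query = urlencode({"verb": verb, **params})
    return f"{BASE_URL}?{query}"

# Harvest usage-event records in a ContextObject-based format;
# the prefix "ctxo" and the date are placeholders for illustration.
url = build_oai_request("ListRecords",
                        metadataPrefix="ctxo",
                        **{"from": "2024-01-01"})
```

A harvester would fetch such URLs periodically, follow `resumptionToken`s for paging, and pass the retrieved ContextObjects on to the aggregation step.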
Data aggregated from different sources has to be normalized, automated accesses by robots have to be flagged, and duplicates have to be removed. The last point refers to duplicate publications and can be handled on the basis of persistent identifiers (such as DOIs or URNs). In a broader context, duplicate removal will also have to rely on metadata-based heuristics: duplicates are detected by comparing ISSNs, article titles, publication years, or parts and combinations of these fields.
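A minimal sketch of such a heuristic, assuming records are dictionaries with optional `doi`, `issn`, `title`, and `year` fields (the field names and the similarity threshold are illustrative assumptions, not part of the project specification):

```python
import re
from difflib import SequenceMatcher

def normalize_title(title):
    """Lowercase and strip punctuation for fuzzy comparison."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def is_duplicate(a, b, title_threshold=0.9):
    """Heuristic duplicate check for two publication metadata records."""
    # Persistent identifiers are decisive when both records carry one.
    if a.get("doi") and b.get("doi"):
        return a["doi"].lower() == b["doi"].lower()
    # Otherwise fall back to metadata: ISSN and year must not conflict,
    # and the normalized titles must be sufficiently similar.
    if a.get("issn") and b.get("issn") and a["issn"] != b["issn"]:
        return False
    if a.get("year") and b.get("year") and a["year"] != b["year"]:
        return False
    ratio = SequenceMatcher(None,
                            normalize_title(a.get("title", "")),
                            normalize_title(b.get("title", ""))).ratio()
    return ratio >= title_threshold
```

In practice the identifier comparison would come first for every record pair, with the fuzzy title match reserved for records lacking a DOI or URN, since string similarity is both slower and less reliable.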
The aim of this project is to make progress towards item-level usage statistics that are comparable across a range of sources.
- Ms Christine Merk, University of Constance