Activity data - delivering benefits from the data deluge
'Activity data' is the record of human actions in the online or physical world that can be captured by computer. The analysis of such data leading to 'actionable insights' is broadly known as 'analytics' and is part of the bigger picture of corporate business intelligence. In global settings (such as Facebook), this data can become extremely large over time – hence the nickname of 'big data' – and is therefore associated with storage and management approaches such as 'data warehousing'.
This executive overview offers higher education decision-makers an introduction to the potential of activity data – what it is, how it can contribute to mission-critical objectives – and proposes how institutions may respond to the associated opportunities and challenges. These themes and recommendations are explored in further detail in the supporting advisory paper, which draws on six institutional case studies as well as evidence and outputs from a range of Jisc-supported projects in activity data and business intelligence.
Should we care?
In 2010, The Economist published its special report on managing information, 'Data, data everywhere'.
In 2011, the McKinsey Global Institute published the landmark report 'Big Data: The next frontier for innovation, competition and productivity'.
In 2012, the US Department of Education focused the data debate on the learner experience in its 'Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics' report. For the first time, in 2012, analytics featured in the 'Top 10 Issues' highlighted annually by IT leaders in US colleges and higher education.
Meanwhile a wide range of UK institutions and shared services have identified business benefits in analytics and activity data through project work in the Jisc Activity Data, Business Intelligence and Customer Relationship Management programmes.
Is this just another IT fad, driven by highly capitalised adventurers and global purveyors of online shopping and social media? Or is there something here that uniquely addresses real operational and strategic issues in the increasingly performance-driven and customer-focused business of higher education? And if so, do we have the necessary data to hand?
What is activity data?
"Our authentication logs record between 35,000 and 500,000 e-resource accesses per day."
"We have collected 3.9 million library circulation records over 15 years."
The collection and analysis of activity data is now regarded as vital to successful customer-facing businesses and has become an everyday aspect of customer experience. This is most visible in online settings where patterns of client activity make it possible for brands such as Amazon and iTunes to personalise services and to make recommendations. Meanwhile our supermarkets demonstrate the potential of a variety of data capture mechanisms to support core business processes. These include resourcing and stocking, as well as offering direct customer benefits underpinned by personal profiles captured through loyalty cards.
Detailed activity data also underpins analytics in human performance such as sport. The film Moneyball, superficially about the American love affair with baseball statistics, highlights the power of data and analysis in performance management and the search for telltale metrics that deliver 'actionable insights' to unpack the million dollar question, 'What is the problem we need to solve?' – which may not be the problem that the coach or business analyst first thought of.
What's in it for higher education?
"If you do not use the library, you are over seven times more likely to drop out of your degree. 7.19 to be precise."
"The first year of the Early Warning System saw clean student progression to Year 2 Psychology rise from 71.8% to 85.5%."
The higher education sector is potentially in an advantageous position. A number of systems and services collect activity data, ranging from the virtual learning environment (VLE) and the library to help desks and security. However, to make sense of human activity, context is king. Thanks to detailed knowledge of each user's context, such as that held in registration and learning systems (for example level of study, course or specialism, course module choices and even performance), institutions have a powerful basis to segment, analyse and exploit low level human activity data.
Consequently, activity data can enable an institution or a service to understand and support users more effectively and to manage resources more efficiently. The following examples illustrate direct benefits:
- Student success. Patterns of student behaviour (such as VLE use, library resource access, lecture attendance) may help identify students at risk of performing poorly or dropping out, thereby generating early warnings and enabling timely interventions to increase success. This is illustrated in case studies from Huddersfield and Roehampton universities
- Learner experience. Using activity patterns to recommend resources of particular relevance in the individual's context (taking account of course, unit and even physical location) will accord with student expectations of a quality online experience; the same techniques can also benefit researchers. This is illustrated in case studies from the University of Huddersfield and the Open University
- Resource management. Analysing how resources are being used, managed and curated should enable library and learning resource services to budget more economically and to resource and purchase more effectively. This is illustrated in the case study from the University of Pennsylvania
Activity data may also serve the institution and its clients more broadly, especially when combined with operational data from related systems:
- User behaviour. Insights derived from activity patterns may lead to efficiencies ranging from optimisation of marketing campaigns to enhancement of campus services and IT workflows. Examples are found in a number of Jisc Customer Relationship Management and Activity Data projects
- Data mining. Exploratory collation and examination of data from multiple institutional systems and above-campus sources (such as UCAS) can aid discovery of new narratives and identify actionable insights. This is illustrated in case studies from Cornell, Michigan State and Roehampton universities
Strategic institutional response
A group of institutional projects funded by Jisc in 2011 indicated the potential importance of activity data in learning and teaching, in supporting research and in resource management. The projects' findings cover areas ranging from learner success to service impact and resource utilisation, and from library recommendations to research dissemination. They chime strongly with projects working in the broader area of institutional business intelligence. The underlying messages are increasingly clear:
Key themes and findings
- Users will increasingly expect services to be enhanced through the use of such intelligence (Analytics for Learning and Teaching)
- Successful businesses will increasingly use activity data and analytical techniques, generating real-time feedback loops (industry forecasts, including the McKinsey 'Big Data' report)
- Most higher education business and educational processes are enabled by IT systems that already collect or could be collecting such data (JISC-CETIS 'Analytics for the Whole Institution: Balancing Strategy and Tactics')
- Previously anticipated inhibitors such as privacy and data protection can be appropriately addressed through clear governance (Legal, Risk and Ethical Aspects of Analytics in Higher Education)
- Act for the long term by establishing policies, consolidating systems of record, developing skills and activating data collection (Cornell and Michigan State case studies)
- Recognise the low-hanging fruit: minimal compute, easily understood applications relating to core business, such as student early warning systems (Huddersfield and Roehampton case studies)
- The starting points and cycles for exploiting activity data will vary in different areas of the business, so a single corporate business intelligence implementation is unlikely to be sufficient (Pennsylvania case study)
- A focus on strategic direction that maximises common coding frames, and therefore the potential for data integration and reconciliation, is critical (Michigan State and Roehampton case studies)
- The intelligence to be gained increases as more data is accrued day by day and year on year, so collection should start now even if analysis is deferred (Huddersfield case study)
- The benefits of 'learning from doing' are at the heart of assimilating activity data and analytics more broadly (Activity Data programme project experiences)
Service directors are recommended to prioritise identification, collection and preservation of activity data. They should:
- Establish data governance. Central authority is required to clarify legal and ethical principles and to drive essential data compatibilities (based on indicators such as course)
- Activate collection. Data collection capabilities should be activated for existing systems
- Harvest across systems. Making connections between activity data sources is a key consideration and merging data provokes fresh insights and cultivates new approaches
- Include activity data in system requirements. New implementations should include accessible activity data logging and harmonisation of identifiers
- Acquire skills. Key skills and practices need to be developed in new types of storage, data reconciliation, visualisation and, not least, business-led analysis
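To make the 'harvest across systems' recommendation concrete, the sketch below merges per-student records from two sources on a shared student identifier – the common coding frame that makes reconciliation possible. The systems, field names and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical extracts from two institutional systems, keyed on a shared
# student identifier (the 'common coding frame' that makes merging possible)
vle_logins = [
    {"student_id": "s001", "logins": 42},
    {"student_id": "s002", "logins": 3},
]
library_loans = [
    {"student_id": "s001", "loans": 15},
    {"student_id": "s002", "loans": 0},
]

def merge_activity(*sources):
    """Combine per-student records from several systems into one profile."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            sid = record["student_id"]
            # Copy every field except the join key into the merged profile
            profiles[sid].update(
                {k: v for k, v in record.items() if k != "student_id"}
            )
    return dict(profiles)

profiles = merge_activity(vle_logins, library_loans)
print(profiles["s002"])  # {'logins': 3, 'loans': 0}
```

The merged profiles are exactly the kind of cross-system view from which fresh insights emerge – here, for example, a student with low VLE use and no loans becomes visible in a single record.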
However, experience in education and beyond indicates that efficient and effective exploitation of activity data requires institutional leaders to reconcile a range of local and tactical starting points with strategic development of the institutional analytics mission.
- Exploitation of activity data to address immediate operational challenges is likely to differ from one area to another (from student retention to estates management to budgetary control) and the journey will necessarily involve data experimentation and incremental development of practice
- At the same time, the institution needs to learn from the local in order to develop the enterprise-wide foundations for efficient, effective and malleable data collection and analysis underpinned by responsible policy, data standardisation, reliable tools and professional skills
Our case studies and institutional project experiences strongly suggest that institutions need to combine and cross-fertilise enterprise-wide and local (top down and bottom up) approaches in order to maximise the opportunities presented by analytics. As illustrated, this implies working towards a shared corporate approach, rather than imposing a single monolithic solution.
This section explores the issues raised in the executive overview in more detail, in particular the motivations, benefits and challenges of adopting activity data. These are supported by case studies from six UK and US universities operating in a variety of settings. It considers what strategic problems or critical issues analytics can help address and therefore what might be the institution's initial focus for analytics.
1 The benefits of activity data
It also signposts guidance and exemplars of the strategies and skills required to realise the potential benefits of exploiting user activity data, which are:
- Internal efficiency and effectiveness of corporate business processes; assessment and prediction of returns on investment
- Outward-looking achievement of institutional mission; quality of services, particularly in learning and teaching; student experience and outcomes; wider business and community impact
Whilst principally addressing learning and teaching activity, including the use of libraries, learning resources and business intelligence from associated systems, the guidance can be applied generally across a range of institutional systems.
Responsible exploitation of activity data is closely aligned with the mission of further and higher education. It can make a tangible contribution to both corporate and individual good, by enabling an institution or service to understand and support users more effectively and to manage resources more efficiently and responsively.
The timing is good for institutional take-up of these opportunities. Arguably, the business implications of doing nothing are unthinkable:
- Students will increasingly expect data to be used to their benefit, to enhance their learning experience and their chances of success
- Marketing and resource management and other financial imperatives demand highly responsive data-driven indicators
- Technologies, tools and practices have developed rapidly in response to consumer and social internet activity
- Whilst we should be wary of examples from global 'web-scale' online services, other sectors, ranging from retail to healthcare to sport, have demonstrated the business case for working with activity data.
2 The information explosion and the analysis gap
There is an exciting yet threatening intensification of the intelligence potentially available to businesses and institutions arising from digitally mediated interactions – in terms of volume ('big data'), variety (multiple connectable sources) and velocity (frequency of capture). The scale of opportunity challenges our ability to accumulate (store), to analyse (process) and to assimilate (present) using traditional means.
It is essential to determine how this widening gap between data collection and our capacity for analysis can be addressed, not only to benefit higher education institutions but also to deliver the sort of personalised and responsive user experience that has become an expectation of life online.
3 Operational targets
Analytics is central to the mission and everyday business of post-compulsory education: there is clear potential for it to enable managers to derive and act upon pre-emptive indicators and 'actionable insights' gained from activity data.
It is important to be grounded in the business of the institution, avoiding the temptation to focus on IT infrastructure as a means to get going with analytics. While tools are necessary, they are available in abundance and therefore represent a second order problem. The starting point is to ask questions about how analytics can improve efficiency or effectiveness. Consider three levels of practical impact:
Assessing Performance – efficiencies, economies and effectiveness:
- Student success and satisfaction
- Research productivity
- Staff performance management
- Alignment of enterprise resources with mission and customer interests
- Monitoring other indicators such as brand recognition and reputation
Informing audience segmentation and targeting:
- Pricing and other value offers
- Applications processing
- Course design
- Learning style
- Personal interventions to support students, researchers and staff
Identifying trends and business models:
- Student marketplace
- Research landscape
- Resources procurement
- New products and services
Of course, these groupings should not be regarded as silos. Activity data accumulated in one context may be used to inform decisions and interventions elsewhere. For example, student indicators that provide early warning 'signals' could also inform choices (right course, level, study mode, learning style), potentially adding value to applications and offer processes.
Early targets for analytics leading to tangible benefits might involve:
- Student lifecycle, especially retention and achievement
- Personalisation of learning services, including resource recommenders
- Library and resource management
- Operational, specialist and personal IT – availability, performance
- Marketing and surveys – satisfaction, reputation
- Campus services improvement, notably retail and catering
4 Analytics involves everyone
Consultation undertaken by EDUCAUSE in developing its analytics maturity model has confirmed that an effective analytics implementation requires the buy-in, skills and ongoing attention of a significant cross-section of the higher education workforce. Our illustration, based on potential criterion statements in the EDUCAUSE model, highlights the range of actors engaged in maximising the opportunity – senior leaders, administration, faculty, IT professionals and business/domain experts. Not least, it will be essential that students and researchers recognise, as they may habitually do in social networks and online purchasing, that the institutional practice benefits them.
Our Open University case study illustrates something of the range of interest and ownership that might typically become associated with the potential of activity data. The Recommendations Improve the Search Experience (RISE) project originated in the library and consulted with learners to establish the value of the available activity data as the basis for a resource recommendation service. RISE also engaged with corporate developments led by the pro-vice chancellor for teaching and learning, as well as offering value to the library itself in informing resource management.
The Open University – enhancing the student experience
The quality of the student learning experience has never been more important. Therefore the Open University wanted to understand whether activity data generated by over 100,000 unique users of online resources each year could be used to provide useful recommendations along the lines of 'students on my course are looking at these resources'. The vital starting point and catalyst for analytical thinking was to evaluate the readily available activity data as the key to being able to draw in contextual data from other local and sector sources.
The RISE project has helped to identify how data from the library plays into the world of institutional data warehousing, business intelligence and learning analytics. This development also illustrates the potential of using activity data in a way that goes beyond its use as a business intelligence tool and contributes to an enhanced learner experience by providing real time personalised feedback.
Read the Open University case study in full
However, we should not be put off by the potential breadth and depth of such undertakings. Implementations can initially be bounded by business unit or domain (for example, resource efficiency in the library, student retention in a particular faculty), as illustrated in our Roehampton University case study.
5 One size fits all?
It is evident that different business functions may require differing approaches to analytics, especially in the cycles of data collection, analysis and enactment. It is important to understand the interactions and feedback loops required to support different aspects of a business.
This is as true in higher education as it is in any consumer-facing service. Currently progress in learning analytics and early warning indicators appears particularly encouraging. A recent EDUCAUSE statement suggests that 'the strategic value of analytics includes improved institutional productivity, decision-making, and learner success. While better decision-making about campus operations is an important aspect of analytics, its potential to support students is of paramount interest'.
As highlighted in our case study from Michigan State University, learners are likely to benefit from (and to expect) close to 'real time' feedback. This might take the form of timely guidance or automated just-in-time resource recommendations (eg 'Students on your course who accessed this e-resource also accessed that'). Contrastingly, finance teams may require data and analysis on a periodic basis (eg daily, weekly, monthly, depending on function).
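The 'students who accessed this also accessed that' style of recommendation can be sketched as a simple co-occurrence count over an access log. The log, resource names and ranking below are invented; a production recommender would also weight by course context and recency.

```python
from collections import Counter

# Hypothetical access log: (student, e-resource) pairs
accesses = [
    ("s1", "journal-A"), ("s1", "journal-B"),
    ("s2", "journal-A"), ("s2", "journal-B"), ("s2", "ebook-C"),
    ("s3", "journal-A"), ("s3", "ebook-C"),
]

def recommend(resource, log, top_n=2):
    """Rank resources by how often they co-occur with `resource`
    in the same student's access history."""
    by_student = {}
    for student, res in log:
        by_student.setdefault(student, set()).add(res)
    counts = Counter()
    for resources in by_student.values():
        if resource in resources:
            for other in resources - {resource}:
                counts[other] += 1
    return [r for r, _ in counts.most_common(top_n)]

# journal-B and ebook-C each co-occur twice with journal-A
print(recommend("journal-A", accesses))
```

Even this naive version shows why context matters: restricting the log to students on the same course is a one-line filter that turns a generic recommender into the personalised 'students on your course' variant.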
Michigan State University – shaping up for real time analytics
The key to the institutional analytics mission is quality of data. You cannot derive valid actionable insights if the data are unreliable. Our emphasis is therefore on establishing 'systems of record' that ensure that key shared data items used by many different systems – such as names and addresses – are entered and managed in one place.
To date, we have been principally focused on developing sound analytics capabilities around finance and human resources systems. However, we are attentive to a new wave of developments in the area of learning support, focusing 'in time' personalised support at the point of need. We need to remember, however, that developing a set of useful learning analytics tools isn't enough. To be truly effective they need to be embedded in effective overall pedagogic designs, along with having effective means of intervention when students are discovered to be struggling. Furthermore it is essential that students perceive this greater degree of active measurement of their work as helpful and not as an invasion of their privacy.
Read the Michigan State University case study in full
Doug Clow of the Open University illustrates such business differences in speed (velocity) and scale (volume) in his feedback model.
It is therefore essential that analytics strategies recognise these varying cycles, cadences and business motivations, even where they ultimately seek integration of a large variety of data.
6 Analytics is a journey
The value of institutional strategy and centralised underpinnings of standardisation across data sources is emphasised in our Cornell and Michigan State University case studies. Nevertheless the realisation of benefits from analytics is necessarily incremental and iterative and is unlikely to be a straight line from a perceived problem to an evidence-led solution – just like any research process.
This is illustrated in the iterative cycle from 'mission' to 'enactment'. That cycle should be expected to generate new insights into the narratives and benefits that might be derived from the accumulation of data – thus leading to a refinement or expansion of the analytics mission.
"The origins and the outcomes of analytics are the most important: determining the strategic questions to which data can be applied and then using the results to make improvements."
For example, the University of Huddersfield started its activity data journey with a library-centred mission relating to resource utilisation and recommendation. While these goals were beneficial in themselves, the data narrative unfolded to identify the potential for the same data to provide early warning indicators of student achievement.
Cornell University – setting the records straight
Determining the place to start with embedding analytics in corporate practice is a key issue for any institution, raising challenges relating to governance, coordination and culture. In order to catalyse our institutional requirement to develop analytics capabilities, we set ourselves the target for 2012–2013 of refining existing administrative and financial dashboards to better meet the needs of our trustees. This has allowed us to take some mission critical steps that will benefit our subsequent development of analytics in other areas – to identify systems and data elements of record, to engage data owners as stewards and co-contributors and to agree data governance principles.
Learning analytics do not yet represent such an immediate priority, partially on account of the tradition and style of the institution. However, that situation is changing rapidly as new generations of students come through and there is little doubt that expectations about feedback will be changing.
We are only at the beginning of a journey towards mutual understanding of our data and embedding analytics in institutional practice. However we're confident that our starting point will create a solid foundation. We're also clear that the journey is essential.
Read the Cornell University case study in full
7 Tools for the journey
We have reflected little about technology in this advisory paper. This space has historically suffered from conflating the benefits of analytics with the implementation of enterprise-scale data warehousing solutions or some other technology panacea. There is no doubt that data organisation, accumulation and preservation to meet unpredictable objectives is an important focus of analytics that is well served by enterprise scale data platforms. However, a consistent theme of this paper is that 'for the journey' there are a variety of starting points and means of getting there.
- Familiar desktop applications. Because related skills are highly developed and the tools are widely licensed, there is a tendency to shoehorn storage, analysis and presentation into desktop tools, notably spreadsheets. These may represent a realistic starting point for early investigations but run out of steam at scale. The challenge is to know when to facilitate change to more appropriate platforms, as illustrated in the University of Pennsylvania Libraries case study
- Open source tools. The IT community has spawned a wide range of open source and low cost tools that are particularly suited to the type and scale of activity data (for example NoSQL databases, indexing engines and visualisation processors). Furthermore some tools, such as Google Refine, have a low entry threshold. Such tools will be popular amongst the technically adept for developing agile experimental approaches, which are very valuable in this space. The challenge will be ensuring the necessary standardisation of data formats and coding frames to ensure long term value above and beyond the originating application
- Vertical product extensions. Education takes advantage of some substantial vertical applications in areas including financials, admissions, course management and libraries. In most cases, the vendors are keen to demonstrate analytical opportunities by introducing new features or integrating partner applications within their particular application 'silo'. This is a notable tendency in the VLE market. The challenge is to weigh the benefit of such 'off the shelf' solutions against the likelihood that key indicators will be derived by combining data from multiple systems, as illustrated in the Huddersfield, Pennsylvania and Roehampton university case studies
- Above-campus services. Significant activity data and wider analytics datasets are being compiled above-campus – by shared services (for example, the Journal Usage Statistics Portal or the OpenURL Router), by sector-related agencies (notably the Higher Education Statistics Agency and UCAS) and also by central and local government (such as demographic and employment data). Each source will present its own challenges in terms of data licensing, data protection and technical integration. It is therefore important that institutions work together through networks such as the Universities and Colleges Information Systems Association (UCISA) to access these datasets for local analytical purposes and to develop any technical tools on a once-for-all basis
- Enterprise Solutions. Finally there is the range of enterprise level business intelligence products available from suppliers such as IBM and Oracle. These include data warehouse and archiving solutions, geared for tight integration into enterprise architecture, plus dashboard products for providing the necessary variety of reporting views. Such products are likely to be expected by managers and practitioners in areas such as finance. However, there is real danger that the institutional strategy is driven by such implementations. Strong governance of the analytics mission is therefore required to balance the pressures for long term enterprise solutions with the potential benefits of local innovation
The development of Metridoc has been a direct response of the University of Pennsylvania libraries team to diversity of data sources and of analytical purposes, balancing flexibility with sustainability. Metridoc is a software framework for ingesting activity data from multiple sources and transforming it into compatible formats that can feed standard reporting and visualisation tools. It takes the burden away from local transaction systems (such as library, course and student management) whilst offering considerable flexibility for experimentation.
University of Pennsylvania – implementing scalable and sustainable processes
Metridoc has been developed by Penn Libraries to provide an extensible framework that supports library assessment and analytics, using a wide variety of activity data collected from heterogeneous sources. The framework supports the integration of activity data across audiences, systems and services with the potential to act as an ingestion and data curation engine for activity metrics anywhere in the institution. Data points are currently derived from fund accounting systems, discovery tool logs, publisher reports and authentication logs. These are only the beginning of potential targets, which Penn is gradually expanding, guided by management and planning needs.
This approach eases the burden of generating business intelligence by providing scalable and sustainable aggregation processes and a normalised data repository. The approach isolates extraction, transformation and repository services from the functions of specialist IT systems. This architecture reduces the cost and complexity of start up, and also allows for creation of new data collectors with manageable technical effort.
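Metridoc's internals are not reproduced here, but the ingest-and-normalise pattern it embodies can be sketched as follows. The source names, field mappings and shared schema are assumptions for illustration only.

```python
# Each source gets a small normaliser that maps its raw fields onto a
# shared schema; the sources and field names here are invented
NORMALISERS = {
    "ezproxy": lambda r: {"user": r["login"],
                          "event": "e-resource-access",
                          "when": r["timestamp"]},
    "circulation": lambda r: {"user": r["patron_id"],
                              "event": "loan",
                              "when": r["issued_at"]},
}

def ingest(source, raw_records):
    """Transform raw records from one source into the shared schema."""
    normalise = NORMALISERS[source]
    return [normalise(r) for r in raw_records]

repository = []
repository += ingest("ezproxy",
                     [{"login": "s1", "timestamp": "2012-10-01T09:00"}])
repository += ingest("circulation",
                     [{"patron_id": "s1", "issued_at": "2012-10-01T11:30"}])
# All records now share the same keys, ready for standard reporting tools
```

Keeping the normalisers separate from the transaction systems is the design point: adding a new data collector means writing one mapping function, not modifying the library or student system itself.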
Read the University of Pennsylvania case study in full
8 Balancing approaches
There are tensions between different approaches to collecting, connecting and analysing activity data (or any other analytics data sources). These might be broadly described as 'top down' and 'bottom up'.
Typical positions are caricatured in our illustration – the adventurer (collect the data together and, using readily available tools, let it tell its stories), the action type (act now to leverage existing activity data to address priority issues such as student retention) and the forest guide (data can play tricks and waste time, so start by rigorously identifying significant indicators).
The nature of the opportunity suggests that a higher education business should seek to harness all these types and their approaches – perhaps creating an analytics 'black belt team' that can seize an immediate opportunity (eg student library experience), using it to validate the significant indicators (library turnstiles tell us nothing) and to look for other hidden narratives (early warnings of student success).
"If we are serious about analytics we need to be thinking about both exploratory data analysis and confirmatory data analysis."
Our case studies from Huddersfield, Roehampton and the Open University reflect a fruitful dynamic tension between answering known questions and discovering new narratives. Bottom-up data narratives draw on the potential for data capture rather than being driven down from established KPIs. Narrative 'connectors' will include student and staff cards/institutional IDs, places, IP addresses and times.
Furthermore, as discovered by both Huddersfield and Roehampton universities in the case of library turnstile data, it is only through experimental collection and statistical analysis of a range of possible indicators that we can establish those of genuine significance and thereafter refine our data collection strategies.
The University of Huddersfield have been bold explorers and have engaged six other universities in that approach in the library impact data project. Their ideas were developed from investigating the narratives waiting to be disclosed by over ten years of library circulation data. This catalysed not only new approaches to the student resource discovery experience but also insights into possible indicators of student success, as the potential of the data revealed itself.
A basket of indicators highlighted that Huddersfield students who do not use the library are more than seven times more likely to drop out of their degree…7.19 times to be precise.
University of Huddersfield – exploiting library impact data
The library impact data project (LIDP) tested the hypothesis that 'there is a statistically significant correlation across a number of universities between library activity data and student attainment'. The project used the student's final grade, course title and variables relating to library usage: books borrowed, library e-resources access and entry to the library. At the end of the project the hypothesis was successfully supported for e-resources and books borrowed for all eight institutional partners.
The second phase of the project investigated possible causal aspects. We can now say that, based on those dropping out in the third term of study, if students do not use e-resources they are over seven times more likely to drop out of their degree. Although we cannot say that non-usage causes students to drop out, it could be used as an early warning system. If students are not using the library's e-resources, it is likely to be worth checking to make sure all is well. These data could be exploited to concentrate staff resources at points of need, and to examine whether students from different backgrounds have different needs when it comes to library and learning content services.
Read the University of Huddersfield case study in full
Roehampton University was driven by an urgent requirement for action, for mechanisms to address the specific challenges of low student retention and progression as highlighted in psychology undergraduate programmes.
The university's response was to trial an early warning system with a view to identifying and refining advance indicators. The first year of operation saw clean student progression to Year 2 Psychology rise from 71.8% to 85.5%.
Roehampton University – early warnings about students and data
Roehampton University is using a variety of activity data to support student progression and retention, especially addressing causes of non-completion. An 'early warning system' approach to collate key activity data and flag up students who were at risk of failing to progress was trialled in the Department of Psychology. The indicators ranged from poor lecture attendance to receiving a fail grade or a warning for plagiarism. In its first year of operation, the system saw student progression rates improve by around 14 percentage points amongst first year psychology undergraduates.
Subsequent developments under the Jisc Customer Relationship Management programme helped the university to take on key lessons about developing economic and effective analytics capabilities – especially about the structural issues of leveraging data across systems. Informed decisions about what sources provide a relevant and sufficient set of data for analytics purposes are crucial. Meanwhile, the benefits of these early efforts to cohere data about student activity are clear and therefore the student performance system is being deployed for the start of academic year 2012–2013.
Read the Roehampton University case study in full
9 Measuring progress
We have presented the adoption of analytics as an iterative process where questions and analysis not only lead to enactment but also open up possibilities for refinement and reveal new insights. We have also recognised that this demands an organisational culture based on clear governance and shared data standards, whilst encouraging local innovation and initiative. We might describe these requisite factors in terms of readiness or maturity.
In developing assessment criteria for an analytics maturity model, EDUCAUSE has identified factors impacting and indicating organisational progress. The key factors are summarised as follows:
Measure of maturity
- Culture: buy-in and understanding are required across the range of faculty, service and management levels. Key question: what are the cultural issues around buy-in and adoption of analytics?
- Policy: clear data policies and responsibilities will ensure legal and ethical obligations are fulfilled. Key question: what institutional policies need to be updated or put in place in response to analytics initiatives?
- Process: processes must support identification of target outcomes, moving from what the data says to making decisions and enacting change. Key question: what are the barriers to evidence-based decision making?
- Data: the outputs can only be as good as the data, so the right data needs to be captured in a clean and consistent form. Key question: what needs to be done locally and/or centrally to ensure data can be exploited to address real business challenges?
- Infrastructure and tools: capability and capacity are required for storage, processing, analysis, reporting and visualisation. Key question: are there barriers in terms of availability of or access to the necessary hardware and software?
- Expertise: staff development must compensate for the shortage of professional analysts, which is expected to persist across all industries for some years. Key question: what are the existing technical and business capabilities (central and local), and what is the role of the IT service in analytics?
- Investment: the critical enabling investment will be in practical business-facing initiatives underpinned by clear governance and skills development. Key questions: how is analytics investment approved and appraised? Does the process enable development of local/central synergies?
This type of framework provides a useful checklist for assessing progress and setting targets. Such vigilance and self-assessment is especially important for the analytics mission, which is necessarily complex and malleable simply because it seeks to dig deeper than previously possible into the composition of the organisation and its processes and into the behaviours of its ecosystem.
Appendix: Case studies
The Open University – Enhancing the student experience
Richard Nurse, Head of Digital Services Development
The quality of the student learning experience has never been more important. For libraries and for resource discovery in general, the quality of the online experience is vital, particularly for a distance-learning institution such as the Open University, where online presence forms a significant element of a student's engagement with the library. In the online world student expectations are driven by the experiences offered by the likes of Amazon and Facebook, so libraries need to demonstrate that their systems can provide a comparable experience, including real-time interactions that offer connections and make recommendations.
As highlighted in the National Student Survey, improving the search experience is a key priority. Alongside recent investment in a new Discovery system from EBSCO, the OU therefore wanted to understand whether activity data generated by over 100,000 unique users of online resources each year could be used to provide useful recommendations along the lines of 'students on my course are looking at these resources'. We also needed to see how users would react to such a service.
The vital starting point and catalyst for analytical thinking was to evaluate the data readily available in our activity data sets as the key to being able to draw in contextual data from other local and sector sources.
The RISE (Recommendations Improve the Search Experience) project (2011) identified data from the EZProxy server log files as the key dataset. EZProxy is used by Library Services to authorise remote users to access library resources as if they were on campus. Log files were collected for several months to give a set of data large enough to be able to make sensible recommendations. The project also created a relevance ranking algorithm and a search interface built on the EBSCO Discovery Service API to provide a context for recommendations.
After loading the EZProxy logfiles into a MySQL database, we added contextual data such as courses of study for each transaction. The EBSCO API was used to retrieve bibliographic metadata via Crossref that could be stored locally within the recommendations database to provide consistent user-friendly article titles.
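As an illustration of the parsing step involved, the sketch below turns a hypothetical EZProxy-style log line into a structured record ready for database loading. The field layout is an assumption for the example – real EZProxy output depends on the LogFormat configured on the server, and the RISE project's own schema may have differed.

```python
import re
from datetime import datetime

# Hypothetical log layout (actual fields depend on EZProxy configuration):
# username, client IP, bracketed timestamp, then the requested URL.
LOG_PATTERN = re.compile(
    r'(?P<user>\S+) (?P<ip>\S+) \[(?P<ts>[^\]]+)\] "GET (?P<url>\S+) HTTP/1\.[01]"'
)

def parse_line(line):
    """Turn one raw log line into a dict ready for loading into a database."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None  # skip malformed lines rather than abort the batch
    return {
        "user": m.group("user"),
        "url": m.group("url"),
        "accessed_at": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
    }

line = 'u123456 10.0.0.7 [14/Mar/2011:09:15:02 +0000] "GET http://example.com/article/doi/10.1000/xyz HTTP/1.1"'
record = parse_line(line)
print(record["user"], record["url"])
```

In a pipeline like RISE's, records of this shape would then be inserted into the MySQL database and joined with course-of-study data keyed on the user identifier.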
In our case, the available activity data enabled three types of recommendation to be created:
- Course based - Users on my course accessed …
- Enquiry based - Users of this or a similar search term accessed …
- Expansion based - Users of this item also accessed related items …
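A minimal sketch of the first, course-based flavour follows. All names and events below are invented for illustration, and RISE's actual relevance ranking algorithm was more sophisticated than this simple popularity count:

```python
from collections import Counter, defaultdict

# Invented (user, course, resource) access events, standing in for the
# records derived from the proxy logs.
events = [
    ("alice", "PSY101", "doi:10.1000/a"),
    ("bob",   "PSY101", "doi:10.1000/a"),
    ("bob",   "PSY101", "doi:10.1000/b"),
    ("carol", "HIS200", "doi:10.1000/c"),
]

by_course = defaultdict(Counter)   # course -> resource access counts
seen = defaultdict(set)            # user -> resources already accessed
for user, course, resource in events:
    by_course[course][resource] += 1
    seen[user].add(resource)

def recommend(user, course, n=3):
    """'Users on my course accessed ...': most-used resources the user hasn't seen."""
    ranked = [r for r, _ in by_course[course].most_common() if r not in seen[user]]
    return ranked[:n]

print(recommend("alice", "PSY101"))  # → ['doi:10.1000/b']
```

The enquiry-based and expansion-based variants follow the same pattern, keyed on search terms and item co-occurrence respectively rather than on course membership.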
User reactions were interesting: users recognised the value of the recommendations that were generated, yet wanted more precise information about their provenance. Undergraduate students, for example, liked the concept of recommendations from their peers but wanted to know that they were coming from high achieving students.
Following the RISE experimentation, the OU has continued to provide a Google Gadget version of the search that includes recommendations and has used the knowledge gained in new versions of mobile search. The RISE concepts are also being embedded into follow-on search developments.
With significant institutional interest in Learning Analytics led by the PVC for Teaching and Learning, RISE has helped to identify how data from the Library plays into the world of institutional data warehousing, business intelligence and learning analytics. At a time when data about student and course performance is being scrutinised like never before, it is allowing data about use of Library resources to be part of the solution. To ensure that the approach could be widely rolled out, RISE also investigated aspects of privacy and licensing.
This development also illustrates the potential of using activity data in a way that goes beyond its use as a business intelligence tool and starts to drive an enhanced learner experience by providing 'real time' feedback through personalised resource recommendations.
Find out more about the Open University RISE project
Michigan State University – Shaping up for real time analytics
David Gift (Chief Information Officer)
Michigan State University is a large Research 1 institution serving 48,000 US and international students, nearly all of whom are involved in forms of online learning. MSU offers 200 academic degrees, and is organized into approximately 400 distinct academic and support/business units; notably, the university has functions that operate across as many as 15 business sectors.
The key to coherence of the institutional analytics mission, wherever you start in terms of priority applications or methods, is quality of data. You cannot derive valid actionable insights if the data are unreliable, or if you don't establish a consistent and consistently applied set of data definitions and meanings, and primary data sources.
The variety of our business and academic specialties provides a good example of why the university needs to balance local specialized systems needed to meet specific sector needs against the benefits of standardization and shared information. Looking to analytics, therefore, our emphasis is on establishing 'systems of record' that ensure that key shared data items used by many different systems – such as names and addresses – are entered and managed in one place and then made available to many systems through data warehouses. This promises a range of benefits:
- Essential consistency underpins downstream analysis
- Transactional systems need not be queried by analytics processes and tools
- Analytical datasets can instead be extracted and warehoused with solid links back to their primary sources in transactional systems
- Efficiency paybacks in terms of managing the lynchpin data in one place, and always in the best functional place
To date, we have been principally focused on developing sound analytics capabilities around finance and human resources systems. However, we are attentive to a new wave of developments in the area of learning support. Developments such as Purdue's Course Signals and the work of Virginia Tech in Mathematics have illustrated the value of real time data in focusing 'in time' personalized learner support at the point of need, with the added benefit of signaling opportunities for longer term course improvement.
Traditionally, our academic progress measures are read and reacted to on the cycle of semesters or terms (i.e., at the ends of courses), so we have not needed to develop this sort of 'real time' responsiveness. Learning analytics ups the ante in that respect and should become valuable to students at institutions like MSU in enhancing early (Year 1 and Year 2) academic success, on which their overall academic success is built.
Faculty and students at MSU using the LON-CAPA course management system already have many years of experience with in-course indicators such as student attention to and success at working problem sets delivered online. We need to expand this sort of in-course analysis across a broader array of courses and tools, and a larger set of metrics.
We need to remember, however, that developing a set of useful learning analytics tools isn't enough - to be truly effective they need to be embedded in effective overall pedagogic designs, along with having effective means of intervention when students are discovered to be struggling. These two additional factors represent innovations needed in instruction as much as do the learning analytics methods themselves. So a great deal of work is required with faculty to make the best use of these concepts.
Furthermore it is essential that students perceive this greater degree of active measurement of their work as helpful and not as an invasion of their privacy. It may be possible that increasing use of social networking tools in the online aspects of courses could expand the means to help students help themselves, through active student-to-student assistance and via recommender services linked to key resources.
Cornell – Setting the records straight
Ted Dodds (CIO) and Sasja M. Huijts (Director, Planning & Program Management)
Cornell University is a private Research 1 institution in New York State with a campus-based population of approximately 14,000 undergraduate students and 7,000 postgraduates.
Determining the place to start with embedding analytics in corporate practice is a key issue for any institution. Regardless of any infrastructure already in place, such as our Oracle-based data warehousing facilities, the challenges largely relate to governance, coordination and culture.
In order to catalyse our institutional requirement to develop analytics capabilities, we set ourselves the target for 2012-13 of refining existing dashboards to better meet the needs of our Trustees. At present, the primary scope of these dashboards is high level administrative and financial indicators, not student progress or research performance. This focused commitment has allowed us to take some mission critical steps that will benefit our subsequent development of analytics in other areas.
- To identify systems and data elements of record; for example, for personal IDs and course IDs
- To engage data owners as stewards and co-contributors within the wider organization
- To agree data governance principles, bearing in mind that we are a distributed organization and we do not want to stifle local innovation
- To establish a taxonomy of identifiers and terms, and the associated rules and caveats; it is important to recognise that 'givens' at a local level can be sources of confusion and conflicting interpretation at corporate level – they can vary from one domain to another. Consider for example the definition of an FTE.
Learning analytics do not yet represent such an immediate priority, partially on account of the tradition and style of the institution. However, that situation is changing rapidly, and learning analytics is an area of increasing interest. At Cornell we are accustomed to high retention and completion, linked to robust face-to-face support systems suited to our campus setting and staffing model. There is therefore a presumption that this highly personalized system works well for the vast majority and identifies exceptions well. However, as new generations of students come through, there is little doubt that expectations about feedback are changing – so, whilst presently faculty members do not generally see technology as the key, there may be new considerations on the horizon.
The role of analytics in research will be a focus of our forthcoming corporate plan. The approach will need to take account of three factors specific to research:
- Detailed analytical data will differ from discipline to discipline
- Much research is undertaken in cross-institutional and international partnerships
- National funding agencies, such as NIH and NSF, are in a strong position, in terms of authority and standardization, to establish requirements and drive the collection of datasets
We also recognise that, whilst some areas involve personal or institutionally sensitive data, others will benefit from similar approaches. Enterprise IT services are one example. They operate in a fast moving sector, involving diverse approaches that can potentially lead to the loss of strategic perspective amidst a stream of tactical imperatives. We want to know where and how these services are meeting expectations, and where they are not. IT benchmarking, including the sorts of KPIs and maturity assessments that can be incorporated in to corporate analytics, is therefore of real value.
The same applies to emerging areas involving IT, such as the development of analytics itself, so we strongly welcome the EDUCAUSE development of an analytics maturity model. Now is the time to share practice and benchmarks, not downstream when the majority of senior managers have become seasoned adopters.
Finally, we're very clear at Cornell that we're only at the beginning of a journey towards mutual understanding of our data and embedding analytics in institutional practice. However we're confident that our starting point will create a solid foundation. We're also clear that the journey is essential.
University of Pennsylvania Libraries – Implementing scalable and sustainable processes
Joe Zucca, Director for Planning and Organizational Analysis
MetriDoc is an extensible framework that supports library assessment and analytics, using a wide variety of activity data collected from heterogeneous sources. Data points are derived from fund accounting systems, discovery tool logs, publisher COUNTER reports, resource sharing systems, and authentication logs. But these sources are only the beginning of potential MetriDoc targets, which Penn is gradually expanding, guided by management and planning needs. In the near term, MetriDoc will consume book circulation data and research consultation/library instruction inputs. As Penn brings on new or replacement technology to support user services, the framework will expand to incorporate additional activity targets.
Business intelligence gives managers greater insight into the use and effectiveness of services, but the relevant data can be difficult to aggregate, or lack common or normal structures. In addition, data sources often encode valuable information, such as user credentials, in formats that are challenging to decode and represent categorically. MetriDoc eases the burden of generating business intelligence by providing scalable and sustainable aggregation processes and a normalized data repository. The architecture reduces the cost and complexity of startup, and also allows for creation of new data collectors with manageable technical effort.
Data collectors can gather data asynchronously from a system's log files or in real time. The data undergo transformations to meet analytical requirements, delivering essential consistency and anonymity; for example MetriDoc can reach out to authoritative sources to transform digital object identifiers into citation elements or exchange user IDs for demographic markers. Once transformed, the data are loaded into repositories for dissemination as dashboards or data sets, or for additional processing by ad hoc query or visualization tools.
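The ID-for-demographics exchange might look something like the following sketch. This is illustrative only – MetriDoc itself is not a Python system, and the function names, salt handling and demographic fields here are assumptions made for the example:

```python
import hashlib

# Illustrative transform: replace a raw user credential with a salted one-way
# hash, then attach demographic markers from an authoritative lookup so the
# analytical record carries context but not identity.
SALT = b"institution-secret-salt"  # would be kept out of the analytics repository

demographics = {"u123456": {"level": "UG", "school": "Arts"}}  # sample lookup table

def anonymize(record):
    raw_id = record.pop("user")  # drop the identifying credential
    token = hashlib.sha256(SALT + raw_id.encode()).hexdigest()[:16]
    # Same input always yields the same token, so activity can still be
    # linked across systems without revealing who the user is.
    return {**record, "user_token": token, **demographics.get(raw_id, {})}

out = anonymize({"user": "u123456", "resource": "doi:10.1000/a"})
print(out["level"], out["user_token"])
```

The deterministic token preserves the ability to join a user's transactions across datasets, which is what makes the demographic linkages described below possible.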
A good illustration of MetriDoc functionality involves the processing of transaction data from ILLIAD, a system used to manage resource sharing among libraries from user request to final book return. The system tracks workflow, recording the source of requests (for example the OPAC), and relevant date and time milestones for lending and borrowing, bibliographic information, fulfillment histories, patron IDs, and geographical information about lender networks. The source data comprise, in short, a trove of information about service performance, business process, reader interest, collection assets, and potential partnership opportunities arising from complementary collection needs.
In a batch process, MetriDoc queries the ILLIAD production system each night to extract a highly granular payload of data. Data subject to transformation, such as user credentials, are parsed, resolved and anonymized, and the additional data elements are loaded and linked to the transaction extract in a MetriDoc database. The system generates a detailed dashboard of production information for resource sharing managers and is available for query by collection analysts and strategic planners who have highly specific intelligence interests. An example of the ILLIAD dashboard is viewable at http://metridoc.library.upenn.edu/metridoc-penn-illiad . Additionally, the data repository supports potential linkages with other transaction data keyed to user demographics or to bibliographic data points; for example joining inter-lending with general collection use or potentially electronic resource use.
Thus, MetriDoc supports the integration of activity data across audiences, systems and services with the potential to act as an ingestion and data curation engine for activity metrics anywhere in the institution. The target sources may be databases, server logs, spreadsheets, or data manually submitted by staff in the course of service interactions. The approach isolates extraction, transformation, and repository services from the functions of specialist IT systems. It also supports a range of options, either locally created or commercially supplied, for accessing and delivering near real-time analysis to the desktop.
Find out more about MetriDoc including technical information
University of Huddersfield – Exploiting Library Impact Data
Graham Stone, Library e-Resources Manager
In 2011 the Library Impact Data Project (LIDP) aimed to support the hypothesis that 'there is a statistically significant correlation across a number of universities between library activity data and student attainment.'
The baseline project involved eight institutions in 2011: University of Bradford; De Montfort University; University of Exeter; University of Huddersfield; University of Lincoln; Liverpool John Moores University; University of Salford; Teesside University.
The project used the student's final grade (as a class), the course title and several variables relating to library usage:
- Number of books borrowed
- Number of times library e-resources were accessed
- Number of times each student entered the library.
At the end of the project the hypothesis was successfully supported for e-resources and books borrowed for all eight institutional partners. A complete set of anonymised data was released as part of the dissemination and the project was mentioned in SCONUL's response to the Higher Education White Paper "Higher Education: Students at the Heart of the System".
The second phase of the project (ending October 2012) investigated possible causal aspects. This involved enriched data, such as demographic information and the student's final grade (as a percentage rather than a class), to provide better management information. This has resulted in some interesting findings that are worthy of further investigation.
We have looked in detail at retention, and we can now say that, based on those dropping out in the third term of study, if students do not use e-resources they are over seven times more likely to drop out of their degree. When looking at PDF downloads alone they are nearly eight times more likely to drop out. Although we cannot say that non-usage causes students to drop out, it could be used as an early warning system. If students are not using the library's e-resources, it is likely to be worth checking to make sure all is well. Not using the library does not automatically mean you are at risk of dropping out, but library usage data could be another tool within the armoury of student support services, one part of a complex picture which helps them to understand which students might be at risk of failing to complete their degree.
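The 'seven times more likely' statistic is a relative risk: the drop-out rate among non-users of e-resources divided by the rate among users. With invented counts (these are not the LIDP figures) the arithmetic looks like this:

```python
# Illustrative relative-risk calculation. The counts below are made up for
# the example; only the form of the calculation matches the LIDP finding.
non_users_dropped, non_users_total = 36, 250    # students who did not use e-resources
users_dropped, users_total = 20, 1000           # students who did

risk_non_users = non_users_dropped / non_users_total   # 0.144
risk_users = users_dropped / users_total               # 0.020
relative_risk = risk_non_users / risk_users
print(round(relative_risk, 2))  # → 7.2
```

As the text stresses, a relative risk of this kind indicates association, not causation – which is why the finding is framed as an early warning signal rather than an explanation.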
Other findings show different levels of use by gender, ethnicity, disability and country of domicile. These findings represent actionable insights of a very practical nature. The data could be exploited to concentrate staff resources (both library and other student support) at points of need, and to examine whether students from different backgrounds have different needs when it comes to library and learning content services.
There is now significant interest in the findings of the Library Impact Data Project from a number of UK Universities, as well as institutions in the United States and Australia. This shows a great deal of potential for further analysis and perhaps collaboration on a national (through a shared service) or international level to gain a greater understanding of usage data for business intelligence and ultimately to enhance the student experience and improve attainment. To this end the LIDP is undertaking a feasibility study looking into the appetite for a shared service for library usage analytics.
Further information of the Library Impact Data Project
Roehampton University – Early Warnings about students and data
Dr. John King, Senior Project Manager, IT Services
Roehampton is using student activity data from a variety of sources to support student progression and retention, especially addressing causes of non-completion. The approach was trialled in the Department of Psychology (2010-11), delivering significant early benefits and leading to an extended approach being developed and tested in the Jisc-funded fulCRM project (2011-12).
In summer 2010, Psychology undergraduate exam boards raised concerns about the number of students failing to progress to the second year. Whilst concerns about individual students were being noted by staff, in many cases it was not until the student presented a poor profile at the first year board that warning letters were issued – by which time the student might already be in danger of dropping out.
The decision was taken in the department to implement a student Early Warning System (EWS) which would centrally collate key activity data and flag up students early in the first term who were at risk of failing to progress. The system was designed to perform proactively as a safety net for students struggling to engage with the programme and to offer them extra help and support. The list of indicators includes:
- Poor Lecture Attendance (3 missed lectures)
- Missing Scheduled Personal Tutorial or other staff appointment
- Failure to complete initial and other formative tasks
- Failure to hand in coursework or engage in required group work activities
- Receiving a fail grade
- Receiving a warning for plagiarism
- Unacceptable behaviour in lecture
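In outline, an early warning system of this kind collates indicator events per student and raises a flag once a threshold is crossed. The sketch below uses invented student data and thresholds purely to illustrate the mechanism; Roehampton's actual rules and weightings are not published here:

```python
from collections import Counter

# Invented per-student indicator events, echoing the list above.
events = [
    ("s001", "missed_lecture"), ("s001", "missed_lecture"),
    ("s001", "missed_lecture"), ("s001", "coursework_not_handed_in"),
    ("s002", "missed_lecture"),
]

# Three missed lectures trigger a flag; any other indicator flags on first
# occurrence (e.g. a fail grade or plagiarism warning).
THRESHOLDS = {"missed_lecture": 3}

counts = Counter(events)  # (student, indicator) -> occurrence count

def at_risk(student):
    per_indicator = {ind: n for (s, ind), n in counts.items() if s == student}
    return any(n >= THRESHOLDS.get(ind, 1) for ind, n in per_indicator.items())

flagged = sorted({s for s, _ in events if at_risk(s)})
print(flagged)  # → ['s001']
```

A flag is a prompt for a tutor conversation, not a verdict – exactly the 'safety net' framing described above.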
In its first year of operation, the EWS saw student progression rates improve by around 14 percentage points amongst first year Psychology undergraduates.
Meanwhile, a university wide review of school working practices identified particular concern with inconsistencies in the award of deferrals across combined programmes, requiring independent and consistent evaluation of a student's mitigating circumstances. New centralised procedures, including an online deferrals application, were implemented during 2010-11.
These encouraging developments highlighted the potential to link these datasets and to consider other sources that could be mined for a contribution to the emerging objective of a "Student Performance System", for example:
- Library Turnstile Security System
- Virtual Learning Environment
- Electronic Attendance Monitoring
- Welfare Data
Subsequent developments as part of the Jisc CRM programme helped the university to take on a number of key lessons about developing economic and effective analytics capabilities - especially about the structural issues of leveraging data across systems, notably the problematic nature of data built to serve other purposes and the cost and complexity of reformatting it to be effective and useful in an entirely different context.
Informed decisions about what sources (i.e. IT systems) provide a relevant and sufficient set of data for analytics purposes are crucial. Data may exist but it will not necessarily serve you well and therefore the design of potential analytics should consider:
- Authoritative common identifiers – can you link the data sources together?
- Extent of use – are the target segments (e.g. schools) using the system?
- Significance – do the indicators available contribute differentiation to the analysis?
- Complexity – is the data difficult to extract and systematize?
For example, it was originally thought that data from Library Turnstiles could provide significant information to enhance EWS. However, this indicator proved to offer a low degree of differentiation and was dropped in favour of student activity indicators from the Moodle VLE, where the frequency with which students entered module sites and the associated dwell times and resource use were very closely correlated to the incidence of students at risk as identified by tutors.
It was also realized that welfare data was poorly defined and would be difficult to integrate into the envisaged Student Performance System.
In terms of authoritative identifiers, analysis showed that "Department" is a crucial entity within the processing for Mitigating Circumstances; unfortunately the Department coding held for modules, although adequate for use within the Student Record System, was not consistent enough to be used for mapping in this context.
So the big lesson to be learned was that no matter how simple the idea seems and how relevant to the issue in hand (in this case, retention and progression), the data to supply this knowledge is frequently complex, inconsistently connected and requires manipulation to make it useful.
These are key lessons to be applied strategically as the university progresses in its use of analytics. Meanwhile, the benefits of these early efforts to cohere data about student activity are clear and therefore the Student Performance System will be deployed for the start of academic year 2012-13.
Read more about the fulCRM project
Authors: David Kay, Sero Consulting Ltd and Mark van Harmelen, Hedtek Ltd.
Please cite this report as:
Jisc (2012) Activity Data: Delivering benefits from the data deluge.