From the Code of Practice for Learning Analytics:
- “It is vital that institutions monitor the quality, robustness and validity of their data and analytics processes in order to develop and maintain confidence in analytics and ensure it is used to the benefit of students”
Given the high risks of adverse consequences (see Enabling Positive Interventions and Minimising Adverse Impacts), it is essential to ensure that data, and the predictions derived from them, are relevant and accurate.
Systems and processes for wellbeing support may use personal data in three different ways:
- First, “development”: building models that suggest indicators of need
- Second, “production”: using those models to identify which individuals may benefit from intervention
- Third, “review”: assessing whether the intervention processes were beneficial
At each stage, accurate data is essential to reduce the risk of inappropriate interventions. Students and staff should therefore be enabled and encouraged to exercise their right to correct errors and omissions in their data, but institutions should not rely on this as the only way to ensure accuracy. Processes for obtaining and handling data should also be designed with safeguards that avoid introducing errors, and that detect those that may nonetheless arise.
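As one illustration of such safeguards, automated range and completeness checks can catch many errors before a record reaches a model. The sketch below is a minimal Python example; the field names and permitted ranges (attendance_rate, last_login_days, enrolment_year) are hypothetical assumptions, not requirements from the code of practice.

```python
from datetime import date

# Hypothetical validation rules: field names and permitted ranges are
# illustrative assumptions, not taken from the code of practice.
RULES = {
    "attendance_rate": lambda v: 0.0 <= v <= 1.0,
    "last_login_days": lambda v: 0 <= v <= 365,
    "enrolment_year": lambda v: 2000 <= v <= date.today().year,
}

def validate_record(record: dict) -> list:
    """Return a list of problems found in one student record;
    an empty list means the record passed every check."""
    problems = []
    for field, check in RULES.items():
        if record.get(field) is None:
            problems.append(f"missing: {field}")
        elif not check(record[field]):
            problems.append(f"out of range: {field}={record[field]!r}")
    return problems

# Failing records should be flagged for correction, never passed
# silently to the intervention process.
print(validate_record({"attendance_rate": 1.4, "enrolment_year": 2021}))
# -> ['out of range: attendance_rate=1.4', 'missing: last_login_days']
```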
Processing to develop and review models, systems and processes is vital, but it must be kept separate from the processing that leads to interventions with individuals. This ensures, for example, that validation data does not leak into the intervention process and that testers cannot identify individuals.
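One possible way to enforce this separation is to assign each pseudonymised record deterministically to either the development/validation pool or the production pool, so that the two can never overlap. A minimal sketch, using a hypothetical pseudonym format:

```python
import hashlib

def assign_pool(pseudonym: str, development_fraction: float = 0.2) -> str:
    """Deterministically assign a pseudonymised record to the
    development/validation pool or the production pool. Because the
    assignment is a pure function of the pseudonym, the two pools
    never overlap, so validation records cannot leak into the
    intervention process."""
    digest = hashlib.sha256(pseudonym.encode()).digest()
    bucket = digest[0] / 256  # map the first byte to [0, 1)
    return "development" if bucket < development_fraction else "production"

print(assign_pool("PSEUDO-0001"))  # always the same answer for this pseudonym
```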
At each of the three stages, the processing of personal data must be minimised (ie no more than is necessary to achieve the purpose) while still delivering effective results.
Development and review are likely to require a wider range of personal data than production systems. To determine effectiveness, they need historic data on the outcomes of past interventions and non-interventions. To identify the most informative data sources, they will consider sources and fields that are subsequently excluded from production models, for example because:
- tests conclude that they do not make a significant contribution to alerts (illustrated in the sketch after this list)
- the risk of including them is not justified by the benefit
- their accuracy cannot be ensured
- the required privacy notices and individual rights cannot be supported
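A minimal sketch of that contribution test, using permutation importance on synthetic data. The field names are hypothetical, and a statistic alone cannot settle the risk, accuracy and rights questions also listed above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: only the first field actually drives need.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=1000)) > 0
fields = ["attendance_rate", "library_visits", "postcode_index"]  # hypothetical

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Fields whose measured importance is negligible are candidates for
# exclusion from the production model.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in zip(fields, result.importances_mean):
    verdict = "keep" if score > 0.01 else "candidate for exclusion"
    print(f"{name}: importance={score:.3f} -> {verdict}")
```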
The greater range of data used in development and review means particular care must be taken to minimise the risk of data processed for these purposes being linked to individuals. Synthetic, anonymous or pseudonymous data should be used wherever possible. The GDPR recognises pseudonymisation as a safeguard but still classes pseudonymised data as personal data, and processes for generating anonymous or synthetic data must be reviewed periodically to ensure they remain safe.
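As a simple illustration of pseudonymisation, the sketch below replaces a direct identifier with a keyed hash (HMAC). The key must be held separately from the development data, and because anyone holding it can reproduce the mapping, the output remains personal data under the GDPR. The key value and student ID format are placeholders:

```python
import hashlib
import hmac

# Placeholder key: in practice this must come from a secure store and
# be held separately from the development data.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymise(student_id: str) -> str:
    """Replace a direct identifier with a stable keyed pseudonym.
    Anyone holding SECRET_KEY can reproduce the mapping, so the
    result is still personal data under the GDPR."""
    return hmac.new(SECRET_KEY, student_id.encode(), hashlib.sha256).hexdigest()[:16]

print(pseudonymise("S1234567"))  # same input, same pseudonym, every run
```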
Those developing models should be aware of, and manage, the risks that models may inadvertently reveal personal data, for example through membership inference or model inversion attacks. The ICO’s AI framework (pdf) contains more detail on privacy attacks.
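One rough self-check for such risks is to compare a model’s confidence on its own training records with its confidence on unseen records: a large gap suggests the model has memorised individuals and is more exposed to membership inference. A sketch on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with random labels, which forces the model to
# memorise individual records rather than learn a real pattern.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = rng.integers(0, 2, size=400)
X_tr, X_ho, y_tr, _ = train_test_split(X, y, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
train_conf = model.predict_proba(X_tr).max(axis=1).mean()
holdout_conf = model.predict_proba(X_ho).max(axis=1).mean()

# A gap near zero is safer; a large gap signals memorisation and
# greater exposure to membership inference.
print(f"confidence gap: {train_conf - holdout_conf:.2f}")
```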
Development and periodic review must ensure that models are, and remain, proportionate. They should also be checked for signs of bias or discrimination. Models must provide useful information to guide the provision of support while involving the least possible risk to individuals: both those who are identified as needing support and those who are not. Any predictive system or process will make mistakes: organisations should consider, and balance, the risk of a false positive (alerting about someone who did not need support) against that of a false negative (failing to alert about someone who did); see also Enabling Positive Interventions, below. The ICO’s AI framework contains more detail on the use of algorithmic techniques with personal data.
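As an illustration of that balance, the sketch below computes false-positive and false-negative rates both overall and per group, which also supports the bias check mentioned above. The data and group labels are synthetic placeholders:

```python
import numpy as np

def fp_fn_rates(needed, alerted):
    """False-positive rate (alerted but did not need support) and
    false-negative rate (needed support but was not alerted)."""
    fp = alerted[needed == 0].mean()
    fn = 1 - alerted[needed == 1].mean()
    return fp, fn

# Synthetic placeholders: whether support was needed, whether the
# system alerted (correct 80% of the time), and a group label.
rng = np.random.default_rng(2)
needed = rng.integers(0, 2, size=500)
alerted = np.where(rng.random(500) < 0.8, needed, 1 - needed)
group = rng.integers(0, 2, size=500)

fp, fn = fp_fn_rates(needed, alerted)
print(f"overall: FP={fp:.2f} FN={fn:.2f}")

# Large differences between groups would be a sign of bias needing review.
for g in (0, 1):
    fp, fn = fp_fn_rates(needed[group == g], alerted[group == g])
    print(f"group {g}: FP={fp:.2f} FN={fn:.2f}")
```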