Summary of our discussion on the risks and mitigations of releasing data

Analysis

/ October 29, 2015

On August 26th we held our second online discussion on de-identification and anonymization – this time with the formidable data scientist Sara-Jayne Terp at the helm of the discussion. The focus of this discussion was ‘Risks & Mitigations of Releasing Data’ and this blog post is a summary of what was discussed.

Our fire-starter shared her experience…

Sara-Jayne started us out with a presentation based on the following Do No Harm principle:

“If you make a dataset public, you have a responsibility, to the best of your knowledge, skills, and advice, to do no harm to the people connected to that dataset. You balance making data available to people who can do good with it and protecting the data subjects, sources, and managers.”

But this statement is not as straightforward as we would like it to be. In the video below, Sara-Jayne describes her experience with identifying and mitigating risk related to personally identifiable information (PII) in development-related datasets. (Here is the paper that Sara-Jayne recommends at the end of this video: Myths and Fallacies of “Personally Identifiable Information”)

…and the participants shared their experiences

Participants went on to discuss a few key challenges in mitigating risk. Their comments reinforced challenges raised in Sara-Jayne’s presentation and raised additional questions and issues:

Participants highlighted the challenge of maintaining informed consent related to the release of data. In some situations, data is collected for a specific purpose, but the organization later wants to open the data (make it public) because it could contribute to many other efforts.
Related to the challenge of maintaining informed consent is the challenge of predicting the future risk of releasing data. Changes in context (political or otherwise) could make data dangerous. For example, if you open up data now, how could future governments exploit the people and communities related to that data? Are there red flags or triggers that could be built into a system for responding to changes in context? Some participants shared that they address this challenge by informing people of the risk, including the explanation that there is no guarantee that they will not be re-identified or that the data will not be used in other ways in the future.
Related to both of these challenges is something that participants raised a number of times: the fear of potential risk can paralyze us. Sara-Jayne and others reiterated the point that there is an acceptable level of risk; otherwise, we couldn’t do open data. The benefit has to be greater than the risk.
It’s important to keep in mind that when releasing data to a limited group of people, there is no guarantee that it won’t go out to others.
Participants agreed with Sara-Jayne’s points on the importance of having local communities involved in de-identification efforts in order to understand the context and the risks of release.
Communicating the risks of releasing data can be difficult: people or communities might choose to release data that isn’t appropriate.
Participants also raised the challenge of responsibly acquiring consent when you don’t have direct contact with the data subjects. When you receive data from a source about other people, how can you responsibly request or require consent information with the dataset? You want to know the context in which it was collected, but what is the best way to do this?