Summaries and recordings of a discussion mini-series on de-identifying data


December 18, 2015

In response to requests from the community to address questions and challenges around de-identifying data, we hosted an online discussion mini-series on the topic from June to November 2015. While the importance of de-identifying data, and the harm that questionable practices can cause, are widely understood, there is also confusion and trepidation about how to actually de-identify a dataset. This uncertainty is preventing people and organisations from tackling these issues head-on.

The goal of this mini-series was to create opportunities for the responsible data community of practitioners to share their experiences, knowledge and challenges with each other. Each discussion differed slightly in format and style: one was a presentation by a de-identification expert, while another was a discussion led by practitioners in the midst of developing data-sharing policies.

The resources that came out of these discussions reflect the experience and knowledge of the presenters and participants. We're grateful to the five presenters and facilitators (listed below) and the more than 40 participants who contributed to these discussions and resources! Below is a summary of what came out of each event:

Mark Elliot of the UK Anonymisation Network (UKAN) led an introduction for advocacy organisations. Mark's presentation covered some of the basics: the difference between de-identification and anonymisation, and the balance that must be struck between de-identifying data and keeping enough information in it for the data to remain useful for advocacy. He also introduced us to UKAN's Anonymisation Decision-Making Framework. We have a summary and a recording of the entire conversation on our website.

Sara-Jayne Terp of Thoughtworks facilitated a lively discussion on risk analysis and mitigation strategies in the context of sharing data. Sara-Jayne described her experience identifying and mitigating risks related to personally identifiable information (PII) in development-related datasets. We have a recording of the presentation and a summary of the participant discussion on our website.

Max Shron of Polynumeral gave us a thorough introduction to k-anonymity and other de-identification frameworks. In this session we explored the more technical side of de-identifying data. We all came away with a better understanding of what the different frameworks are, why they matter and what the potential caveats are with each. We have a summary of the presentation and the full recording on our website.
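As a small illustration of the idea behind k-anonymity: a dataset is k-anonymous with respect to a chosen set of quasi-identifiers (attributes such as age or location that could be combined to re-identify someone) if every combination of those values is shared by at least k records. The Python sketch below checks that property. It is a minimal example under stated assumptions, not material from the session: the dataset is assumed to be a list of dicts, and the function names, column names and sample rows are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for every quasi-identifier (hypothetical helper)."""
    if not records:
        raise ValueError("records must be non-empty")
    # Count how many records fall into each combination of quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    return k_anonymity(records, quasi_identifiers) >= k


# Hypothetical rows: age band and district are treated as quasi-identifiers,
# while "condition" is the sensitive attribute being shared.
rows = [
    {"age_band": "20-29", "district": "North", "condition": "A"},
    {"age_band": "20-29", "district": "North", "condition": "B"},
    {"age_band": "30-39", "district": "South", "condition": "A"},
    {"age_band": "30-39", "district": "South", "condition": "C"},
    {"age_band": "30-39", "district": "South", "condition": "B"},
]

print(k_anonymity(rows, ["age_band", "district"]))        # 2
print(is_k_anonymous(rows, ["age_band", "district"], 3))  # False
```

One caveat: passing this check does not guarantee privacy. If every record in a group shares the same sensitive value, that value is still disclosed for anyone known to be in the group, which is one of the limitations that motivates related frameworks such as l-diversity.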

Amy O'Donnell and Simone Lombardini of Oxfam shared their processes and reflections on developing a data-deposit decision-making framework. This working session was an opportunity for practitioners to share and discuss initiatives and challenges in opening data. It was also an opportunity for Oxfam to solicit suggestions, feedback and ideas from the responsible data community of practice on their recently developed framework. Mark Elliot joined us for this call to share his experience working with Oxfam and other organisations on similar decision-making frameworks. On our website, we have shared the presentation and some key questions and challenges raised during the participant discussion.

If you're struggling with de-identifying your data, we hope these resources will be helpful. For questions and challenges that aren't addressed here, reach out to the responsible data community for support.