Summaries and recordings of discussion mini-series on de-identifying data


/ December 18, 2015

In response to requests from the community to address questions and challenges around de-identifying data, we hosted an online discussion mini-series on this topic from June to November 2015. While there is a clear understanding of the importance of de-identifying data and the possible harms that can result from questionable practices, there is also confusion and trepidation about how to actually achieve de-identification in datasets. This uncertainty is preventing people and organisations from tackling these issues head-on.

The goal of this mini-series was to create opportunities for the responsible data community of practitioners to share their experiences, knowledge and challenges with each other. Each discussion was slightly different in format and style: one was a presentation by a de-identification expert, while another was a discussion led by practitioners in the midst of developing data-sharing policies.

The resources that came out of these discussions represent the experience and knowledge of presenters and participants. We’re thankful to the 5 presenters/facilitators (listed below) and the more than 40 participants for contributing to these discussions and resources! Below is a list of what came out of each event:

  • Mark Elliot of the UK Anonymisation Network (UKAN) led an introduction for advocacy organisations. Mark’s presentation covered some of the basics: the difference between de-identification and anonymisation, and the balance that needs to be struck between de-identifying data and keeping enough information in the data to make it useful for advocacy. He also introduced us to the Anonymisation Decision-Making Framework by UKAN. We have a summary and a recording of the entire conversation on our website.
  • Sara-Jayne Terp of Thoughtworks facilitated a lively discussion on risk analysis and mitigation strategies in the context of sharing data. Sara-Jayne described her experience with identifying and mitigating risk related to personally identifiable information (PII) in development-related datasets. We have a recording of the presentation and a summary of the participant discussion on our website.
  • Max Shron of Polynumeral gave us a thorough introduction to k-anonymity and other de-identification frameworks. In this session we explored the more technical side of de-identifying data. We all came away with a better understanding of what the different frameworks are, why they matter and what the potential caveats are with each. We have a summary of the presentation and the full recording on our website.
  • Amy O’Donnell and Simone Lombardini of Oxfam shared their processes and reflections on developing a data-deposit decision-making framework. This working session was an opportunity for practitioners to share and discuss initiatives and challenges in opening data. It was also an opportunity for Oxfam to solicit suggestions, feedback and ideas from the responsible data community of practice on their recently developed data-deposit decision-making framework. Mark Elliot joined us for this call to share his experience working with Oxfam and other organisations on similar decision-making frameworks. On our website, we have shared the presentation and some key questions and challenges raised during the participant discussion.
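
For readers new to k-anonymity, mentioned in the third session above, here is a minimal sketch of the basic idea: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. This is an illustration only, not code from any of the presentations. The function name, the column names and the sample records are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for the given quasi-identifier columns.

    A result of k means every record is indistinguishable from at
    least k - 1 others on those columns. Returns 0 for an empty dataset.
    """
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already-generalised survey records (illustrative only).
records = [
    {"age_band": "20-29", "district": "North", "condition": "A"},
    {"age_band": "20-29", "district": "North", "condition": "B"},
    {"age_band": "30-39", "district": "South", "condition": "A"},
    {"age_band": "30-39", "district": "South", "condition": "C"},
    {"age_band": "30-39", "district": "South", "condition": "B"},
]

k = k_anonymity(records, ["age_band", "district"])
print(f"This dataset is {k}-anonymous on age_band and district")
```

A low k (especially k = 1) flags records that are unique on the chosen columns and may need further generalisation or suppression before sharing. A high k alone does not guarantee safety, which is one reason the caveats of each framework matter.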

If you’re struggling with de-identifying your data, we hope these resources will be helpful. For questions and challenges that aren’t addressed here, reach out to the responsible data community for support.

Maya Richman

About the contributor

Maya is an interdisciplinary technologist, researcher and improvisational electronic musician based in Berlin. In 2012, she worked with Development Seed, building websites and interactive maps. Later, she worked as a research assistant for Gabriella Coleman investigating the politics of hackers, and as a radio show host for a feminist, artist-run centre. She is now working with organizations of all sizes to influence their security culture, in addition to managing and developing new internal tech processes for a distributed organization.