Summary of the RDF working session on developing a data-deposit decision-making framework

December 10, 2015

How can we share and open data while respecting the privacy of the people the data is about? What policies and procedures should organisations have in place to guide staff on good practice? On November 19, we set out to discuss these questions through an RDF working session on developing a data-deposit decision-making framework with Oxfam. This working session was an opportunity for practitioners to share and discuss initiatives and challenges in opening data. It was also an opportunity for Oxfam to solicit suggestions, feedback and ideas from the responsible data community of practice on their recently developed data-deposit decision-making framework. Mark Elliot joined us for this call to share his experience working with Oxfam and other organisations on similar decision-making frameworks.

Oxfam’s data-deposit decision-making framework is a tool to help their staff know when, how and where to share data within the organisation and externally. This framework outlines how Oxfam classifies data that will and won’t be shared, and helps to balance potential good with potential risk. Oxfam developed it in collaboration with RDF and the UK Anonymisation Network before identifying the UK Data Service as a safe platform to share data. You can read more about the process so far on the Oxfam Policy & Practice blog.

Amy O’Donnell and Simone Lombardini presented a brief overview of the draft policy and then highlighted a few challenges that Oxfam is still working through with this framework:

Difficulties in keeping up-to-date on new data and security standards, as software changes so quickly.
How to address the dilemma of deleting data, especially when you might need it for follow-ups. While you may not need to collect personally identifiable information (PII) for the first round of data collection, it is important to have contact details when you need to revisit the same people for comparison purposes.
It’s quite clear when a risk is considered “low” or “high”, but how do we distinguish “medium”? There are times when a dataset is complex and potentially sensitive, but the probability of re-identification is low. How do we categorize this?

Key Takeaways

You can manage the risk of re-identification in ways that don’t involve de-identifying data. Some datasets contain quite sensitive information, but the probability of re-identification is extremely low because the researchers who access the data are far removed from the data subjects. In this sense, the risks are managed by the secure settings that establish a distance between those two stakeholders.
The way we describe the consent process is a critical and often missing step in obtaining informed consent. One participant asked, “…anonymisation issues are pretty complicated. What level of knowledge do you need to get across to communities for them to be truly *informed*?” Mark responded by explaining that 95% of what is called “informed consent” is not fully informed. It’s not about giving people more information – it’s about presenting the information in an attractive and engaging way. In that sense, it’s a communications question rather than an ethical question. One idea that was shared is to use a video that could serve as the front end of a consent process (here’s an example of the kind of video being described, developed by the Administrative Data Research Network). Engaging people in this way helps them understand what you’re talking about and allows them to make an informed decision.
During the development of responsible data guidance, it is important to engage and consult those who will ultimately be responsible for carrying out best practice. These may be your staff, interns, consultants, and local partners. Processes like this need to be tangible and actionable to help staff make informed decisions. How do we work with partners to uphold these ideals, especially when they might be less well resourced? Furthermore, the typical audience for data tends to be researchers and others who are already looking for it, so how do we engage other audiences and make data appealing and accessible?
Perhaps the strongest takeaway from this conversation was the importance of sharing and testing ideas. Often, we are tempted to wait until we feel like we have a “gold standard” to share – but realistically, how could it be a gold standard without input from a large community of practitioners? How do you know which areas are underdeveloped without getting that new perspective? We all acknowledge that it can be intimidating to share these kinds of resources, but the more we share and provide feedback, the more comfortable it becomes and the more we can mutually benefit as we grapple with challenges in this space.

We’re so grateful to Amy, Simone and Mark for demonstrating this practice of sharing, and we look forward to hosting other opportunities like this within the responsible data community!