Big data, Ebola data, + responsible data

This is a guest post by Willow Brugh, Community Leadership Strategist at Aspiration Tech.

For years, disaster and humanitarian response technologists, field agents, and policy makers have been pushing big data as a way to be self-reflexive and open data as a way to break down the silos rife throughout the sector. While folk tend to talk a lot about ethics and responsibility in response and data, we rarely talk about law (which is those things codified). Ebola response was one more occasion for organizations, funders, and those affected to wonder what role data would play in their response, and what the repercussions of those choices would be after the response was declared complete.

The chaos of humanitarian disaster often creates an implied social license for experimentation with new approaches, under the assumption of better outcomes. Vested interests dominate the public discussion of humanitarian data modeling, downplaying the dangers of what is essentially a public experiment to combine mobile network data and social engineering algorithms. In the case of using mobile network data to track or respond to Ebola, the approaches are so new – and generally so illegal – that most advocacy focuses on securing basic access to data. Advocates for the release of CDRs often paint an optimistic picture of its potential benefits, without applying the same rigor to the risks or likelihood of harm. This trades on the social license created by disaster to experiment with the lives of those affected, under the implicit assumption that it can’t make the situation worse.

We disagree, and feel that practitioners have an ethical and legal responsibility to be more cognizant and careful.

A Big Data Disaster

When Sean posted his paper Ebola: A Big Data Disaster (from which the above quote is pulled) to the Responsible Data list (and also to LibTech and a few other places), it spurred a deep conversation about what data should be shared, how practitioners who are also open data advocates make hard choices in pressing circumstances, and more. We were all so excited that it made sense to have a real-time conversation about the topic. The engine room, O’Reilly, and Aspiration hosted this call on March 15th about Call Detail Records being used during Ebola response. We were joined by the paper’s author Sean McDonald, Danna Ingleton and Zara Rahman of the engine room, Tim McGovern from O’Reilly, Jennifer Chan of Harvard Humanitarian Initiative and NetHope’s Ebola Response Project, John Crowley of HHI and the Humanitarian Exchange Language, Paul Currion, yours truly from Aspiration, as well as an amazing set of listeners-in who contributed questions and feedback throughout the session. You can read the full notes and additional resources on the Responsible Data wiki.

Some of the themes which stood out most to me follow.

Data sharing and anonymization

Data sharing, while its value is easily seen by most, is organizationally and technically difficult to implement. Most of the databases and spreadsheets in use in non-crisis times are not flexible enough to also deal with the needs of an extreme event. Tools like the Humanitarian Exchange Language have started to make this easier, but they are not a cure-all, as critical thought is still necessary before handing records over to another organization (or even before storing them yourself!).

In addition to this, new responsibilities emerge around datasets made large through sharing and receiving, as well as through unscrupulous gathering. As an awful lot of folk are starting to point out, anonymizing large datasets is nigh impossible. It gets even more complex when you can triangulate across datasets. Rather than testing out methods like k-anonymity, documenting data requests and use, and holding ourselves and others accountable, the general data sharing practice tends to be friendly organizations slipping datasets to each other under the table. As a snarky aside, we tend to think of using data for large-scale “social good” only in places where there aren’t strong human rights legal protections in place. For instance, Orange released data for Côte d’Ivoire and Senegal for hackathons, but not for France {note: Orange did release data for France for COP21}. Think about that for a minute. Zara certainly has. And Sean points out that this has been happening with medical treatments and training for a long time as well.
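To make the k-anonymity idea mentioned above concrete, here is a minimal sketch (not from the paper or the call; the field names and data are hypothetical) of the basic check: a dataset is k-anonymous for a set of quasi-identifiers if every combination of their values appears at least k times, so no row can be narrowed down to fewer than k people.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears at least k times in the dataset."""
    combos = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in combos.values())

# Hypothetical call-record-like rows: even with names removed,
# coarse location plus a date can single a person out.
records = [
    {"cell_tower": "A", "day": "2014-10-01"},
    {"cell_tower": "A", "day": "2014-10-01"},
    {"cell_tower": "B", "day": "2014-10-02"},
]

print(is_k_anonymous(records, ["cell_tower", "day"], k=2))  # False: the tower-B row is unique
```

Note that passing this check for one dataset says nothing about triangulation: joining two individually k-anonymous datasets can still re-identify people, which is exactly the problem described above.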

Contact tracing

This gets even more complicated when we consider that call detail records (CDRs) shared during Ebola response on the premise of contact tracing are, by design, not supposed to be anonymized. Anonymization would defeat the purpose: contact tracing is about documenting the specific people who have come into contact with each other, and who therefore may have passed an infection on. And these egregious invasions of privacy took place even though the very people who suggested using CDRs for contact tracing have indicated it likely wouldn’t work for Ebola, even though contact tracing for Ebola is in question in general, and even though the cell phone tower setup in West Africa means the location data necessary for contact tracing is tenuous at best. Yet in a blind hope of finding anything that might help the response, those concerns were tossed out the window, at the long-term expense of the individuals whose records were shared.
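To show why tower-level location is such a blunt instrument, here is a minimal sketch of the co-location logic that CDR-based contact tracing proposals imply (this is an illustration, not any responder's actual pipeline; the record layout and the one-hour window are assumptions): two subscribers are flagged as "contacts" merely because their calls hit the same tower within a time window.

```python
from collections import defaultdict
from datetime import timedelta

def colocation_contacts(cdrs, window=timedelta(hours=1)):
    """Flag pairs of subscribers whose calls used the same tower
    within `window` of each other.
    cdrs: iterable of (subscriber_id, tower_id, timestamp) tuples."""
    by_tower = defaultdict(list)
    for subscriber, tower, ts in cdrs:
        by_tower[tower].append((ts, subscriber))

    contacts = set()
    for events in by_tower.values():
        events.sort()  # order by timestamp within each tower
        for i, (ts_i, sub_i) in enumerate(events):
            for ts_j, sub_j in events[i + 1:]:
                if ts_j - ts_i > window:
                    break  # later events are even further apart
                if sub_i != sub_j:
                    contacts.add(tuple(sorted((sub_i, sub_j))))
    return contacts
```

A single tower can cover many square kilometers, so "same tower, same hour" lumps together people who never came within shouting distance of each other, while fully identifying both of them. That is the privacy-for-accuracy trade described above.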

Theory and practice

But let’s lay all that aside for a moment, and say that CDRs could support contact tracing in West Africa, and that contact tracing was a powerful tool in stopping the spread of Ebola. Then, is the sacrifice of individual privacy worth the public health benefits? This has no easy answer, and is highly contextual. Via Sean:

More importantly, it’s probably not a binary answer. Most of the governments in the world have some form of emergency powers – but it’s almost never without some form of check and balance. Critically, we also don’t have any sense of what scale of emergency is enough to trigger these powers. At present, the largest enabling factor is public fear – which means that, perversely, governments have an incentive to have a panicked/scared population, in order to exercise the broadest amount of unchecked authority.

But, as food for thought: during the MERS outbreak in South Korea, 17,000 people were placed under “predictive quarantine” with a $9bn recovery package (eek!) and no transparency or accountability around who was selected or when they would be let out. That’s a power that makes me anxious deep in my activist bones and my pragmatic resource allocation brain.

The other theory at play here is that data sharing is how we break down institutional silos. And yet, as most organizations which were active in the Ebola response will tell you, this sharing of data didn’t actually help coordination, except in rare cases such as GeoNode and the Humanitarian Data Exchange. These also paid particular attention to which data would be useful to responders, while still protecting the individuals whose stories made up the data.

The law as an ally

But we’re not as much in the Wild West (to use an American colloquialism) as we might think, as far as data sharing and individual rights go. Sean’s paper (as well as many others, as referenced in the notes) points out that useful frameworks already exist, and can be used to hold bad actors accountable. Sean also points out that, whether good or bad, the laws exist – and if we don’t advocate for “good” law/frameworks, then bad law can be used to consolidate power, too.

And, as opposed to what Silicon Valley might have us believe, involving lawyers rarely slows down response efforts, and responders are generally glad for the extra guidance. One thing we discovered together was a desire for quicker-to-understand legal frameworks, for when you’re making choices in a crisis situation.

Consent as a cure-all

We, myself included, often turn to consent as a sort of cure-all for these complications. But as Danna pointed out (as well as Linnet Taylor in this recent post responding to the same paper we discussed), consent without the ability to enforce is not enough to prevent abuse. It might be enough to create another way for people to seek redress. We should be really thoughtful before we condemn consent as flawed or eliminate things like contracts as good ways to protect our rights in relationships with private companies (which don’t have any native rights-protecting responsibilities). Some of us have worked on ways to begin this conversation through a Framework for Consent Policies from Responsible Data Forums.

Thanks to participants, practitioners, and future contributors!

Major thanks to Sean for the instigation, to Tim and Zara for setting up infrastructure, for all the time so many have already spent engaging with this topic, and to the whole crew for letting me join in. Co-facilitating with so many other facilitators was a joy.

We received excellent questions and prompts from the audience which made the conversation richer and deeper. You can see/read questions (and our responses) around assessing if big data is right for you, guides and places for sharing data, and others on the call video or on the notes.

We’re excited to continue the conversation, especially around simplifying and unifying our legal questions as a first step in creating a quick-to-use legal guide and starting to lay a path towards strategic advocacy around data use and privacy.



Published on: 22 Mar 2016