Issues and potential processes for releasing crowdsourced data

February 4, 2015

Sara Terp, Director of Data Projects at Ushahidi, at

‘[Ushahidi is] thinking about what it means to balance the potential social good of wider dataset release with the potential risks that come with making any data public…

Many of the datasets managed by Ushahidi users contain information that is personal, often gathered under extreme circumstances, and potentially dangerous to its subjects, collectors, or managers. Sharing data from these platforms isn’t just about clicking on a share button. If you make a dataset public, you have a responsibility, to the best of your knowledge, skills, and advice, to do no harm to the people connected to that dataset….

As a crisismapper, I often go through the ethical process. I generally do a manual investigation first, or supervise someone who already has access to the deployment dataset doing this, with them weeding out all the obvious personally identifiable information (PII) and worrisome data points, then ask someone local to the deployment area to do a manual investigation for problems that aren’t obvious to people outside the area (for example, in Homs, the location of a bakery was dangerous information to release, because of targeted bombing).

Some of the things I look for on a first pass include:

  1. Identification of reports and subjects: Phone numbers, email addresses, names, personal addresses.
  2. Military information: actions, activities, equipment.
  3. Uncorroborated crime reports: violence, corruption, etc., that aren’t also supported by local media reports.
  4. Inflammatory statements (these might re-ignite local tensions).
  5. Veracity: Are these reports true – or at least, are they supported by external information?

Things that make this difficult include untranslated sections of text (you’ll need a native speaker or good auto translate software), codes (e.g. what does “41” mean as a message?) and the amount of time it takes to check through every report by hand. This can be hard work, but if you don’t do these things, you’re not doing due diligence on your data, and that can be desperately important.

Please open up as much social-good data as possible, but do it responsibly too. We’ve seen too many instances of datasets that should have been kept private making it into the public domain—as well as instances of datasets that should have become public, and datasets that have been carefully pruned being criticized for release.’
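For teams facing thousands of reports, part of the first item on Sara's list (phone numbers and email addresses) could be pre-screened automatically before the manual pass. The sketch below is only an illustration of that idea, not Ushahidi's process: the function names `flag_report` and `triage` and the sample reports are hypothetical, and the patterns are deliberately loose. It cannot catch names, addresses, military details, or locally sensitive locations such as the bakery example, so every report still needs human review.

```python
import re

# Assumption: loose, illustrative patterns. They will miss some formats and
# over-flag others, so they only prioritise reports for manual review.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def flag_report(text):
    """Return the list of reasons a report looks like it contains contact details."""
    reasons = []
    if EMAIL_RE.search(text):
        reasons.append("email")
    if PHONE_RE.search(text):
        reasons.append("phone")
    return reasons


def triage(reports):
    """Split reports into (flagged, unflagged); flagged holds (text, reasons) pairs."""
    flagged, unflagged = [], []
    for text in reports:
        reasons = flag_report(text)
        if reasons:
            flagged.append((text, reasons))
        else:
            unflagged.append(text)
    return flagged, unflagged


if __name__ == "__main__":
    # Hypothetical sample reports, invented for this example.
    sample = [
        "Water shortage near the market, call +254 700 000000",
        "Contact me at reporter@example.org",
        "Road blocked by flooding",
        "41",
    ]
    flagged, unflagged = triage(sample)
    for text, reasons in flagged:
        print(reasons, text)
    print(len(unflagged), "reports had no automatic flags and still need a manual check")
```

A report that passes this screen is not cleared for release. Short codes like "41" and untranslated text go through unflagged, which is exactly where a local reviewer is needed.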

Are any of the processes or considerations Sara describes useful for your work? What other issues should people dealing with crowdsourced information be thinking about?


About the contributor

Tom started out writing and editing for newspapers, consultancies and think tanks on topics including politics and corruption in sub-Saharan Africa and Asia, then moved into designing and managing election-related projects in countries including Myanmar, Bangladesh, Rwanda and Bolivia. After getting interested in what data and technology could add in those areas and elsewhere, he made a beeline for The Engine Room. Tom is trying to read all of the Internet, but mostly spends his time picking out useful resources and trends for organisations using technology in their work.
