Responsible Data reflection stories: an overview
A structured knowledge base on the unforeseen challenges and (sometimes) negative consequences of using technology and data for advocacy.
Context
Through the various Responsible Data Forum events over the past couple of years, we’ve heard many anecdotes of responsible data challenges faced by people or organizations. These include potentially harmful data management practices, situations where people have experienced gut feelings that there is potential for harm, or workarounds that people have created to avoid those situations. But we feel that trading in these “war stories” isn’t the most useful way for us to learn from these experiences as a community. Instead, we have worked with our communities to build a set of Reflection Stories: a structured knowledge base on the unforeseen challenges and (sometimes) negative consequences of using technology and data for social change. We were able to do this with the support of Hivos. We hope that this can offer opportunities for reflection and learning, as well as helping to develop innovative strategies for engaging with technology and data in new and responsible ways.
Finding reflection stories
In July 2015, we put out a call on the Responsible Data blog, looking for “Responsible Data Reflection Stories”, and disseminated it via Twitter and on our Responsible Data mailing list.
Recognising that people were perhaps more likely to share these stories in a trusted, face-to-face environment, we also promoted the call on Twitter during events where we were physically present, such as the Chaos Communication Congress, the Media Party in Buenos Aires, Argentina, and the Open Government Partnership Summit in Mexico.
Through these efforts, we received a number of tips by email and Twitter, and gathered a list of potential leads to follow up on, the large majority from people already within our network.
Going back to the source
Wherever possible, we spoke to at least one, and ideally several, of the people involved. In two cases, Stories #4 and #5, this wasn’t possible. In all other cases, we tried to hear from ‘both sides’ of the story, and the people we spoke to had a chance to review the piece and give input before publication.
In some cases, we encountered difficulties in reconciling very different perspectives of what happened; thinking about a project from a “responsible data” perspective does inherently lean towards the critical. Because the primary goal of these stories is to highlight the responsible data challenges of specific cases, they give more space to critical perspectives on those cases than to praise.
Some people we spoke to preferred to remain anonymous, and due to the unique nature of the work they do, this meant that we had to do more than just remove names. Notably, in Story #3, we refer to the user group of an app as a “disproportionately criminalised population”, meaning a section of society that is often discriminated against.
Our hypothesis when starting this project was that we, as a community, trade in “war stories” that might not be factually accurate. Through following up on the leads we received for this project, we realised that is perhaps an understatement.
Some of the leads for stories were known to us, and others were very well known within certain communities – often repeated as ‘fail stories’, or as warnings against certain projects.
Here’s one, which has been used in the past as an argument against publishing data on where international aid is delivered:
[box]“that time that an international aid agency published geolocated data on where their aid workers and facilities were in Pakistan, and the Taliban used this information to attack them directly”[/box]
From a quick online search, a few potentially related events come to light:
- the CIA using a vaccination campaign for hepatitis B to identify Osama bin Laden’s hideout
- the Taliban’s subsequent attacks on polio vaccination workers in June 2012, when, according to National Geographic, Taliban leaders “banned all vaccination programs in the areas under their control”, after which some international health agencies stopped work, and others continued but under police protection, or after removing their logos from their vehicles.
- continued attacks, such as when Pakistani Taliban gunmen killed seven Pakistani aid workers in January 2013.
Through all of these related events, though, no relationship seems to be explicit between geolocated data being published and attacks on the aid workers. Of course, that does not rule out the possibility that it did happen – but without any online mention at all of it happening, it seems as though rumours may well have played a role in making this the cautionary tale that it is today.
This indicates that our hypothesis was correct, and cautions us to try and be sure of the source and veracity of a story before spreading it.
Our original focus was on challenges faced in the use of data in advocacy but, recognising the sometimes blurry lines between advocacy and journalism, we expanded our scope to include both. Data-driven journalism is on the increase, and as Nicolas Kayser-Bril notes, within journalism, “no systematic study of data-driven mistakes has been carried out by academia or professional organizations”. While this is far from being a systematic study, we hope it can contribute to our collective intelligence about how just some of these mistakes take place, and the impact they have.
We received a number of tips about responsible data challenges faced by governments, in the private sector, and elsewhere, but for this set of stories at least, we decided to stick to our original mandate. The suggested stories, though, do indicate that there is a lot of scope for expansion of reflection stories in the future.
What we learned from the stories
New spaces, new challenges
Moving into new digital spaces is bringing new challenges, and social media is one such space where these challenges are proving very difficult to navigate. This seems to stem from a number of key points:
- organisations with low levels of technical literacy and experience in tech- or data-driven projects, deciding to engage suddenly with a certain tool or technology without realising what this entails. For some, this seems to stem from funders being more willing to support ‘innovative’ tech projects.
- organisations wishing to engage more with social media while not being aware of more nuanced understandings of public/private spaces online, and how different communities engage with social media. (see story #2)
- unpredictability and different levels of visibility: due to how privacy settings on Twitter are currently set, visibility of users can be increased hugely by the actions of others – and once that happens, a user actually has very little agency to change or reverse that. Sadly, being more visible on, for example, Twitter disproportionately affects women and minority groups in a negative way – so while ‘signal boosting’ to raise someone’s profile might be well-meant, the consequences are hard to predict, and almost impossible to reverse manually. (see story #4)
- consent: related to the above point, “giving consent” can mean many different things when it comes to digital spaces, especially if the person in question has little experience or understanding of using the technology in question (see stories #4 and #5).
Grey areas of responsible data
In almost all of the cases we looked at, very few decisions were concretely “right” or “wrong”. There are many, many grey areas here, which need to be addressed on a case-by-case basis. In some cases, the people involved really did think through their actions, and approached their problems thoughtfully and responsibly, but consequences they had not imagined still occurred (see story #8).
Additionally, given the quickly moving nature of the space, challenges can arise that simply would not have been possible at the start.
Quickly moving “innovation”
The promise and benefits that technology and increased uses of data can bring are widely touted. For some advocacy organisations without much experience of engaging with technology in their work, it appears that these benefits are much clearer than the potential risks, which are in fact much harder to see with low levels of technical literacy.
This promise seems to bring with it an element of speed; using technology generally makes things faster, but thinking through responsible data concerns can slow things down in the short term, while making the projects more successful in the long term. As mentioned above, funders also seem to be increasingly supporting ‘innovative’ uses of technology, in some cases without thinking through the responsible data concerns.
As we see in the Mitigation Strategies section below, the time taken to engage with responsible data practices does indeed pay off in the long term (see stories #3 and #6). Those with higher levels of technical literacy – that is, those who are aware of how the technology works and of privacy concerns, and who are more well-versed in ethical debates around what they are doing – are the ones who are engaging more actively in mitigation strategies during or even before starting projects, rather than afterwards.
Collaboration between advocacy orgs and technical partners
Many of the organisations featured would not consider themselves to be particularly “technical”, but in order to develop their project, they partnered with a technical development company. For the most part, effective communication between these two very different types of partners seems to be difficult: they have differing priorities, and very different contextual understandings.
Problems have arisen when the concerns of an advocacy organisation, i.e. someone with good contextual understanding and experience of working with the community in question, seem to have been under-prioritised or misunderstood by a technical partner.
Mitigation strategies
As outlined in the stories, there are lots of ways in which a project could take an unexpected turn. With the heavy caveat that there isn’t one solution to any of these problems, and that each of them needs to be thought through in the context of the project itself, here are some of the strategies that people engaged in while trying to work through the challenges they faced.
In terms of community management, co-design can be understood as one way to get buy-in and ownership from the community. But building with a community also brings benefits in terms of understanding what responsible data challenges are actually being faced.
Hearing from the community who will be involved in the project (e.g. the users of the tool, the people who will feature in the dataset) early on, and repeatedly, seems to vastly increase the likelihood that red flags will be noticed in time, and gives the project owners the chance to change course.
For grant-funded organisations, this depends in large part upon the flexibility and understanding of funders. Being able to change course halfway through a project, and thus redirecting funds from one planned use to another, requires either un-earmarked grant money (which many non-profits do not have) – or, funders who recognise the importance of flexibility to build effective projects.
Using in-person and online networks to gather information from experts is a good way to reach out to trusted people, without having to do anything as drastic as put job advertisements up, or write on an organisation’s blog. In many of these cases, this kind of networked advice seems to be offered for free, with the potential of turning into consulting further down the line. Being able to tap into a network (such as the Responsible Data community) seems to make a real difference in getting that first level of advice for free.
In these cases, it helps to break down the different types of expertise needed to address the challenge: for example, rather than asking for a digital security expert, specifying that an expert is needed to work on secure storage of images. Additionally, a good strategy seems to be getting multiple opinions from those with more expertise in these areas, rather than relying on one or two. It is likely that people will disagree about what the best course of action is, especially if they do not have the same level of contextual understanding, but gathering their opinions and advice will make it easier to make a well-informed decision.
Using in-person and online networks to ask for help, and being open about the challenges that are being faced:
In cases where there have been challenges that were not mitigated against, offering unreserved apologies and owning that a mistake took place is a good first action. Of course, it does not undo any harm that might have been done; but in those cases, getting support from the affected communities will be a necessary strategy in the broader mitigation plan.
Admitting that there are responsible data challenges shouldn’t be seen as a deterrent to the project getting support, but rather the opposite.
In a way, this strategy requires people with a solid understanding of what data can and can’t do. Being able to openly admit that despite thorough analysis of a dataset, or deep expertise in a certain topic, there are multiple levels of uncertainty inherent within the project, is something that not everyone will feel comfortable doing. But in order for the data to genuinely inform positive social change, its limitations need to be made explicit, not just internally within a team, but also externally, to the audience or community affected.
Despite the widely varying settings of the stories collected, the shared mitigation strategies indicate that there are indeed a few key principles that can be kept in mind throughout the development of a new tech- or data-driven project.
The starkest of these, and one key aspect underlying many of these challenges, is a fundamental lack of technical literacy among advocacy organisations. This affects the way they interact with technical partners, the decisions they make around the project, the level to which they can have meaningful input, and more. Perhaps more crucially, it also affects the ability to know what to ask for help about, i.e. to ‘know the unknowns’.
Building an organisation’s technical literacy might not mean being able to answer all technical questions in-house, but rather knowing what to ask and what to expect in an answer, from others. For advocacy organisations who don’t (yet) have this, it becomes all too easy to outsource not just the actual technical work but the contextual decisions too, which should be a collaborative process, benefiting from both sets of expertise.
There seems to be a lot of scope to expand this set of stories, both by collecting more from other advocacy organisations and by moving into other sectors. Ultimately, we hope that sharing our collective intelligence around lessons learned from responsible data challenges faced in projects will contribute to a greater understanding for all of us.
Read all the stories here.
This project was supported by Hivos.