Responsible Data Concerns with Open Source Intelligence


November 14, 2016
This is a guest post by Evanna Hu.

Open source intelligence (OSINT) is based on publicly available information, both offline and online. OSINT has proven itself to be extremely valuable in a wide range of industries, from business intelligence to investigative journalism and humanitarian relief.

Corporations such as Goldman Sachs use publicly sourced market and political intelligence to identify risks, while international NGOs protect their supply chains using intelligence about terrorist groups gathered from social media and messaging apps. Meanwhile, think tanks such as the Institute for the Study of War, which solely uses OSINT, report on incidents and shifts in armed groups’ allegiances at a level of detail that would give intelligence agencies a run for their money.

As technology proliferates and the volume of available data increases, organisations and individuals are starting to rely solely on OSINT rather than private and classified information. It’s not just that the information is cheap (or free). More importantly, it is already publicly available – which organisations often take to mean that using it does not violate individuals’ privacy rights. In these cases, people often assume that simply accessing OSINT doesn’t make them responsible for confirming that responsible data issues like consent have been addressed.

But does it?

Areas of concern

In reality, there are plenty more shades of grey. Whether it involves using hacked sources and evidence in investigative journalism (especially when communications haven't been authenticated or there is no consent for them to be published), or questions about accountability versus advocacy in the human rights and criminal justice space, responsible data approaches should play a major role in an organisation's strategic thinking.

OSINT introduces five major ethical concerns, which are outlined below. As with many responsible data concerns, legal compliance is just one part of a much bigger picture, and it often forms the lowest bar rather than the best practice we should strive for. The fact that we are not prohibited from doing something does not mean that we should do it.

1. Origin and intent of sources

The origin and intent of the intelligence can bias the data sample and lead to misleading analysis. Was the publicly available information initially cherry-picked to fit a certain narrative while the rest was discarded? What if the original source of the information was initially classified but subsequently obtained through hearsay? This post on responsible data approaches to whistleblowing explores some of these issues in more detail.

2. Unclassified but sensitive

The fact that information is unclassified does not mean that individuals or groups won't get hurt if that information is publicised widely. How can we frame our questions to take into account the concerns of those represented in the data, especially if it is not possible to get back in touch with that particular group? What trade-offs need to be made?

3. The Mosaic Effect

Even when one dataset is de-identified, it is still possible to combine it with other datasets to re-identify an individual or group. This is known as the “Mosaic Effect.” Should we use these techniques to develop insights on individuals or groups? What if they are harmful to others? Are there best practices to follow, especially when the data involves particularly vulnerable groups?
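To make the Mosaic Effect concrete, here is a minimal, hypothetical sketch in Python: two small datasets that each look harmless on their own are joined on shared quasi-identifiers (postcode, birth year, gender), re-attaching names to "de-identified" records. All column names and records are invented for illustration.

```python
# A minimal sketch of the Mosaic Effect: two datasets, each "anonymous"
# on its own, can re-identify people once joined on shared quasi-identifiers.
# All column names and records here are hypothetical.
import pandas as pd

# De-identified dataset: no names, just quasi-identifiers plus a sensitive field.
clinical = pd.DataFrame({
    "postcode":   ["20001", "20002", "20003"],
    "birth_year": [1985, 1990, 1978],
    "gender":     ["F", "M", "F"],
    "diagnosis":  ["condition_a", "condition_b", "condition_c"],
})

# Publicly scraped dataset (e.g. a voter roll or a social media profile dump)
# that carries names alongside the same quasi-identifiers.
public = pd.DataFrame({
    "name":       ["Alice Example", "Bob Example", "Carol Example"],
    "postcode":   ["20001", "20002", "20003"],
    "birth_year": [1985, 1990, 1978],
    "gender":     ["F", "M", "F"],
})

# A simple inner join on the quasi-identifiers is enough to re-attach
# names to the "de-identified" records.
reidentified = clinical.merge(public, on=["postcode", "birth_year", "gender"])
print(reidentified[["name", "diagnosis"]])
```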

4. Reliance on automated analysis

Many OSINT-related cases involve cleaning, organising and analysing deluges of raw data. Algorithms, workbenches and machine learning can speed up this process significantly. Yet no technology platform is infallible, and the resulting analysis could have harmful consequences if it is wrong. How far should one rely on these methods? What’s the balance between machine and human power? Are there common pitfalls to avoid, or ways to make explicit the decisions that have gone into analysing the data?
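One common way to strike a balance between machine and human power is to accept automated labels only above a confidence threshold and route everything else to manual review. The sketch below illustrates that idea in Python; the classifier output, labels and threshold are hypothetical placeholders rather than a description of any real OSINT pipeline.

```python
# A minimal, hypothetical sketch of keeping humans in the loop:
# auto-accept only high-confidence automated labels and queue the rest
# for manual review. Threshold and data are placeholders, not real values.
from dataclasses import dataclass


@dataclass
class Classification:
    item_id: str
    label: str
    confidence: float  # 0.0 to 1.0, as reported by the model

REVIEW_THRESHOLD = 0.9  # assumption: tuned per task and per the risk of harm


def triage(results):
    """Split automated results into auto-accepted and human-review queues."""
    accepted, needs_review = [], []
    for r in results:
        (accepted if r.confidence >= REVIEW_THRESHOLD else needs_review).append(r)
    return accepted, needs_review


if __name__ == "__main__":
    results = [
        Classification("post-001", "extremist_content", 0.97),
        Classification("post-002", "extremist_content", 0.62),  # too uncertain
        Classification("post-003", "benign", 0.99),
    ]
    accepted, needs_review = triage(results)
    print(f"auto-accepted: {len(accepted)}, flagged for human review: {len(needs_review)}")
```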

5. Publicity and visibility

Most public information (and analysis of this information) is rarely viewed. Do we have a responsibility to share and publicise some of this information more widely if it is in the public interest – even if doing so might harm individuals or groups? What are the questions to ask in these kinds of situations?

My team learned these ethical dilemmas the hard way. Already practitioners and experts in the field of countering violent extremism, we were no strangers to operating in grey areas. But these decisions got harder once we formed Omelas – which utilises manual and automated data collection and analysis to map out the radicalisation process and networks of vulnerable individuals, so that they can receive the most effective intervention or therapy for deradicalisation.

We are painfully aware of the ramifications of mistaken online aliases, of missed connections that could lead to violent acts, and of the potential loss of trust if we are perceived as being too cosy with law enforcement. Our ultimate beneficiaries are these vulnerable individuals; respecting their rights and recognising our own duty of care towards them is a keystone of responsible and ethical data usage.

So how can we maintain their trust while still making sure that no one gets hurt? How do we decide who we work with and when? How do we ensure that the rights of individuals reflected in our data are respected at all points of this process?

Call to Action: A Crowdsourced Framework

In an attempt to set out where we stand as an organisation, we came up with our own internal ethical guidelines. Running to more than four single-spaced pages, the framework covered everything from our relationships with law enforcement in all major scenarios and where and how we store our data, to where we locate our servers in case of subpoenas. But we felt uneasy: we had come up with the framework based on our personal ethical values, without much external guidance. It was almost as if we were "playing God". We did not like that feeling.

There are responsible data frameworks for various sectors – but, so far, none for OSINT. There are also only sparse resources for individuals and organisations to consult. What kind of questions should we ask ourselves when writing our internal ethical guidelines? Are we forgetting to consider certain factors? What types of players do we need to factor into our decision-making? How do we stay up-to-date with the latest data privacy laws and regulations?

If you’re also interested in coming together to draft or discuss responsible data issues around OSINT and data that has been made available through whistleblowing, hacking or data breaches, get in touch with Evanna on evanna.y.h@gmail.com and Zara on zara@theengineroom.org [PGP key here] – or join the Responsible Data mailing list and share your thoughts there directly. We’d love to collaborate on producing a responsible data framework that reflects the needs and concerns of anyone working on these issues.

About the contributor

Zara is a researcher, writer and linguist who is interested in the intersection of power, culture and technology. She has worked in over twenty countries in the field of information accessibility and data use among civil society. She was the first employee at OpenOil, looking into open data in the extractive industries, and then worked for Open Knowledge with School of Data on data literacy for journalists and civil society. She now works with communities and organisations to help understand how new uses of data can responsibly strengthen their work.
