Interested in how to publish data without passing on people’s personal information? Try the Open Data Institute’s walkthrough guide, developed from a session at the Open Knowledge Festival last year. It allows you to download passenger data from the Titanic and test out a few practical ways in which the data might be used.
The key trade-off is between protecting data subjects’ privacy and keeping the data useful. Even if you remove direct identifiers (the passengers’ names), other variables in the data can still be combined to identify a person.
The most common variables that are not direct identifiers but still carry a high risk of re-identification are:
dates (e.g. birth, admission, discharge, …)
geolocators (e.g. post codes, spatial data)
age
unusual education (e.g. PhD in statistical disclosure control procedures)
unusual occupation (e.g. organiser of the OK Festival).
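To make the risk from variables like these concrete, here is a rough sketch in Python. It is not taken from the ODI guide: the records, the field names and the helper functions (group_sizes, unique_rows, generalise) are all made up for illustration. It counts how many rows are unique on a combination of indirect identifiers, then coarsens age and post code and counts again.

```python
from collections import Counter

# Made-up records for illustration only (not taken from the Titanic data).
# Names are already removed; only indirect identifiers remain.
records = [
    {"age": 34, "postcode": "AB1 2CD", "occupation": "teacher"},
    {"age": 36, "postcode": "AB1 3EF", "occupation": "teacher"},
    {"age": 35, "postcode": "AB1 2CD", "occupation": "teacher"},
    {"age": 52, "postcode": "XY9 8ZW", "occupation": "festival organiser"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return the rows whose combination of `keys` appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def generalise(row):
    """Coarsen a row: ten-year age band, first part of the post code only."""
    decade = row["age"] // 10 * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "postcode": row["postcode"].split()[0],
        "occupation": row["occupation"],
    }


keys = ["age", "postcode", "occupation"]
risky_before = unique_rows(records, keys)
coarse = [generalise(r) for r in records]
risky_after = unique_rows(coarse, keys)

print(len(risky_before), "of", len(records), "rows are unique before generalising")
print(len(risky_after), "of", len(coarse), "rows are unique after generalising")
```

In this toy data every row is unique to begin with. After generalising, the three teachers become indistinguishable from one another, but the row with the unusual occupation still stands out, which is why rare values may need to be grouped or suppressed as well.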
The guide has a series of other practical examples to think about.