Department or Program

Computer Science

Primary Wellesley Thesis Advisor

Darakhshan Mir


A wider release of police datasets could enable social scientists, community activists, and civil libertarians to challenge discriminatory policing practices more effectively. However, the privacy implications of such sharing must be carefully considered. Simply "de-identifying" data, for example by removing names, is known to be insufficient to protect the privacy of individuals.

New York City's stop-and-frisk data is one such police dataset. It contains information, including demographic information, about all people stopped under the program from 2003 to 2012. This paper examines the identifiability of this data: it examines how unique the records are in order to investigate the privacy implications of the data's release for the individuals targeted by stop-and-frisk. It also suggests ways to re-identify this data.
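
To make the notion of uniqueness concrete, here is a minimal sketch of one common way to measure it: the fraction of records whose combination of attribute values appears only once in the dataset. It is an illustration under stated assumptions, not the method used in the thesis. The function name `uniqueness`, the field names (`age`, `sex`, `race`, `precinct`), and the sample records are all hypothetical and are not taken from the actual stop-and-frisk data.

```python
from collections import Counter


def uniqueness(records, quasi_identifiers):
    """Return the fraction of records whose values on the given
    attributes are shared with no other record in the dataset."""
    # Project each record onto the chosen attributes.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is unique if its attribute combination occurs exactly once.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Made-up records for illustration only (hypothetical field names and values).
stops = [
    {"age": 19, "sex": "M", "race": "B", "precinct": 75},
    {"age": 19, "sex": "M", "race": "B", "precinct": 75},
    {"age": 34, "sex": "F", "race": "W", "precinct": 13},
    {"age": 52, "sex": "M", "race": "H", "precinct": 44},
]

print(uniqueness(stops, ["age", "sex", "race", "precinct"]))  # 0.5
print(uniqueness(stops, ["sex"]))  # 0.25
```

In this toy example, half of the records are unique on all four attributes, while only one in four is unique on `sex` alone. The more attributes a release includes, the larger the share of records that can be singled out.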