De-anonymizing D4D Datasets

Paper by Kumar Sharad and George Danezis at the 2013 Privacy Enhancing Technologies Symposium

Recent research on de-anonymizing datasets of anonymized personal records has not deterred organizations from releasing personal data, often with ingenuous attempts at defeating de-anonymization. Studying such techniques provides scientific evidence as to why anonymizing high-dimensional databases is hard and sheds light on which kinds of techniques to avoid. We study how to de-anonymize datasets released as part of the Data for Development (D4D) challenge [12]. We show that the anonymization strategy used is weak and allows an attacker to re-identify and link records efficiently. We also suggest some measures to make such attacks harder.