Protecting sensitive data in huge datasets: Cloud tools you can use

06/11/2018 - 16:30 to 17:10
long talk (40 min)

Session abstract: 

Before releasing a public dataset, practitioners need to strike a balance between the dataset's utility and the protection of individuals. In this talk we'll move from theory to real-world practice while handling massive public datasets. We'll showcase newly available tools that help with PII detection, and bring concepts like k-anonymity and l-diversity into the practical realm.
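
To make the two concepts concrete, here is a minimal sketch in plain Python. It is not taken from the talk or from any particular cloud tool. The function names, the column names (`zip`, `age`, `diagnosis`), and the sample rows are all hypothetical, and l-diversity is measured in its simplest "distinct values" form.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values for every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest equivalence class (0 if no rows)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min((len(members) for members in classes.values()), default=0)


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return l: the fewest distinct sensitive values in any class (0 if no rows)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(
        (len({m[sensitive] for m in members}) for members in classes.values()),
        default=0,
    )


# Hypothetical, already-generalized sample data (illustrative only).
rows = [
    {"zip": "941**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "941**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "941**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "941**", "age": "40-49", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["zip", "age"])
l = l_diversity(rows, ["zip", "age"], "diagnosis")
print(f"k = {k}, l = {l}")
```

In this sample every combination of `zip` and `age` is shared by two rows, so the data is 2-anonymous, yet the second group contains only one diagnosis. That gap is what l-diversity is meant to expose: anyone known to be in that group has their diagnosis revealed even though they cannot be singled out.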

Related research: "Considerations for Sensitive Data within Machine Learning Datasets" -