Friday, November 15 • 4:20pm - 4:50pm
GDPR Data Cleaner: Mutating Immutable Data


Remember when data engineers and data scientists used to say things like:

* “Log everything”
* “Never throw away data”
* “All data is important”
* “What is useless data today is tomorrow’s data of gold”

And then that four-letter acronym came into our vernacular… *G-D-P-R*

Now you hear statements like these:

* “Do we really need this data?”
* “Is this data used at all?”
* “What does the GDPR say about this type of data?”

Another change that came with the GDPR is the right of a user to request the deletion of their personal data. This is a tricky proposition for those dealing with big data, since all big data technologies were based on the concept of immutable data. Big data systems such as Hadoop and Spark scaled so well because data was never updated, only appended, and it was written out in large blocks that are not conducive to small updates or deletes.

In this talk, we discuss how personal data can be cleansed from existing big data storage systems, such as columnar-oriented Hive tables and key-value stores, and we will introduce a new open source project that implements these ideas.
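
To make the difficulty concrete, here is a minimal sketch of what deleting one user's rows from append-only block storage tends to involve: every block that might contain the user is read and rewritten without those rows, then swapped in place of the original. This is an illustration only, not the open source project from the talk. The `*.block` file naming, the tab-separated `user_id<TAB>payload` row layout, and the function names are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def rewrite_block_without_user(block_path: Path, user_id: str) -> int:
    """Rewrite one block file, dropping the rows that belong to user_id.

    Assumes (hypothetically) one row per line, laid out as 'user_id<TAB>payload'.
    Returns the number of rows removed.
    """
    removed = 0
    # Surviving rows go to a temporary file in the same directory,
    # so the final swap stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=block_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as out, \
            block_path.open(encoding="utf-8") as src:
        for line in src:
            row_user, _, _payload = line.partition("\t")
            if row_user == user_id:
                removed += 1
            else:
                out.write(line)
    if removed:
        # Replace the old block with the cleaned copy in a single step.
        os.replace(tmp_name, block_path)
    else:
        # Nothing to delete: leave the original block untouched.
        os.remove(tmp_name)
    return removed


def erase_user(table_dir: Path, user_id: str) -> int:
    """Scan every block of a (hypothetical) table directory and erase one user."""
    blocks = sorted(table_dir.glob("*.block"))
    return sum(rewrite_block_without_user(p, user_id) for p in blocks)
```

Even in this toy form, a single-user delete scans the whole table and rewrites every affected block in full, which is why small deletes fit poorly with storage designed for large, append-only writes.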

Speakers
David Winters

Big Data Architect, GoPro
David is an Architect in the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka streaming data ingestion pipeline. He has been developing scalable data processing pipelines and eCommerce systems for over 20 years in Silicon Valley. David's current big...


Friday November 15, 2019 4:20pm - 4:50pm PST
functional