Thursday, November 14 • 9:40am - 10:10am
Finding Needles in Big Data Haystacks Using Finite State Machines


Working with data often means locating records that fit a pattern, akin to "finding a needle in a haystack". When big data from heterogeneous sources is added to the mix, the problem becomes far more complex. One use case at Netflix is improving the sign-up experience through experimentation. Finding user journeys that follow certain patterns across billions of events provides key insight into simplifying the sign-up process. This led us to build a framework for expressing these user-journey patterns in a form that can be translated into a nondeterministic finite state machine. One idea we adapted from Ken Thompson's 1968 CACM paper was to construct a nondeterministic finite automaton (NFA) from patterns defined as regular expressions. The next step was applying the state machine to billions of events at scale using Spark. The final piece of the puzzle was making the framework easy to use for data engineers, data scientists, and analysts alike. In this talk, we will cover how we built this framework (dubbed "Conduit") and the design decisions that resulted from challenges along the way. We will also discuss how it could be adapted to real-time applications in the future.
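The core idea in the abstract can be sketched in a few lines of Python. A journey pattern is compiled into an NFA in the style of Thompson's construction. Each user's ordered event sequence is then scanned while the matcher tracks the set of live states.

This is a minimal, hypothetical sketch, not Conduit's implementation or API. The pattern constructors (`sym`, `seq`, `alt`, `star`), the `ANY` wildcard, and the event names (`payment_form`, `signup_complete`, and so on) are all illustrative assumptions. Patterns are built as a small AST rather than parsed from regex syntax; a regex front end would produce the same structure.

```python
# Hypothetical sketch: Thompson-style NFA matching over user event sequences.
# Pattern constructors, the ANY wildcard, and event names are illustrative
# assumptions, not Conduit's actual API.

ANY = object()  # wildcard label: matches any single event

class NFA:
    """Thompson NFA stored as adjacency lists indexed by integer state id."""
    def __init__(self):
        self.edges = []  # edges[s]: list of (label, target) transitions
        self.eps = []    # eps[s]: list of epsilon-transition targets

    def new_state(self):
        self.edges.append([])
        self.eps.append([])
        return len(self.edges) - 1

def sym(label):
    return ("sym", label)

def alt(*options):
    return ("alt",) + options

def star(node):
    return ("star", node)

def seq(*nodes):
    node = nodes[0]  # fold into nested binary concatenations
    for nxt in nodes[1:]:
        node = ("cat", node, nxt)
    return node

def build(nfa, node):
    """Add the Thompson fragment for `node` to `nfa`; return (start, accept)."""
    kind = node[0]
    if kind == "sym":
        s, t = nfa.new_state(), nfa.new_state()
        nfa.edges[s].append((node[1], t))
        return s, t
    if kind == "cat":
        s1, t1 = build(nfa, node[1])
        s2, t2 = build(nfa, node[2])
        nfa.eps[t1].append(s2)  # glue the fragments with an epsilon edge
        return s1, t2
    if kind == "alt":
        s, t = nfa.new_state(), nfa.new_state()
        for child in node[1:]:
            cs, ct = build(nfa, child)
            nfa.eps[s].append(cs)
            nfa.eps[ct].append(t)
        return s, t
    if kind == "star":
        s, t = nfa.new_state(), nfa.new_state()
        cs, ct = build(nfa, node[1])
        nfa.eps[s] += [cs, t]   # enter the body or skip it entirely
        nfa.eps[ct] += [cs, t]  # repeat the body or leave the loop
        return s, t
    raise ValueError(f"unknown node kind: {kind!r}")

def closure(nfa, states):
    """All states reachable from `states` through epsilon edges alone."""
    seen, stack = set(states), list(states)
    while stack:
        for t in nfa.eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(pattern, events):
    """Run the NFA over `events`, tracking the set of live states (no backtracking)."""
    nfa = NFA()
    start, accept = build(nfa, pattern)
    current = closure(nfa, {start})
    for ev in events:
        step = {t for s in current for label, t in nfa.edges[s]
                if label is ANY or label == ev}
        current = closure(nfa, step)
        if not current:
            return False  # no live path remains, so stop early
    return accept in current

anything = star(sym(ANY))
# Hypothetical journey: hit a payment error on either form, then eventually signed up.
pattern = seq(anything, alt(sym("payment_form"), sym("gift_card_form")),
              sym("payment_error"), anything, sym("signup_complete"))
journeys = {
    "u1": ["landing", "plan_page", "payment_form", "payment_error",
           "payment_form", "signup_complete"],
    "u2": ["landing", "plan_page", "payment_form", "signup_complete"],
    "u3": ["gift_card_form", "payment_error", "signup_complete"],
}
print([user for user, evs in journeys.items() if matches(pattern, evs)])  # ['u1', 'u3']
```

The set-of-states simulation is the part borrowed from Thompson's approach. Each event costs time proportional to the number of NFA states, with no backtracking, so the work grows linearly with journey length. For brevity, the sketch recompiles the pattern on every call; compiling once and reusing the NFA is the obvious refinement.

How Conduit distributes this matching across billions of events in Spark is not described here. One plausible arrangement, purely an assumption, is to:

1. group events by user,
2. order them by timestamp, and
3. evaluate a predicate like `matches` on each user's sequence.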

Speakers

Rashmi Shamprasad

Senior Data Engineer, Netflix
Passionate about all things data, Rashmi Shamprasad is a Senior Data Engineer on the Growth Data Engineering team at Netflix, building data products that enable non-member acquisition and experimentation. With over 9 years of experience working in big data, her previous stints include...

Ajit Koti

Senior Engineer, Netflix
Ajit Koti is a Senior Engineer on the Growth Data Engineering team at Netflix, building and architecting large-scale distributed systems and real-time data processing engines. Ajit has previously worked at Fanatics, IBM Labs, and J.P. Morgan, and has extensive experience in building distributed...


Thursday, November 14, 2019 9:40am - 10:10am PST
reactive