Friday, November 15 • 3:40pm - 4:10pm
Reliable, High Scale Tensorflow Inference Pipelines at Twitter


Twitter heavily relies on Scala/JVM and has deep expertise in this area. For instance, we’ve built Finagle for low latency client / server RPCs, Heron for near real time data processing and Scalding for offline use cases (Hadoop / Spark). In comparison, the ML world is focused on the Python / C++ stack.

Providing a reliable Tensorflow inference offering for the different use cases at Twitter meant overcoming multiple problems to make it cost effective and scalable to large models. In this presentation, we’ll share our key learnings.

We’ll do a deep dive into specific performance issues that we’ve had to deal with and show you how we handled them, covering both the tools and techniques we built to mitigate the issues we observe and the quality gates that prevent issues in the future. We’ll put particular emphasis on observability: catching performance issues early through automatic performance regression analysis on key metrics (CPU usage, memory usage, latency, throughput). We’ll also talk about deciding what you should optimize for (throughput vs. latency, for instance) and thinking early about your performance goals and Service Level Objectives before working on a new model.
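
As a rough illustration of what an automatic performance regression check on those key metrics could look like, here is a minimal Python sketch. It is not the tooling used at Twitter: the metric names, the tolerance values and the `find_regressions` helper are all hypothetical, chosen only to show the idea of comparing a candidate model against a baseline and flagging metrics that got worse.

```python
# Hypothetical relative tolerances per metric (assumed values, not from the talk).
TOLERANCE = {
    "cpu_usage": 0.05,
    "memory_usage": 0.05,
    "p99_latency_ms": 0.10,
    "throughput_rps": 0.05,
}

# For these metrics a decrease is the regression; for all others an increase is.
HIGHER_IS_BETTER = {"throughput_rps"}


def find_regressions(baseline, candidate, tolerance=TOLERANCE):
    """Return {metric: relative_worsening} for metrics beyond their tolerance.

    Assumes every baseline value is non-zero.
    """
    regressions = {}
    for name, allowed in tolerance.items():
        base, cand = baseline[name], candidate[name]
        # Relative change, signed so that a positive value always means "worse".
        change = (cand - base) / base
        if name in HIGHER_IS_BETTER:
            change = -change
        if change > allowed:
            regressions[name] = change
    return regressions
```

A quality gate could then refuse to promote a model whenever `find_regressions` returns a non-empty result. Whether the latency or the throughput tolerance should be the stricter one depends on which of the two the service is optimizing for.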

All of these aspects helped us successfully serve 50+ different models in production, handling 20M to 40M+ requests per second.

At the end of this talk, we hope that you will better understand the choices Twitter made along the way to create a reliable JVM-based inference pipeline, and that you will be able to benefit from our experience.



Speakers

Briac Marcatté

Staff ML Engineer, Twitter

Shajan Dasan

Staff ML Engineer, Twitter
Staff Machine Learning Engineer at Twitter. Working on distributed systems for the last 15 years.


Friday November 15, 2019 3:40pm - 4:10pm PST
data