Wednesday, November 13
 

9:00am

Portable End-to-End Serverless Workshop
Speakers: James Ward, Ryan Knight, and more leading experts from Google, Lightbend, Capital One, and more!

In this workshop you will set up and deploy a serverless application that includes event functions, services, data streaming, and machine learning. For event functions and services we will use the open source Knative project on Kubernetes. For data streaming we will use Apache Kafka. And for machine learning we will use Kubeflow. All of these pieces will be woven together into a cohesive application. You will be able to run everything on your own machine or in the cloud. We will use ZIO with Scala, or you can choose Java or Kotlin with Micronaut.
About the bespoke Scale By the Bay workshops: every year we produce one special workshop with industry leaders in an important area of software engineering. In 2015, it was the original SMACK workshop on complete end-to-end data pipelines. In 2017, it was the Istio workshop with James Ward, Ryan Knight, Max Klein, and the Google Istio team. In 2018, Cliff Click, the creator of the HotSpot JVM, taught us about JVM performance and lifting compiler performance techniques onto a big data cluster. And this year, James and Ryan return to explain serverless thoroughly and hands-on. James is now at GCP, where cloud workloads of the future are being built. Ryan is implementing the Lightbend stack at Capital One, and the workshop will be taught from their vast experience. In one day, you'll come home with a full serverless backend!


Speakers

James Ward

Developer Advocate, Google
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals...

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...


Wednesday November 13, 2019 9:00am - 5:00pm
 
Thursday, November 14
 

8:00am

8:45am

Grand Welcome and Opening Remarks
Welcome and orientation.

Thursday November 14, 2019 8:45am - 9:00am
functional

9:00am

The Times Are A-Changin'
Speakers

Heather Miller

Assistant Professor, Carnegie Mellon University


Thursday November 14, 2019 9:00am - 9:30am
functional

9:40am

Enabling real time querying of Data using Apache Druid, Flink and Kafka
In this talk, we'll learn more about how Apache Druid powers alerting against real-time data at Lyft, which is useful for several use cases, including validating A/B tests and the accuracy of emails sent out to customers, as well as internal tools. We'll talk about the challenges we faced while setting up our real-time ingestion pipeline into Druid using Apache Flink and Kafka, and how we went about solving them.

Speakers

Sharanya Santhanam

Software Engineer, Lyft
I'm a Software Engineer at Lyft working on the Data Platform Infrastructure team. I work on interactive query engines. Interested in chatting about Druid and Presto.

Shiv Toolsidass

Software Engineering - Data Infrastructure, Lyft


Thursday November 14, 2019 9:40am - 10:10am
data

9:40am

Functional Electromagnetism
Strengthen your understanding of functional programming by looking at it from a fresh and unconventional perspective.

In this talk, we use GNU Radio to examine digital signal processing systems, and explore how we can use our understanding of functional programming to reason about unfamiliar systems such as software-defined radio by looking through the lens of category theory.

This talk was inspired by the paper "Categories for the Working Hardware Designer" by Mary Sheeran. Where Sheeran used category theory to derive theorems about a hardware description language, we will use it to reason about DSP systems.

Speakers

James Earl Douglas

Software Engineering Consultant
Functional programmer, mountain biker, husband, and dad. Occasionally posts computerey things at https://earldouglas.com


Thursday November 14, 2019 9:40am - 10:10am
functional

9:40am

Finding Needles In Big Data Haystacks using Finite State Machine
Working with data often means trying to locate data that fits patterns, akin to "finding a needle in a haystack". When we add big data from non-homogeneous sources to the mix, this problem becomes exponentially more complex. One of the use cases at Netflix is improving the Sign Up experience through experimentation. Being able to find user journeys that follow certain patterns across billions of events is key to simplifying the sign-up process. This gave us the idea to build a framework for expressing these user journey patterns in a form that can be translated into a nondeterministic finite state machine. One of the ideas we adapted from Ken Thompson's 1968 CACM paper was to create a nondeterministic finite automaton from patterns defined using regex. The next step was applying the state machine across billions of events at scale using Spark. The final piece of the puzzle was to make it easily usable by data engineers, scientists, and analysts alike. In this talk, we will cover how we built this framework (dubbed "Conduit") and the design decisions resulting from challenges along the way. We will also talk about how this can be adapted to real-time applications in the future.
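
To make the idea of matching a journey pattern with a nondeterministic automaton concrete, here is a minimal single-process sketch in Python. It is not the Conduit framework and does not use Spark. The event names, the pattern (a "landing" event, then any events, ending in "signup"), and the transition table are all hypothetical and chosen only for illustration. The simulation keeps the set of currently reachable states, in the spirit of Thompson's construction.

```python
# Hypothetical journey pattern over event names: landing, any events, signup (last).
ANY = "*"  # wildcard label: matches every event (an assumption of this sketch)

# (state, event label) -> set of next states. State 1 is nondeterministic
# on "signup": it can stay in 1 (via ANY) or move to the accepting state 2.
TRANSITIONS = {
    (0, "landing"): {1},
    (1, ANY): {1},
    (1, "signup"): {2},
}
START = 0
ACCEPTING = {2}


def step(states, event):
    """Advance every active state by one event and return the new state set."""
    nxt = set()
    for state in states:
        nxt |= TRANSITIONS.get((state, event), set())
        nxt |= TRANSITIONS.get((state, ANY), set())
    return nxt


def matches(events):
    """Return True if the whole event sequence is accepted by the automaton."""
    current = {START}
    for event in events:
        current = step(current, event)
        if not current:  # no live states left, so the journey cannot match
            return False
    return bool(current & ACCEPTING)
```

Because the state set is small and each event is processed once, the same per-journey check could in principle be applied independently to each user's event sequence, which is what makes this style of matching a natural fit for a distributed engine.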

Speakers

Rashmi Shamprasad

Senior Data Engineer, Netflix
Passionate about all things data, Rashmi Shamprasad is a Senior Data Engineer on the Growth Data Engineering team at Netflix, building data products that enable Non Member Acquisition & Experimentation. With over 9 years of experience working in Big Data, her previous stints include...

Ajit Koti

Senior Engineer, Netflix
Ajit Koti is a Senior Engineer on the Growth Data Engineering team at Netflix, building and architecting large-scale distributed systems and real-time data processing engines. Ajit has worked previously at Fanatics, IBM Labs, J P Morgan and has extensive experience in building distributed...


Thursday November 14, 2019 9:40am - 10:10am
reactive

10:20am

How to Eliminate Surprises In Your Data
How do you know you can trust the accuracy of the data flowing through a pipeline, and the insights derived from it? At Spotify, we have an infrastructure team focused on data quality to address this problem. From the cultural changes we’re making to give data engineers a quality mindset, to the specific tools we’ve written, we’ll explain how we increase confidence and eliminate surprises in our data contents, and how we approach problems in the wide space of ‘data quality.’ You’ll learn about a few key moments in the pipeline lifecycle when data quality might be compromised, and the approach we took to improving them.

Speakers

Anne DeCusatis

Data Infrastructure Engineer, Spotify
Talk to me about data quality! My pronouns are they/them/theirs.

Idrees Khan

Senior Data Engineer, Spotify


Thursday November 14, 2019 10:20am - 10:50am
data

10:20am

Thank you, next: Iterators
Iterators are a powerful programming-language abstraction that abstracts away complex structures and operations. The pattern is used throughout big data and the Scala collection library. In this session, we will dive deep into iterators, and ways to use them in your codebase.
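
As a small illustration of the iterator pattern described above, the following Python sketch chains lazy transformations over an unbounded source. It is not taken from the talk, which focuses on Scala; the function names are hypothetical. The point is that each stage pulls one element at a time, so nothing is materialized until the consumer asks for it.

```python
from itertools import islice


def naturals():
    """An unbounded iterator over 0, 1, 2, ... produced on demand."""
    n = 0
    while True:
        yield n
        n += 1


def take_even_squares(limit):
    """Return the first `limit` even squares without building any full list."""
    squares = (n * n for n in naturals())       # lazy map
    evens = (s for s in squares if s % 2 == 0)  # lazy filter
    return list(islice(evens, limit))           # only here are values pulled
```

Calling `take_even_squares(4)` consumes just enough of the infinite source to produce four results.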

Speakers

Umayah Abdennabi

Software Engineer, Grammarly
Umayah is a software engineer on the data team at Grammarly where he works on an internal data analytics platform.


Thursday November 14, 2019 10:20am - 10:50am
functional

10:20am

Run Like a Boss in Cloud: How Istio and Kubernetes are Changing the Microservices Completely
With cloud-native architectures, we face challenges of distributed systems in terms of integration, failures, discovery, and monitoring. Istio and Kubernetes together meet these challenges by providing an additional layer between services and the network, enabling you to control orchestration outside code. This revolutionizes the way services are connected, managed, and secured in cloud-native architectures. Through a series of quick demos and Java code snippets, this session showcases how you can start utilizing Istio on Kubernetes for your own Java-based microservice architecture. In addition, we will try to showcase other CNCF fleet tools (Kiali, Jaeger, ServiceGraph, etc.) for implementing a better microservice architecture.

Speakers

Muktesh Mishra

Senior Software Engineer, Adobe Inc
Muktesh is currently working as a Senior Software Engineer for Adobe Sensei Platform. He is an open-source contributor to 20+ projects and enjoys polyglot programming. He is primarily interested in, and contributes to, Microservices, Cloud, Containerization, Architectures and distributed...


Thursday November 14, 2019 10:20am - 10:50am
reactive

11:00am

From datasets to tables in a multitenant data lake
Salesforce Einstein democratizes access to world-class machine learning in the Salesforce ecosystem by making it easier to build trusted, scalable, and efficient ML-powered apps. A major part of that effort is making tenant data available to those ML processes. This talk will cover our journey to change the major abstraction offered by the data microservices in the Einstein platform, moving from the dataset to the table. In particular, we will cover why we think the new abstraction is more useful for consumers of the service, and the technology choices we have made.

Speakers

Thomas Gerber

Director of Engineering, Salesforce


Thursday November 14, 2019 11:00am - 11:30am
data

11:00am

Hacking F# in JS ecosystem
JavaScript has conquered the world - developers can use it in the browser, on the server, to write mobile apps, on the desktop with Electron, and even to create serverless services. Like the language or not, the truth is JS developers have built an incredible ecosystem with libraries and tools to do almost anything. During the talk I'll show how to bring the power of F# - the functional paradigm, static typing with type inference, pattern matching, and more modern language features - to this huge and rich JS world using Fable, an F#-to-JS compiler. Fable doesn't add any runtime overhead and generates clean JS code that conforms to new ES6 patterns, like modules or iterables. This makes it compatible with modern development tools, including GitHub Electron and React Native, so you can develop not only web but also cross-platform desktop and mobile apps. I'll demonstrate how to create different types of JS applications using F#: a React-based frontend application, a mobile app using React Native, and serverless services with the amazing webtask.io.

Speakers

Krzysztof Cieslak

CEO & Open Source Developer, Lambda Factory
Chris is a software developer, consultant, and founder of [Lambda Factory](http://lambdafactory.io). He's the author of [Ionide](http://ionide.io/), [Forge](http://forge.run), [Fornax](https://gitlab.com/Krzysztof-Cieslak/Fornax), project owner and maintainer of [VSCode-Elm](https://marketplace.visualstudio.com/items?itemName=sbrink.elm...


Thursday November 14, 2019 11:00am - 11:30am
functional

11:00am

Next-Level Diagnostics for Async & Concurrent Errors with ZIO
A strength of the Scala programming language is its powerful support for asynchronous and concurrent programming—historically with Future and Akka, and today, with next-generation effect systems like ZIO.

While ZIO features like fiber-based concurrency, software transactional memory, and async/concurrent resource safety may grab headlines, in everyday programming, we spend a lot of our time debugging our async/concurrent code.

In this presentation by John A. De Goes and Salar Rahmanian, you’ll see how newly-developed features in ZIO make it easier than ever to troubleshoot problems in modern applications.

You’ll discover how execution traces show exactly the line-by-line flow of your async/concurrent code (including where it would continue to if it did not error!), and how interactive debugging features let you identify and troubleshoot stalled async code.

Discover just how powerful async and concurrent programming has become in Scala!

Speakers

Salar Rahmanian

Software Developer, Mya Systems
I have been developing software since the age of eleven and have over 20 years of commercial experience. My passion and expertise are focused on functional programming and building concurrent and distributed systems using Scala. I am a core developer for the ZIO Scala Library for asynchronous...

John A. De Goes

Solution Architect, De Goes Consulting
John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take very...


Thursday November 14, 2019 11:00am - 11:30am
reactive

11:40am

Vectorized Query Processing for CPUs and GPUs using Apache Arrow
Query processing technology has seen rapid development since the iconic C-Store paper was published in 2005. The focus has been on designing query processing algorithms and data structures that efficiently utilize CPU and leverage the changing trends in hardware to deliver optimal performance. In this talk we will explore different types of vectorized query processing in Dremio using Apache Arrow.

Columnar data has become the de facto format for building high performance query engines that run analytical workloads. Apache Arrow is an in-memory columnar data format that houses canonical in-memory representations for both flat and nested data structures. It is a natural complement to on-disk formats like Apache Parquet and Apache ORC.

Data stored in a columnar format is amenable to processing using vectorized instructions (SIMD) available on all modern architectures. Query processing algorithms can implement simple and efficient code that operates on the columnar values in a tight loop, providing fast and CPU cache-friendly access patterns. Operations like SUM, FILTER, COUNT, MIN, MAX, etc. on columnar data can be made more efficient by leveraging the data-level parallelism of SIMD instructions.

Columnar data can be encoded using lightweight algorithms like dictionary encoding, run length encoding, bit packing and delta encoding that are far more CPU efficient than general purpose compression algorithms like LZO and ZLIB. Furthermore, vectorized query processing algorithms can be written in a manner that is aware of column-level encoding and can, in some cases, operate directly on the compressed column values. This saves CPU-memory bandwidth since we need only decompress the necessary column values.

Columnar format allows us to efficiently utilize CPU and GPU cache by filling cache lines with related data (column values from an in-memory vector). With the increasing use of GPUs and FPGAs, efficient use of the smaller on-chip memory available in these architectures is especially important. In addition, Apache Arrow allows for zero-copy, shared access to buffers so that multiple processes can more efficiently operate on the same data.

On the storage side, columnar representation of on-disk data makes a good case for efficient utilization of disk I/O bandwidth for analytical queries. Dremio’s query processing engine leverages the columnar formats of Apache Arrow and Parquet for in-memory and on-disk representations respectively. We have vectorized implementations of operators like hash join and hash aggregation, to name a few.
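
The claim that an operator can work directly on encoded column values can be illustrated with run length encoding, one of the lightweight encodings mentioned in the abstract. The sketch below is plain Python, not Dremio or Apache Arrow code. The function names and the sample column are hypothetical. It computes SUM over the encoded runs without decompressing them.

```python
def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return [(v, n) for v, n in runs]


def rle_sum(runs):
    """SUM evaluated on the encoded form: one multiply-add per run."""
    return sum(v * n for v, n in runs)


# Hypothetical column with repeated values, as is common in sorted data.
column = [5, 5, 5, 2, 2, 9]
encoded = rle_encode(column)
```

For a column with long runs, `rle_sum` touches far fewer elements than summing the raw values, which is the bandwidth saving the abstract alludes to.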

Speakers

Jacques Nadeau

CTO & Co-founder, Dremio


Thursday November 14, 2019 11:40am - 12:10pm
data

11:40am

Recursion schemes with Higherkindness
Recursive structures appear in many problems, from databases to machine learning, and writing functions to operate over them is not always a simple task. As functional programming tries to abstract as many things as possible, it offers a way to decouple a recursion from the implementation of business rules. In this session, Andy and Oli will guide you through recursion schemes fundamentals and Droste, a recursion library for Scala. Along the way, we will explore how it can be utilized in practice, including examples of its usage in Skeuomorph, a library for transforming data protocols.
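
To show what decoupling the recursion from the business rules can look like, here is a minimal catamorphism (bottom-up fold) sketched in Python. It does not use Droste or Skeuomorph and does not reflect their APIs. The expression types and the names `fmap`, `cata`, `eval_algebra`, and `show_algebra` are hypothetical choices for this illustration. The traversal is written once in `cata`, and each "algebra" only handles a single layer whose children have already been replaced by results.

```python
from dataclasses import dataclass


@dataclass
class Lit:
    value: int


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


def fmap(f, node):
    """Apply f to the recursive child positions of a single layer."""
    if isinstance(node, Lit):
        return node
    return type(node)(f(node.left), f(node.right))


def cata(algebra, node):
    """Fold bottom-up: process children first, then hand one layer to the algebra."""
    return algebra(fmap(lambda child: cata(algebra, child), node))


def eval_algebra(layer):
    """Business rule 1: evaluate a layer whose children are already numbers."""
    if isinstance(layer, Lit):
        return layer.value
    if isinstance(layer, Add):
        return layer.left + layer.right
    return layer.left * layer.right


def show_algebra(layer):
    """Business rule 2: render a layer whose children are already strings."""
    if isinstance(layer, Lit):
        return str(layer.value)
    op = "+" if isinstance(layer, Add) else "*"
    return f"({layer.left} {op} {layer.right})"


expr = Add(Lit(2), Mul(Lit(3), Lit(4)))
```

Both `cata(eval_algebra, expr)` and `cata(show_algebra, expr)` reuse the same traversal, so adding a new interpretation means writing only a new non-recursive algebra.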

Speakers

Oli Makhasoeva

Solutions Architect, 47 Degrees
I'm hosting lovely podcasts about Scala

Andy Scott

Person, Stripe


Thursday November 14, 2019 11:40am - 12:10pm
functional

11:40am

Integrating Developer Experiences - Build Server Protocol and beyond
IDEs - Integrated Development Environments - traditionally provide out of the box support for many of the tasks that go into making working software out of source code. But increasingly, developers expect to be able to use any one of a variety of special-purpose tools for each task. This shifts the focus of the IDE from "Integrated" to "Integrating" external tools into a coherent experience. Especially in the Scala ecosystem, we have an increasing number of build tools to choose from. I have been focusing on integrating sbt and other new tools with the IntelliJ Scala plugin and will talk about challenges involved and how the Build Server Protocol makes it possible for IntelliJ to interface with any build tool.

Speakers

Justin Kaeser

Software Developer, JetBrains


Thursday November 14, 2019 11:40am - 12:10pm
reactive

12:10pm

Lunch
Thursday November 14, 2019 12:10pm - 1:00pm
commons

1:00pm

Was He Wright All Along? Software After Moore's Law
Moore's Law is indisputably ending -- but what was it even in the first place?  In particular, can the phenomenon we think of as Moore's Law actually be better explained by Theodore Wright in a 1936 paper on aircraft economics?  If so, could Wright's Law continue to apply as Moore's Law ends?  More generally, what are the ramifications for software as silicon-based microprocessors reach their physical limitations?  In this talk, we will explore the end of Moore's Law, the prospects for Wright's Law in microprocessors, and what it all means for those of us who build software systems.

Speakers

Thursday November 14, 2019 1:00pm - 1:30pm
data

1:00pm

Solving the Scala Notebook Experience
Notebooks have become an essential tool for data science and machine learning research. We felt the existing tools weren't satisfactory for Scala, and lacked the features that developers need in order to productively create reproducible notebooks. So we set out to create a new notebook tool from scratch, which provides essential code editing features that other tools lack, as well as seamless interoperability between multiple languages – including Scala, Python, and SQL – and a host of other improvements that evolve the notebook experience. We'll demonstrate our open-source notebook solution, and talk about why and how we built it – and some of the great Scala ecosystem libraries that allowed us to go from zero to MVP in an unbelievably short time.

Speakers

Jeremy Smith

Sr. Software Engineer, Netflix

Jonathan Indig

Sr. Software Engineer, Netflix


Thursday November 14, 2019 1:00pm - 1:30pm
functional

1:00pm

High Performance Serverless Functions in Scala
I'll show you how to easily build serverless functions in Scala, including on AWS Lambda, that beat back "cold start" issues with extremely low response times.

Speakers

Jason A Swartz

Edge EM, Twitch


Thursday November 14, 2019 1:00pm - 1:30pm
reactive

1:40pm

Scaling Financial Automation on TypeBus
Financial systems are known to move at glacial speeds, making it tricky to build innovative systems in a world where everyone wants to access their data in real time. Tally provides financial automation to our customers in an innovative way using TypeBus, a framework for building distributed microservices in Scala using Akka Streams and Kafka. TypeBus makes it possible to run various asynchronous tasks while remaining available for customers to access their data in real time.

There are many libraries out there focused on delivering low-latency responses. There are fewer that aim to be transparent to the user, provide auditability with baked-in retries and back pressure, and can be easily distributed across a cluster. In this session, we'll discuss problems Tally faced building microservices in the past and why we moved to TypeBus.




Speakers

Tabitha Blagdon

Engineering Manager, Tally

Kaoru Kohashigawa

Senior Platform Engineer, Tally


Thursday November 14, 2019 1:40pm - 2:10pm
data

1:40pm

A brief introduction to systems programming, with Scala Native
With Scala Native's new unsafe API, Scala programmers have access to just as much power as C programmers have had for 50 years. But what does systems programming even look like in a modern language, with Scala's immensely expressive type system? We'll find out as we explore the fundamental concepts of systems programming: pointers, structs, arrays, and strings. As we proceed, we'll see how Scala can provide safer and more ergonomic patterns than C, and compare Scala Native's capabilities to languages like Rust and OCaml. And finally, we'll look at the ways hardware is changing, and the role systems programming (and Scala) can play in defining the patterns and architectures of the future.

Speakers

Richard Whaling

Lead Data Engineer, M1 Finance


Thursday November 14, 2019 1:40pm - 2:10pm
functional

1:40pm

Growing the Scala Community
The Scala community has grown significantly over the past 15 years. As a community, we wrote millions of lines of code and developed hundreds of projects. While the language is thriving, there is still room to contribute to the community. Unlike other tech talks, this talk focuses on contributing to the diversity aspect of the community. It explains the significance and benefits of diversity, and it proposes solutions to diversify and improve the community. One of the best ways to grow the community and to bring diversity into it is to organize ScalaBridge workshops, which are intended to provide resources for people from underrepresented populations to learn Scala. (Diversity comes in many forms: race, gender, age, religion, culture, sexual orientation, socioeconomic background, etc.) While the workshops have positive and lasting impacts, they cannot be run by one individual or by a single organization. In order for the Scala community to become more diverse, we need your help to scale up! Attend this talk to learn how to contribute to our community!

Speakers

Yifan Xing

Software Developer
Yifan is a software engineer, ScalaBridge organizer, and open-source contributor. Her work involves many distributed systems related topics, including network protocols, consensus, network security, etc. Yifan contributed to the message queue systems and asynchronous APIs for a Scala...


Thursday November 14, 2019 1:40pm - 2:10pm
reactive

2:20pm

End-to-End ML Pipelines with KubeFlow and TensorFlow Extended (TFX)
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + Airflow + Jupyter

In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow. Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google. KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking. Airflow is the most widely used pipeline orchestration framework in machine learning.

Pre-requisites
Modern browser - and that's it!
Every attendee will receive a cloud instance
Nothing will be installed on your local laptop
Everything can be downloaded at the end of the workshop

Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set Up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras, and TensorFlow 2.0
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow Fairing
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow and Katib
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace

Key Takeaways
Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.

Speakers

Chris Fregly

Founder, PipelineAI
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup...


Thursday November 14, 2019 2:20pm - 2:50pm
data

2:20pm

Quill + Doobie = Better Together
The power of today’s Open Source libraries is integration, and the Cats ecosystem is a great case in point. Combining Http4s, fs2, and Doobie makes for a powerful recipe that is ridiculously easy to use. As of Doobie 0.7.0, Quill has joined the fray, bridging the gap to the database layer. You can now model your data layer, compose it into queries, transact it, and send it out to the world, without ever having to leave the functional paradigm. By creating this integration, both Doobie and Quill get more than the sum of their parts. Doobie, for instance, leverages fs2 to get a rich set of effect-level JDBC operations that support everything from parallel-streaming queries to reactive monitoring of asynchronous events from database change-listeners. Typically, the SQL queries for these varying use cases will be similar but also different in non-trivial ways. This is where Quill comes to the rescue, allowing the common parts of SQL queries to be abstracted away from the uncommon parts, ensuring a very DRY and maintainable solution, especially when the queries get very, very big. Join us on the next step of this wonderful journey!

Speakers

Alexander Ioffe

Senior Scala Evangelist, Nasdaq

Rob Norris

Programmer, Gemini Observatory
Software Engineer


Thursday November 14, 2019 2:20pm - 2:50pm
functional

2:20pm

Everything old is new: today's infrastructure as yesterday's Internet
In the age of cloud computing, what was once boring infrastructure has become incredibly exciting. From containers to service discovery to cluster schedulers, both industry and academia have been innovating at an alarming pace. With so many systems rapidly evolving in a domain fraught with trade-offs, it has become difficult to see the proverbial forest for the trees. In this talk we will try to see how we can evaluate this new landscape of systems by exploring classic networking papers and seeing what the design principles of yesterday's Internet have to say about the design decisions of today's infrastructure.

Speakers

Adelbert Chang

Lead AI Engineer, Target
Adelbert Chang is an engineer at Target where he works on deployment infrastructure for the AI Engineering team. Previously he worked at U.C. Santa Barbara doing research in large-scale graph querying and modeling, and in industry on machine learning systems, rule engines, and developer...


Thursday November 14, 2019 2:20pm - 2:50pm
reactive

3:00pm

Apache Flink 2.0: Unified Enterprise Data Processing System and Beyond
As the most popular and widely adopted stream processing framework, Apache Flink powers some of the world's largest stream processing use cases in companies like Netflix, Alibaba, Uber, Lyft, Pinterest, Yelp, etc.

In this talk, we will first go over use cases and basic (yet hard to achieve!) requirements of stream processing, and how Flink stands out with some of its unique core building blocks, like pipelined execution, native event time support, state support, and fault tolerance.

We will then take a look at how Flink is going beyond stream processing into areas like unified streaming/batch data processing, enterprise integration with Hive, AI/machine learning, and serverless computation, how Flink fits in with its distinct value, and what development is going on in the Flink community to close the gap.

Speakers

Bowen Li

Senior Software Engineer, Alibaba
Bowen is a Flink committer and senior engineer at Alibaba. Bowen frequently gives talks on Flink at conferences, and organizes Flink meetups and events in Seattle.


Thursday November 14, 2019 3:00pm - 3:30pm
data

3:00pm

Speedy Scala Builds at Databricks
Building Scala code in general can be really slow. To speed this up, Databricks' Developer Tools team has taken on a variety of projects to attack the problem from different angles - from JVM tuning to cloud infrastructure - resulting in build times that are significantly less infuriating. This talk will walk you through the details of each project, and attach concrete numbers to exactly how much of a difference each one made in this year-long effort.

Speakers

Li Haoyi

Software Engineer, Databricks

Ahir Reddy

Software Engineer, Databricks


Thursday November 14, 2019 3:00pm - 3:30pm
functional

3:00pm

To Spark or Not to Spark
Heard about the exciting new world of distributed analytics with Spark but not sure if it's appropriate for your use case? In this talk, we'll walk through the basic use cases for Spark with distributed databases like Apache Cassandra. We'll outline the potential uses for any organization, even those not requiring generic analytics capabilities. Learn how we can use Spark to load data, modify tables, and move data from cluster to cluster. Discover more advanced use cases, like working with streaming services and messaging queues. Find out about all the exciting things you can do with Spark and when you may be able to get away without it!

Speakers
avatar for Russell Spitzer

Russell Spitzer

Software Engineer, DataStax
Spark, Cassandra, or Dogs.


Thursday November 14, 2019 3:00pm - 3:30pm
reactive

3:40pm

Swift for TensorFlow: Machine Learning with No Boundaries
Swift for TensorFlow is a platform for the next generation of machine learning that leverages innovations like first-class differentiable programming to seamlessly integrate deep neural networks with traditional software development. In this session, learn how Swift for TensorFlow can make advanced machine learning research easier and why Jeremy Howard’s fast.ai has chosen it for the latest iteration of their deep learning course.

Speakers
avatar for Paige Bailey

Paige Bailey

Developer Advocate (TensorFlow), Google


Thursday November 14, 2019 3:40pm - 4:10pm
data

3:40pm

Scoring ONNX ML Models with Scala
ONNX is an emerging standard format for serializing machine learning models. This talk will introduce Agate, Stripe's library for scoring ONNX models on the JVM in pure Scala. Stripe uses Agate to score deep learning models in batch in Spark and Scalding, and also in real time using our Scala-based scoring service. We'll talk about performance, how Agate was developed, and how we use binaries built with GraalVM native-image to interoperate with Python.

Speakers
avatar for Oscar Boykin

Oscar Boykin

Machine Learning Infrastructure, Stripe
Oscar is the creator of Scalding, Summingbird, and Algebird, and is an overall professor and mathematician turned software magician.


Thursday November 14, 2019 3:40pm - 4:10pm
functional

3:40pm

Serverless Scala - Functions as SuperDuperMicroServices
Serverless is all the rage but what does it mean for Scala developers? Can we take a plain ol' Scala function and run it on the cloud with infinite scalability? This talk will explore how to build and deploy serverless Scala and how to avoid startup overhead. We will also explore how to build pure serverless functions to make programs more provably correct, easier to build, test, and run. We will use Google Cloud as a reference serverless implementation but the concepts are applicable with any provider.

Speakers
avatar for James Ward

James Ward

Developer Advocate, Google
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals... Read More →
avatar for Josh Suereth

Josh Suereth

Engineer, Google


Thursday November 14, 2019 3:40pm - 4:10pm
reactive

4:20pm

Hack Weekend: ML models on mobile
ML models are increasingly deployed on phones, but what does it actually take to go from a state-of-the-art model in Python to running that model on a phone? Spoiler alert: a lot. Erik recaps his weekend of attempting to go from zero mobile development or on-device model experience to running GPT-2 in an iOS app, covering the problems he encountered using Core ML/ONNX/TFLite, how to solve those problems, and why Swift for TensorFlow has the potential to change everything.

Speakers
avatar for Erik Reppel

Erik Reppel

ML Platform Engineer, Coinbase
Erik Reppel is an engineer on the Machine Learning and Platform team at Coinbase where he primarily works on improving the quality of ML tooling and deploying ML models at scale.


Thursday November 14, 2019 4:20pm - 4:50pm
data

4:20pm

Re-programming the programmer, from Actors to FP
Over the last few years I have built a DNS management system. Initially started as an Event Sourcing application built in Akka, the system had to be re-architected multiple times to address unforeseen issues stemming from new requirements, operational issues, and developer pitfalls (mistakes). This talk will introduce concepts in the DNS domain and different architecture styles including Event Sourcing in Akka and Stream processing in FS2. The talk will describe the journey from inception through to the current system design, highlighting the key challenges encountered along the way and the evolution of the design to account for those challenges. I plan on using real code to demonstrate each architecture along the journey.

Speakers
avatar for Paul Cleary

Paul Cleary

Senior Principal Engineer, Comcast
20+ years of software development experience, spent most of the last 5 years in Scala. Most of my career is building OO systems, recently converted to FP. After all this time I am still learning. Talk to me if you are struggling with Scala or Functional Programming or if you are... Read More →


Thursday November 14, 2019 4:20pm - 4:50pm
functional

4:20pm

Deploy an End-to-End ML Pipeline Using Apache Spark Streaming and Kubernetes
Distributed stream processing engines, like Apache Spark(TM) Structured Streaming, can help in various ways with performing machine learning in real time at large scale. A typical end-to-end streaming machine learning pipeline consists of:
  • Preprocessing the data based on the application, e.g. normalising or cleaning.
  • Hosting the model using microservices and Kubernetes, with IBM MAX (IBM Model Asset Exchange).
  • Scaling the entire pipeline using Apache Spark and Kubernetes.
This talk may include a live demo of applying the above technique to predict objects in an image using an object detection model. Since this is a streaming application, the prediction will be made in real time.
Key takeaways:
  • Learn about reusing ML models using IBM Model Asset Exchange.
  • Learn how to scale an online ML application end to end, using Apache Spark Structured Streaming and Kubernetes.
Details of the associated code and data source used for the demo are available here: https://github.com/ScrapCodes/SS-on-kube

Speakers
avatar for Prashant Sharma

Prashant Sharma

System Software Engineer, IBM
avatar for Nick Pentreath

Nick Pentreath

Principal Engineer - Center for Open Source Data & AI Technologies (CODAIT), IBM
Nick Pentreath is a principal engineer in IBM's Center for Open-source Data & AI Technology (CODAIT), where he works on machine learning. Previously, he cofounded Graphflow, a machine learning startup focused on recommendations. He has also worked at Goldman Sachs, Cognitive Match... Read More →


Thursday November 14, 2019 4:20pm - 4:50pm
reactive

5:00pm

Machine Learning's Missed Opportunity in Visual Data Management
ApertureData's platform accelerates AI applications through its Data Management solution that redefines how large visual data sets are stored, searched and processed. It exposes a unified interface that allows users to store and search both the data and metadata associated with visual artifacts (images or videos). ApertureData's platform provides several innovative features: the ability to evolve metadata easily without requiring costly schema change, first-class status for feature vectors and bounding boxes, the ability to perform similarity searches as well as the ability to perform common pre-processing operations close to the data. The platform will be pluggable in allowing data to be stored on different backends and serve any machine learning pipeline. Based on our current work with customers, our platform, when used for a medical imaging use case, provides up to 5X improvement over the range of queries executed commonly in the field and can save upwards of 2 months per data scientist per machine learning deployment for every new application that wants to exploit data to gather insights. What other makeshift solutions fail to address is that once AI is ready to be commercialized, managing the onslaught of real visual data is going to be a killer for real deployments. Our talk will explain how ApertureData Platform achieves the performance and functionality for a wide range of application domains as well as a demo to show how to use it.

Speakers
avatar for Vishakha Gupta-Cledat

Vishakha Gupta-Cledat

Founder and CEO, ApertureData
I am the Founder and CEO of ApertureData. Prior to that, I was at Intel Labs for over 7 years where I led the design and development of VDMS (the Visual Data Management System) which forms the core of the ApertureData Platform. I have a Ph.D in Computer Science from the Georgia Institute... Read More →


Thursday November 14, 2019 5:00pm - 5:30pm
data

5:00pm

Metals - building rich IDE features beyond the Language Server Protocol
The Language Server Protocol (LSP) has enabled hundreds of programming languages
to support rich code editing features such as code completions in dozens of text
editors including VS Code, Vim, Emacs and Sublime Text as well as the next
generation of web IDEs. While LSP has been successful at gaining industry
adoption, users coming from IDEs such as IntelliJ or Xcode quickly observe that
a lot of nice features may be missing from the protocol.

In this talk, you will learn what steps Metals, a Scala language server, is
taking to bridge the gap between IDEs and LSP. We demonstrate how Metals uses a
suite of additional protocols including the Debug Adapter Protocol, Build Server
Protocol and Tree View Protocol to support run/test/debug, package explorers,
build explorers and more. Join us as we build a bright future for
cross-language, cross-platform and cross-editor tooling!

Speakers
avatar for Ólafur Geirsson

Ólafur Geirsson

Twitter
Ólafur Páll works on Scala developer tooling at Twitter. He is the author of several open source projects including Scalafmt, Scalafix and Metals.


Thursday November 14, 2019 5:00pm - 5:30pm
functional

5:00pm

Dagster: a Framework for Data Processing Applications

We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.

Data applications are graphs of functional computations that consume and produce data assets. Dagster provides abstractions and tools for modeling the semantics of these applications by providing a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions.  Computations themselves can be in the tools used by builders -- Spark jobs for data engineers, SQL statements for analysts, Python for data scientists -- and can be deployed to arbitrary orchestration engines -- such as Airflow, Dask, or Kubernetes-based execution.

The result is more reliable, testable, understandable data systems, that leverage the existing tools that work and that are deployable to your infrastructure.



Speakers
avatar for Nick Schrock

Nick Schrock

Founder, Elementl


Thursday November 14, 2019 5:00pm - 5:30pm
reactive

5:40pm

Panel: Who Needs Serverless?
Moderators
avatar for Vitaly Gordon

Vitaly Gordon

Co-founder & CEO, Faros.ai

Speakers
avatar for Steve Newman

Steve Newman

Founder and Chairman, Scalyr
I'm the founder of Scalyr, where we're building the next observability platform. My whole career has been startups; previously I co-founded Writely, which evolved into Google Docs. Love to talk about observability, scaling, and architecture.
avatar for James Ward

James Ward

Developer Advocate, Google
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals... Read More →
avatar for Jessie Frazelle

Jessie Frazelle

Software Engineer
Jessie Frazelle is a computer programmer who has worked at GitHub, Microsoft, Google, Docker and various companies, startups, even design agencies before that. She’s worked on a lot of the open source projects in the container ecosystem, she’s a top abuser of the GitHub api, and... Read More →
avatar for Jaana Dogan

Jaana Dogan

Engineer, Google
Jaana works on Google Compute Engine and is a familiar figure in the software development community via her previous work on Go and OpenCensus, and from her blog and Twitter presence (@rakyll).
avatar for Rose Toomey

Rose Toomey

Software Engineer, Coatue Management
Big data, Spark, Scala, fintech, ETL pipelines, digital assets, and object allocation. I love a performance mystery.
avatar for George Mathew

George Mathew

Engineer, Oracle
George works at Oracle Cloud Infrastructure (OCI) where he is the tech lead for Oracle Functions - a container-native FaaS product. Prior to OCI, he worked at a healthcare startup called CareEvolution, and on the VisualStudio team at Microsoft.


Thursday November 14, 2019 5:40pm - 6:30pm
functional

6:30pm

Happy Hour
Great food, drinks, and company -- our legendary hallway track with everybody in it closing the day!

Thursday November 14, 2019 6:30pm - 8:00pm
commons
 
Friday, November 15
 

8:00am

9:00am

Kubernetes is a Platform Platform
The world of containers is moving to the next phase. We now have an evolving toolbox. The question is now what do we do with this new toolbox?
We need to think beyond just running containers and instead about how we use these patterns and primitives to automate all parts of application development and operations. In this talk, Joe will cover the origins and history of Kubernetes, give a refresher on its inner workings, and outline how Kubernetes was built to be built upon. He will detail some of the innovative techniques and projects that are taking things to the next level.




Speakers
avatar for Joe Beda

Joe Beda

Principal Engineer, VMware
Doing cloud native stuff at VMware


Friday November 15, 2019 9:00am - 9:30am
functional

9:40am

Human-Centric ML Infrastructure at Netflix
In this talk, we will share our experiences on building Metaflow, a Python library that empowers data scientists at Netflix to prototype, build, deploy, and operate end-to-end machine learning solutions. We started building Metaflow at Netflix to provide a solid foundation for hundreds of internal ML use cases, from classical statistical analysis to large-scale applications of deep learning. Metaflow is designed with a human-centric mindset: instead of reinventing the wheel for large-scale computing or machine learning, we integrate existing solutions into a delightfully consistent and easy-to-use package. This talk focuses on our philosophy towards Machine Learning infrastructure and dives into the internals of Metaflow; it will highlight lessons that we have learned in building a Python library that needs to be robust, performant, and flexible enough to solve a large set of complex real-world business problems related to machine learning. This talk is for you if you want to learn how to develop systems for big data and ML in Python.

Speakers
avatar for Savin Goyal

Savin Goyal

Senior Software Engineer, Netflix
avatar for Ville Tuulos

Ville Tuulos

Architect, Netflix


Friday November 15, 2019 9:40am - 10:10am
data

9:40am

Unison, and why the codebase of the future is a purely functional data structure
Unison is an open source functional programming language with special support for building distributed, elastic systems. It began as an experiment: rethink all aspects of the programming experience, including the core language, runtime, tooling, as well as code versioning and publishing, and then do whatever is necessary to eliminate needless complexity and make building software once again delightful, or at the very least, reasonable. This talk zooms in on one aspect of Unison: it models the codebase not as a mutable bag of text files, but as a purely functional data structure. We'll explain what that means and show the benefits of the approach, which include:
  • Perfect incremental compilation and testing, with the compilation and test result caches shared among all collaborators
  • Refactoring of any size as a totally controlled experience where the codebase always typechecks and the code is always runnable
  • Instant, 100% accurate renames that never break downstream libraries or users
  • The ability to assign multiple names to the same definition, with all namings being fully compatible with one another
  • Simplified and more flexible dependency management; many causes of dependency hell simply cannot arise
  • The ability to serialize arbitrary Unison code, simply, without dependency management issues
  • And lots more...
Besides introducing the big ideas and theory, we'll also show how the ideas get used in practice by demoing the Unison codebase editing tool live during the talk. It should be a lot of fun!

Speakers
avatar for Paul Chiusano

Paul Chiusano

Cofounder, Unison Computing


Friday November 15, 2019 9:40am - 10:10am
functional

9:40am

The Renaissance for Big Data and Parallelism with GraalVM
The Renaissance suite is a new benchmark suite focused on parallelism and concurrency, and provides workloads that exercise modern parallel programming abstractions and primitives provided by the JVM. Through these workloads, the suite aims to aid in understanding how modern applications and data processing frameworks use the features of the JVM, and to foster development of new optimizations that enable more efficient executions. The GraalVM team has used those benchmarks to improve and assess the performance of its compiler to make it one of the most efficient in the industry.
In this talk, we will discuss this new suite and how it is helping compiler, GC, VM and tool implementers to fully support and optimize for the kind of workloads developers really care about. We will then dive into the GraalVM use case by detailing what makes GraalVM such a unique ecosystem.

Speakers
avatar for Christian Wimmer

Christian Wimmer

Consulting Researcher, Oracle
avatar for François Farquet

François Farquet

Senior Researcher, Oracle Labs


Friday November 15, 2019 9:40am - 10:10am
reactive

10:20am

machine learning and mobile
We will discuss different approaches for bringing machine learning to mobile devices, then build an end-to-end pipeline using Swift for TensorFlow and MLIR to train and deploy models to a phone.

Speakers
avatar for brett koonce

brett koonce

cto, quarkworks
brettkoonce.com


Friday November 15, 2019 10:20am - 10:50am
data

10:20am

What is Functional Reactive Programming?
How can we work with time in functional programming? Traditionally, reactive systems (UIs, web servers, robotic controllers, simulations) are seen as inherently imperative, not suitable for functional programming. This does not have to be the case! Functional Reactive Programming (FRP) lets us have our cake and eat it too: we can use the composable, declarative style we love as functional programmers to write this kind of code. But what *is* FRP? It's surprisingly hard to get a clear answer without diving deeply into research literature. I will give you an introduction with practical Haskell examples that will get you over the hump to understanding and using FRP.

Speakers
avatar for Tikhon Jelvis

Tikhon Jelvis

Principal AI Scientist, Target
I picked up Haskell as my first functional language on a whim, and it's stuck with me ever since. I've worked with other functional languages too—a compiler in Racket, a backend service in OCaml—but now I'm back in the Haskell world, working on Target's supply chain optimization... Read More →


Friday November 15, 2019 10:20am - 10:50am
functional

10:20am

Change Data Capture in Distributed Systems
Modern systems are usually designed as a collection of cooperating microservices. These services commonly have dedicated data stores for their individual needs. To support various requirements, the corresponding data are often stored in data stores with very different characteristics and use cases. A fundamental requirement emerging from these architectures is the need to reliably capture primary data changes. Change Data Capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data. In this talk, I’d like to discuss the advantages and disadvantages of various CDC approaches, offer guidance in this area, and share our experience, including various samples and recommendations.
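To make the idea of "determining and tracking the data that has changed" concrete, here is a minimal sketch of one simple CDC style: comparing two snapshots of a table and emitting change events. The abstract does not say which approaches the talk covers, so this choice is an assumption made for illustration, and the function names and event shape are hypothetical. Production systems often read the database's transaction log instead of diffing snapshots.

```python
def diff_snapshots(before, after):
    """Compare two {primary_key: row} snapshots and return change events."""
    events = []
    # Keys present only in the newer snapshot were inserted.
    for key in sorted(after.keys() - before.keys()):
        events.append({"op": "insert", "key": key, "after": after[key]})
    # Keys present in both snapshots changed only if their rows differ.
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            events.append({"op": "update", "key": key,
                           "before": before[key], "after": after[key]})
    # Keys present only in the older snapshot were deleted.
    for key in sorted(before.keys() - after.keys()):
        events.append({"op": "delete", "key": key, "before": before[key]})
    return events


def apply_events(replica, events):
    """Replay change events against a downstream copy of the table."""
    for event in events:
        if event["op"] == "delete":
            replica.pop(event["key"], None)
        else:
            replica[event["key"]] = event["after"]
    return replica


# Hypothetical table contents at two points in time.
before = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
after = {1: {"name": "Ada L."}, 3: {"name": "Cy"}}

events = diff_snapshots(before, after)
replica = apply_events(dict(before), events)
print([e["op"] for e in events])
print(replica == after)
```

Snapshot comparison is easy to reason about but cannot see intermediate states between two snapshots, which is one of the trade-offs a comparison of CDC approaches would need to weigh.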

Speakers
avatar for Petr Zapletal

Petr Zapletal

Tech Lead, Disney Streaming Services
My name is Petr and I work for Disney Streaming Services (ex. Bamtech Media ex. Cake Solutions). I'm interested in Reactive and Distributed Systems, Streaming and ofc Scala and JVM.


Friday November 15, 2019 10:20am - 10:50am
reactive

11:00am

Ludwig, a Code-Free Deep Learning Toolbox
The talk will introduce Ludwig, a deep learning toolbox that allows users to train models and use them for prediction without writing code. It is unique in its ability to help make deep learning easier to understand for non-experts and enable faster model improvement iteration cycles for experienced machine learning developers and researchers alike. By using Ludwig, experts and researchers can simplify the prototyping process and streamline data processing so that they can focus on developing deep learning architectures.

Speakers
avatar for Piero Molino

Piero Molino

Senior ML / NLP Research Scientist, Uber AI Labs


Friday November 15, 2019 11:00am - 11:30am
data

11:00am

Rsc: Scala Outlining for Distributed Compilation
Compilation speed is a large pain point for many Scala developers. Twitter is one of the world’s largest Scala shops, and we continuously integrate all our projects at once in our monorepo. Lowering build times is crucial to help Twitter continue developing fast and safely. While Scala compilation is difficult to parallelize, the Language Tools team at Twitter has been working on a Scala outliner, Rsc, which produces the equivalent of C++ header files for Scala. Armed with these outlines, Scala compilation parallelism can be unlocked, allowing developers to take advantage of parallel, and even distributed, compilation to iterate ever faster. Learn how we've rolled out a Scala outliner into our continuous integration pipeline, while using open source APIs and implementations to compile and test millions of lines of code thousands of times a day to support low latency builds of Twitter's projects from source.

Speakers
avatar for Win Wang

Win Wang

Software Engineer, Twitter


Friday November 15, 2019 11:00am - 11:30am
functional

11:00am

Delta Lake: Open Source Reliability and Quality for Data Lakes
Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Speakers
avatar for Michael Paul Armbrust

Michael Paul Armbrust

Tech Lead for Delta Lake, Databricks
Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and the Delta Lake open source project. He received his PhD from UC Berkeley in 2013, and was... Read More →


Friday November 15, 2019 11:00am - 11:30am
reactive

11:40am

Weld: An Optimizing Runtime for High Performance Data Analytics
Developers write software by combining independently written libraries and functions. Even though individual functions in these libraries are optimized, the lack of end-to-end optimization can cause order-of-magnitude slowdowns in the whole workflow compared to a tuned implementation written in C. For example, even though TensorFlow uses highly tuned linear algebra functions for each of its operators, workflows that combine these operators can be 16× slower than hand-tuned code. Similarly, workflows that perform relational processing in Spark SQL or Pandas, numerical processing in NumPy, or a combination of these tasks spend much of their time in data movement across processing functions and could run up to 100× faster if optimized end to end. Weld is an ongoing open source project from Stanford to accelerate data-intensive applications by as much as 100×. It does so by JIT-compiling parallel code and optimizing across functions within a single library as well as across different libraries, so developers can write modular code and still get close to bare-metal performance without incurring expensive data movement costs. Weld's compiler uses a new, explicitly parallel functional intermediate representation to capture the structure of data-parallel workloads such as SQL, machine learning, and graph analytics, and then optimizes across them using an adaptive optimizer that takes into account hardware characteristics. We demonstrate how Weld can be incrementally integrated into these libraries by porting only the most impactful operators first without breaking compatibility with other operators in the library, and without changing the API of the libraries (so users do not need to change their application code). We also show how Weld speeds up existing workloads in these frameworks and enables speed-ups of two orders of magnitude in applications that combine them. The Weld library and Weld-enabled versions of the Pandas and NumPy libraries are available to download on PyPI.
Weld is open source at https://www.weld.rs.

Speakers
avatar for Shoumik Palkar

Shoumik Palkar

Ph.D. Student, Stanford University


Friday November 15, 2019 11:40am - 12:10pm
data

11:40am

A Gentle Introduction to Comonads
Aimed at programmers with some Scala experience who are interested in pure functional programming using the Cats library. The talk begins with an introduction to type classes and how they are implemented in Scala. Next we will look at the Show, Functor, Monad, and Comonad type classes, and finally at some practical examples of how you can use Comonads in your own programs.
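For readers who have not met comonads before, the sketch below illustrates the two core operations, named extract and coflatMap in Cats, on a list zipper (a sequence with one focused element). It is written in Python rather than Scala, and the Zipper class and neighbor_mean function are hypothetical names chosen for this illustration, not material from the talk. The example use is smoothing a series of readings, where each output depends on an element and its neighbors.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Zipper(Generic[A]):
    """A non-empty sequence with one element in focus."""
    left: Tuple[A, ...]   # elements before the focus, in order
    focus: A
    right: Tuple[A, ...]  # elements after the focus, in order

    def extract(self) -> A:
        """Comonad 'extract': read the focused value out of its context."""
        return self.focus

    def positions(self) -> List["Zipper[A]"]:
        """Every refocusing of this zipper, one per element."""
        items = self.left + (self.focus,) + self.right
        return [Zipper(items[:i], items[i], items[i + 1:])
                for i in range(len(items))]

    def coflat_map(self, f: Callable[["Zipper[A]"], B]) -> "Zipper[B]":
        """Comonad 'coflatMap': apply a context-aware function at every position."""
        results = [f(z) for z in self.positions()]
        i = len(self.left)  # keep the focus at the same index
        return Zipper(tuple(results[:i]), results[i], tuple(results[i + 1:]))


def neighbor_mean(z: Zipper) -> float:
    """Average of the focus and its immediate neighbors, where they exist."""
    window = z.left[-1:] + (z.focus,) + z.right[:1]
    return sum(window) / len(window)


readings = Zipper((), 1.0, (2.0, 6.0, 4.0))
smoothed = readings.coflat_map(neighbor_mean)
print(smoothed)
```

Where a monad's flatMap takes a function from a plain value to a value in context, coflat_map takes a function from a value in context to a plain value, which is why it suits computations that need to look at a neighborhood.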

Speakers
avatar for Justin Heyes-Jones

Justin Heyes-Jones

Software Developer, YoppWorks
Justin is a Scala and pure fp fanatic


Friday November 15, 2019 11:40am - 12:10pm
functional

11:40am

8 Keys for Successful Serverless Architectures
Serverless promises to make it easier to build and deploy applications but it presents a new set of challenges. These challenges often come at a high cost and make it difficult to use Serverless as a more efficient platform than traditional microservices. These include such things as dealing with cold starts, data, testing and avoiding vendor lock-in. In this talk we will look at the most common challenges and what are the keys to a successful Serverless architecture.

Speakers
avatar for Ryan Knight

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies... Read More →


Friday November 15, 2019 11:40am - 12:10pm
reactive

12:10pm

Lunch
Friday November 15, 2019 12:10pm - 1:00pm
commons

1:00pm

Discovering Your Model's Known Unknowns and Unknown Unknowns
Selecting the right training data for human review is known as Active Learning. Almost every company invents (or reinvents) the same Active Learning strategies and too often they repeat the same avoidable errors. This talk will share some common Active Learning strategies, with PyTorch examples, covering: Least Confidence Sampling, Entropy-based Sampling, Cluster-based Sampling, Model-based Outliers, Monte Carlo Dropouts (Deep Bayesian Active Learning), Representative Sampling, and Sampling for Real-World Diversity.
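As a rough illustration of the first two strategies listed, the sketch below scores model predictions by Least Confidence and by Entropy and picks the most uncertain items for human review. The talk promises PyTorch examples; this stand-in uses only the Python standard library, and the function names, the normalization to the range 0 to 1, and the sample probabilities are all assumptions made for this example.

```python
import math


def least_confidence(probs):
    """Uncertainty in [0, 1]: 0 if one class has probability 1, 1 if uniform."""
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)


def entropy_score(probs):
    """Shannon entropy of the prediction, divided by log(n) to lie in [0, 1]."""
    n = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)


def select_for_review(predictions, score, k):
    """Return the ids of the k items the model is least certain about."""
    ranked = sorted(predictions,
                    key=lambda item_id: score(predictions[item_id]),
                    reverse=True)
    return ranked[:k]


# Hypothetical softmax outputs for four unlabeled items over three classes.
predictions = {
    "a": [0.98, 0.01, 0.01],
    "b": [0.40, 0.35, 0.25],
    "c": [0.70, 0.20, 0.10],
    "d": [0.34, 0.33, 0.33],
}

print(select_for_review(predictions, least_confidence, 2))
print(select_for_review(predictions, entropy_score, 2))
```

Both scores agree on this toy data, but they can rank items differently: least confidence looks only at the top class, while entropy takes the whole distribution into account.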

Speakers
avatar for Rob Munro

Rob Munro

Humanitarian and Technology experience includes: working in post-conflict development in Liberia and Sierra Leone for UNHCR; researching health communications in Malawi; software development supporting endangered languages; running crowdsourced translation following disasters in Haiti... Read More →


Friday November 15, 2019 1:00pm - 1:30pm
data

1:00pm

Rust and Scala, Sitting in a Tree….
As a Scala developer of many years, I started getting into Rust out of frustration with Scala and the JVM, working on in-memory databases and high performance data manipulation code. Rust appealed due to its promise of safety, performance, AND high level abstractions. Does it really deliver, and how does it compare with Scala in those respects? In particular:
  • Safety: what does it mean to be a safe language by default? Let’s compare the two languages' approaches to safety
  • What are some similar and dissimilar functional features?
  • Performance: A close look at how Rust delivers fast performance without sacrificing FP, or: Rust vs Scala functional transforms
  • Why Rust holds huge promise in data engineering
  • Is it possible to take advantage of some Rust while keeping your Scala codebase intact?

Speakers
avatar for Evan Chan

Evan Chan

Senior Software Engineer
Evan loves to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies. He is the creator of the FiloDB open-source distributed time-series database, as well as the Spark Job Server. He has led the design and implementation... Read More →


Friday November 15, 2019 1:00pm - 1:30pm
functional

1:00pm

Moonshot Spark: serverless with GraalVM
Can Apache Spark slip its earthly bounds and go serverless, clusterless? Popular cloud services are becoming more capable. AWS Lambda now runs three times longer, Fargate has become less expensive. GraalVM can reduce resource usage while improving cold start times. Consider how to handle small bursts of work. Would a standalone container suit best? If cold startup times weren't such an issue! What about a "mission control" model, where a long-running Spark driver dispatches work to ephemeral executors? What you gain in flexibility and convenience means concessions in performance. Chinning up with GraalVM native image helps. Shuffle is still problematic. Which experimental shuffle manager is best suited to the outer reaches of the cloud? There's not a practical use case for larger workflows - yet. But let's use this moonshot as a lens to magnify cloud performance issues. Explore how these solutions could apply to services you already use.

Speakers
avatar for Rose Toomey

Rose Toomey

Software Engineer, Coatue Management
Big data, Spark, Scala, fintech, ETL pipelines, digital assets, and object allocation. I love a performance mystery.


Friday November 15, 2019 1:00pm - 1:30pm
reactive

1:40pm

Introduction to Geospatial Analysis for the Uninitiated SQL Data Engineer
The talk will introduce GIS to uninitiated SQL data engineers. It will be most useful to someone who writes SQL queries and pipelines, knows nothing about GIS, and wants to enhance their analysis with geospatial data.

Speakers
avatar for Michael Entin

Michael Entin

Software Engineer, Google Inc
Senior Software Engineer on the Google BigQuery team. Before joining the Dremel team, worked on various data processing projects at Microsoft: SQL Server Integration Services, Analysis Services, distributed platform for AdCenter Business Intelligence, etc.


Friday November 15, 2019 1:40pm - 2:10pm
data

1:40pm

Runtime Types at Crunchbase
Speakers
avatar for Themba Fletcher

Themba Fletcher

Crunchbase
Themba Fletcher is an enthusiastic technologist, a bit of a polyglot, and an obsessive troubleshooter who loves making things. He currently manages the Core Platform and Data Insights teams at Crunchbase. Fletcher focuses on scaling platforms, APIs, and engineering teams.


Friday November 15, 2019 1:40pm - 2:10pm
functional

1:40pm

Reliable Machine Learning
Machine learning has been described as "Software 2.0" and holds the promise of totally changing how software systems are constructed. But if manually-written "Software 1.0" code is still plagued with bugs, downtime, and security vulnerabilities, how can we hope to achieve reliable behavior in systems with significant data-dependent machine learning components and their attendant complexities? This talk will survey academic research, industry best practices, and software tools spanning the end-to-end development of machine learning systems from data pipelines to tests and types all the way through to end user experience.

Speakers
avatar for David Andrzejewski

David Andrzejewski

Engineering, Sumo Logic
David Andrzejewski is a Senior Engineering Manager at Sumo Logic, where he works on applying statistical modeling and analysis techniques to machine data such as logs and metrics. He also co-organizes the SF Bay Area Machine Learning meetup group. David holds a PhD in Computer Sciences... Read More →


Friday November 15, 2019 1:40pm - 2:10pm
reactive

2:20pm

Large Scale On-Demand Low-Latency Near Real-Time Predictions
Predictive machine learning is optimizing customer experiences across many industries. This session presents the development process at Sony PlayStation that delivers scalable real-time low-latency predictive ML-based solutions on the cloud.

Speakers
avatar for Gabor Melli

Gabor Melli

Senior Director of Engineering (ML&AI), Sony Interactive Entertainment


Friday November 15, 2019 2:20pm - 2:50pm
data

2:20pm

Taming complex webapps with Scala and React
Sufficiently complex requirements require sufficiently sophisticated patterns and practices to tame the overall complexity! Only then do we have any hope of delivering a useful and high-quality product that meets those requirements. Scala.js combined with React presents a coherent combination that emphasizes both functional programming and immutability. In this talk, we will examine how we used this combination to deliver a complex set of requirements in a user-friendly application. We will explore several patterns we utilized: custom hooks to ensure reusable code and a consistent user experience, isomorphic implementation of algorithms to run the same code on servers and clients, and use of memoization to ensure a responsive UI.

Speakers
avatar for Kavita Laddad

Kavita Laddad

Co-founder, Paya Labs, Inc.
React, Scala.js, Indian classical music


Friday November 15, 2019 2:20pm - 2:50pm
functional

2:20pm

Fast and scalable domain-specific knowledge graphs generation
Very recently, there has been a lot of interest in the construction of knowledge graphs. Large companies like Microsoft and Google operate large knowledge bases (KBs) and there are some open source examples like Yago. However, there are some scenarios where domain-specific KBs are needed and Wikipedia data sources may not work. In this talk, I’ll describe techniques to build this type of KB.

Speakers
avatar for Omar Alonso

Omar Alonso

Principal Applied Researcher, Microsoft
Omar is a Principal Data Scientist Lead at Microsoft in Silicon Valley where he works on the intersection of social media, temporal information, knowledge graphs, and human computation for the Bing search engine. He holds a PhD from the University of California at Davis. @elunca


Friday November 15, 2019 2:20pm - 2:50pm
reactive

3:00pm

Lessons Learnt Building Domain Specific NLP Pipelines
At Indix (acquired by Avalara), our goal was to build the "Google of Products". The product catalog currently has 3+ billion products, which were amassed by crawling 5000+ retailer and brand web sites. Naturally, we needed a robust NLP pipeline to make sense of the unstructured text data at this scale. The first part of the talk will cover the evolution of the architecture, building blocks and algorithms of the NLP pipeline. The building blocks I will cover are Language Models, Word Embeddings and Knowledge Graph. The algorithms I will cover are classification, entity extraction, document similarity and query understanding (for the e-commerce domain). After the acquisition by Avalara, the team was tasked with making sense of the unstructured text data in the Tax Compliance domain with limited data. The second part of the talk will focus on how we fine-tuned the e-commerce NLP pipeline and transferred our learnings from the e-commerce domain to the Tax Compliance domain.

Speakers
avatar for Rajesh Muppalla

Rajesh Muppalla

Senior Director of Engineering, Avalara
Sr. Director of Engineering at Avalara. Using AI to solve Tax automation. Previously co-Founder, Indix (acquired by Avalara), Tech Lead on go.cd. Topics - Machine Learning, Data Pipelines, Continuous Delivery, Mentoring


Friday November 15, 2019 3:00pm - 3:30pm
data

3:00pm

In Types We Trust
Scala ensures that types are used consistently with their declaration, but checks only the name and structure of the types. A type also implies a semantic contract, which is typically expressed in human-language documentation and checked by tests. Can we do better? In this talk I will propose that we formalize the specification of semantic contracts as statements of predicate logic. I will show how these statements of logic can be used in both property-based unit tests and proofs. I will show you new features of ScalaTest that support this approach.

Speakers
avatar for Bill Venners

Bill Venners

President, Artima, Inc.
Bill Venners is president of Artima, Inc., publisher of Scala consulting, training, books, and developer tools. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality... Read More →


Friday November 15, 2019 3:00pm - 3:30pm
functional

3:00pm

Integrating React with Scala: delivering incremental value in a legacy Play web application
Do you hate JavaScript? Is your UI written in Scala but you can't hire any frontend developers to work on it or backend developers that want to? 
Come hear about how my team and I faced these exact problems: a frontend that's difficult to hire for, a monolithic web application with extremely slow build times, and a smattering of untested and disorganized UI code.
Through trial and error we developed a way to inject React code into Scala Play that maintains type safety and takes advantage of JavaScript libraries, letting us build the frontend faster, hire more easily, and incrementally work towards a split frontend and backend web application.
For UI business logic, we will discuss how you can use tools like TypeScript, React, and Redux to incrementally improve your web application. Then we will also discuss how tools like webpack and reverse proxying can be used to build the application for production and enable a smooth development experience for your engineers.

Speakers
avatar for Niole Nelson

Niole Nelson

Software Engineer, Domino Data Lab



Friday November 15, 2019 3:00pm - 3:30pm
reactive

3:40pm

Reliable, High Scale TensorFlow Inference Pipelines at Twitter

Twitter heavily relies on Scala/JVM and has deep expertise in this area. For instance, we’ve built Finagle for low latency client / server RPCs, Heron for near real time data processing and Scalding for offline use cases (Hadoop / Spark). In comparison, the ML world is focused on the Python / C++ stack.

To provide a reliable TensorFlow inference offering for the different use cases at Twitter, we’ve had to overcome multiple problems to make our offering reliable, cost-effective and scalable to large models. In this presentation, we’ll present our key learnings.

We’ll do a deep dive into specific performance issues that we’ve had to deal with and show you how we handled them, building the tools and techniques to mitigate the issues we observe as well as quality gates to prevent issues in the future. We’ll also place a particular emphasis on observability, catching performance issues early through automatic performance regression analysis on key metrics (CPU usage, memory usage, latency, throughput). We’ll also talk about caring about what you should optimize for (throughput vs. latency, for instance) and thinking early about your performance goals and Service Level Objectives before working on a new model.

All of these aspects helped us successfully serve 50+ different models in production, handling 20M to 40M+ requests per second.

At the end of this talk, we hope that you will better understand the choices Twitter made along the way to create a reliable JVM-based inference pipeline and that you will be able to benefit from our experience.



Speakers
avatar for Briac Marcatté

Briac Marcatté

Staff ML Engineer, Twitter
avatar for Shajan Dasan

Shajan Dasan

Staff ML Engineer, Twitter
Staff Machine Learning Engineer at Twitter. Working on distributed systems for the last 15 years.


Friday November 15, 2019 3:40pm - 4:10pm
data

3:40pm

Running Amok to Ignite a Documentation Revolution!
Why is our ecosystem littered with so much incomplete, out-of-date and inadequate documentation? Why can't we check a library's v1.3.5 docs and read about how it contains a bug that's fixed in v1.3.6? Why can't a humble user contribute an improvement to the docs without involving the project maintainer, and having a new release made? Why does the documentation contain examples which don't compile? Why must we wait longer for a release of a library when only its docs are missing? Documentation needs to evolve faster, and to continue improving even after development of the software it describes has stopped. We need a documentation revolution! This talk will present a philosophical analysis of the causes of our industry's bad documentation culture and how our incumbent tooling and practises aren't helping. I will introduce Amok, a revolutionary new documentation management tool built upon Fury, for creating, maintaining, evolving, linking, versioning and checking documentation. Amok will take advantage of static build information which is now available thanks to Fury, and provide solutions to all the awkward questions above.

Speakers
avatar for Jon Pretty

Jon Pretty

Developer, Propensive
Jon Pretty is an international man of Scala mystery.


Friday November 15, 2019 3:40pm - 4:10pm
functional

3:40pm

Serverless Event-Driven Data Pipeline Platform
Airflow has become the de facto data pipeline platform in many companies.
Airflow was designed to run static, slow-moving workflows on a fixed schedule, and it is a great tool for that purpose. However, users often get into trouble by forcing their use cases to fit into Airflow’s model.
A few examples of use cases that Airflow cannot satisfy in a first-class way include:
- Complex DAGs that leak application code into the pipeline
- DAGs which need to be run off-schedule or with no schedule at all
- DAGs that run concurrently with the same start time
- DAGs with complicated branching logic
- DAGs with many fast tasks
- DAGs which rely on the exchange of data
- Parametrized DAGs

In this talk we present a brand new Serverless Event-Driven Pipeline Platform written in Scala that addresses all the problems above.

Speakers
avatar for Rahul Chitturi

Rahul Chitturi

Principal Software Engineer, Coatue
avatar for Neelabh Gupta

Neelabh Gupta

Software Engineer, Coatue Management
Full-stack web development, TypeScript, React, Scala, Python


Friday November 15, 2019 3:40pm - 4:10pm
reactive

4:20pm

Build Your Own ML Data Feedback Loop
Machine learning models should learn from their history. Data collection and labeling is often the rate-limiting step of AI research. At Curai, our AI tools are deployed in a real-world healthcare setting, giving us the opportunity to learn from their usage. This talk will focus on how to build a semi-automated data feedback loop for ML model retraining, highlighting the specific use case at Curai. A data feedback loop consists of several key components. First, model output is presented to the user (in our case, a doctor or health professional), who can choose to accept or reject a medical suggestion. This usage data is then sent to data sinks and forwarded to a data store, where post-processing and additional calculations can happen (for example, calculating the edit distance between two strings). Processed data can then be sent down (most simply, through a CSV) to a model for retraining or fine-tuning, and the resulting v2 model can then be tested for accuracy and re-deployed into the product. In short, the semi-automated data feedback loop allows for rapid iteration and continuous learning for AI/ML models. This talk will focus on specific technologies my teammates and I have used, including, but not limited to, integration with StackDriver, BigQuery, and LaunchDarkly. Attendees will learn how to build a semi-automated data feedback loop, see practical code examples, hear anecdotes of my own failures and successes in this domain, and consider the ethical implications of using user-generated data for model retraining. There is tremendous potential for AI in healthcare, and closing the data loop for model retraining can help solve one of the key challenges in this domain and continuously improve machine learning models.

Speakers
avatar for Sophia Sanchez

Sophia Sanchez

Machine Learning Engineer, Curai


Friday November 15, 2019 4:20pm - 4:50pm
data

4:20pm

GDPR Data Cleaner: Mutating Immutable Data
Remember when data engineers and data scientists used to say things like:
* “Log everything”
* “Never throw away data”
* “All data is important”
* “What is useless data today is tomorrow’s data of gold”
And then that four-letter acronym came into our vernacular… *G-D-P-R*
Now, you hear statements like this…
* “Do we really need this data?”
* “Is this data used at all?”
* “What does the GDPR say about this type of data?”
Another change that came with the GDPR is the right for a user to request the deletion of their personal data. This is a tricky proposition for those dealing with big data, since all big data technologies were based on the concept of immutable data. Big data systems, such as Hadoop and Spark, scaled so well because there were no updates of data, only appends, and the data was written out in large blocks, not conducive to small updates/deletes. In this talk, we discuss how personal data can be cleansed from existing big data storage systems, such as columnar-oriented Hive tables and key-value stores, and we will introduce a new open source project that implements these ideas.

Speakers
avatar for David Winters

David Winters

Big Data Architect, GoPro
David is an Architect in the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka streaming data ingestion pipeline. He has been developing scalable data processing pipelines and eCommerce systems for over 20 years in Silicon Valley. David's current big... Read More →


Friday November 15, 2019 4:20pm - 4:50pm
functional

4:20pm

Maximizing Throughput and Scalability for Akka Streams
The Akka Streams API offers a robust, reliable, and expressive means for executing streaming workloads. For applications that demand high throughput, low latency, or increasing scalability, it is critical to understand how to maximize the throughput for Akka Streams. I will examine the structure of Akka Streams and explore techniques for maximizing the throughput of individual streams. I will describe how Akka Streams can be partitioned in order to provide scalability, as well as high-availability. Finally, I will review techniques for profiling and instrumenting Akka Streams to find the bottlenecks. Using the practical techniques from this talk, you will be able to improve the throughput, reliability, and scalability of your streaming applications.

Speakers
avatar for Colin Breck

Colin Breck

Sr. Staff Software Engineer, Tesla
Colin Breck has experience developing software infrastructures for the near real-time monitoring and control of industrial applications. At Tesla, he works on distributed systems for the monitoring, aggregation, optimization, and control of distributed-energy assets, including solar... Read More →


Friday November 15, 2019 4:20pm - 4:50pm
reactive

5:00pm

Next-generation frameworks for Large-scale Machine Learning
As the deep-learning revolution matures, there is ever-growing demand for bigger datasets, larger models and more compute infrastructure. What is the role of algorithmic design in this?  I will show several ways to infuse structure into deep networks to overcome these limitations, viz., through tensors, graphs, physical laws, and simulations. Tensorized neural networks lead to large rates of compression while improving on generalization and robustness. In order to speed up multi-node model training, I will demonstrate how simple gradient compression (SignSGD) leads to communication savings while preserving accuracy. Thus, with better algorithmic design, it is possible to obtain “free lunches” and obtain better efficiency in ML.

Speakers
avatar for Anima Anandkumar

Anima Anandkumar

Professor, Caltech
Anima Anandkumar holds dual positions in academia and industry. She is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generation AI algorithms. At Caltech, she is t... Read More →


Friday November 15, 2019 5:00pm - 5:30pm
data

5:00pm

Netty 5: Lessons Learned
Netty is one of the most used network frameworks on the JVM (if not the most used), providing its users not only with great flexibility but also with superb performance. While Netty 4.x was a great success and is used literally everywhere in production, it became clear over the “years” that a few of the design choices that were made produced various limitations. As work has started on Netty 5, it’s time to fix these limitations and incorporate all the feedback we received from the community and core maintainers. This talk will focus on multiple core changes that are scheduled for Netty 5 by explaining what “real-world issues” these solve and how these changes will help to operate Netty in high-scale production environments. It will also give a brief overview of the planned timeline and roadmap for Netty 5.

Speakers
avatar for Norman Maurer

Norman Maurer

Software Engineer


Friday November 15, 2019 5:00pm - 5:30pm
functional

5:00pm

Linkerd and the Service Mesh
In this talk, Charles Pretzer will present an overview of Linkerd, a "service mesh" for Kubernetes, and describe Linkerd's evolution over the years from a 1.x branch in Scala, Finagle, Netty, and the JVM, to its modern 2.x incarnation in Go and Rust. Charles will cover the service mesh model and how Linkerd implements it, as well as lessons learned over almost four years of production experience at companies around the world.

Speakers
avatar for Charles Pretzer

Charles Pretzer

Field Engineer, Linkerd
Charles Pretzer is a field engineer at Buoyant, where he spends his time collaborating and engaging with the open source community of the CNCF service mesh, linkerd. He also enables production level adoption by helping companies integrate linkerd into their Kubernetes based applications... Read More →


Friday November 15, 2019 5:00pm - 5:30pm
reactive

5:40pm

Panel: AI Product
Moderators
avatar for Pete Skomoroch

Pete Skomoroch

Head of Data Products, Workday
Peter is Co-Founder and CEO of SkipFlag, which was acquired by Workday in 2018. SkipFlag's technology uses your existing conversations, support tickets, and other communication to automatically build and update an enterprise knowledge base. It understands the people, topics, and facts... Read More →

Speakers
avatar for Manasi Vartak

Manasi Vartak

Founder & CEO, Verta
Manasi Vartak is the founder and CEO of Verta.ai (www.verta.ai), an MIT-spinoff building software to enable high-velocity machine learning. The Verta platform enables data scientists and ML engineers to robustly version ML models, collaborate and share ML knowledge, and to deploy... Read More →
avatar for Rob Munro

Rob Munro

Humanitarian and Technology experience includes: working in post-conflict development in Liberia and Sierra Leone for UNHCR; researching health communications in Malawi; software development supporting endangered languages; running crowdsourced translation following disasters in Haiti... Read More →
avatar for Anima Anandkumar

Anima Anandkumar

Professor, Caltech
Anima Anandkumar holds dual positions in academia and industry. She is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generation AI algorithms. At Caltech, she is t... Read More →
avatar for Peter Bailis

Peter Bailis

Founder, Sisu
Founder and CEO, Sisu Data (https://sisu.ai) and Assistant Professor, Stanford CS (https://dawn.cs.stanford.edu)
avatar for Thanh Nguyen

Thanh Nguyen

Principal Data Scientist, Alibaba Cloud - Security Innovation Labs
Dr. Thanh Nguyen has 12+ years of experience in academia and industry. Before joining Alibaba Cloud, she was a Data Scientist at Nominum, the company that invented DNS software and got acquired by Akamai in 2017. Thanh joined Alibaba Cloud’s Security Innovation Lab in 2018. Together... Read More →


Friday November 15, 2019 5:40pm - 6:30pm
functional