Wednesday, November 13
 

9:00am

Portable End-to-End Serverless Workshop
Speakers: James Ward, Ryan Knight, and more leading experts from Google, Lightbend, Capital One, and more!

In this workshop you will set up and deploy a serverless application that includes event functions, services, data streaming, and machine learning. For event functions and services we will use the open source Knative project on Kubernetes. For data streaming we will use Apache Kafka, and for machine learning we will use Kubeflow. All of these pieces will be woven together into a cohesive application. You will be able to run everything on your own machine or in the cloud. We will use ZIO with Scala, or you can choose Java or Kotlin with Micronaut.

About the bespoke Scale By the Bay workshops: every year we produce one special workshop with industry leaders in an important area of software engineering. In 2015, it was the original SMACK workshop on complete end-to-end data pipelines. In 2017, it was the Istio workshop with James Ward, Ryan Knight, Max Klein, and the Google Istio team. In 2018, Cliff Click, the creator of the JVM HotSpot, taught us about JVM performance and lifting compiler performance techniques onto a big data cluster. And this year, James and Ryan return to explain serverless thoroughly and hands-on. James is now at GCP, where cloud workloads of the future are being built. Ryan is implementing the Lightbend stack at Capital One, and the workshop will draw on their vast experience. In one day, you'll come home with a full serverless backend!


Speakers

James Ward

Developer Advocate, Google
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals...

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...


Wednesday November 13, 2019 9:00am - 5:00pm
 
Thursday, November 14
 

8:00am

9:00am

Keynote
Speakers

Heather Miller

Assistant Professor, Carnegie Mellon University


Thursday November 14, 2019 9:00am - 9:30am
functional

9:40am

Functional Electromagnetism
Strengthen your understanding of functional programming by looking at it from a fresh and unconventional perspective.

In this talk, we use GNU Radio to examine digital signal processing systems, and explore how we can use our understanding of functional programming to reason about unfamiliar systems such as software-defined radio by looking through the lens of category theory.

This talk was inspired by the paper "Categories for the Working Hardware Designer" by Mary Sheeran. Where Sheeran used category theory to derive theorems about a hardware description language, we will use it to reason about DSP systems.

Speakers

James Earl Douglas

Software Engineering Consultant
Functional programmer, mountain biker, husband, and dad. Occasionally posts computerey things at https://earldouglas.com


Thursday November 14, 2019 9:40am - 10:10am
functional

9:40am

Massively Parallel Distributed Scala Compilation... And You!
To give Twitter engineers the confidence to rapidly ship changes to production, we continuously integrate Twitter's projects, top-to-bottom. To that end, we've been working toward significantly increasing the amount of parallelism in our compiles of Scala code (our largest supported language by far), and building support for distributed compilation to take advantage of that parallelism. Learn how we're using open source APIs and implementations to compile and test millions of lines of code thousands of times a day to support low latency builds of Twitter's projects from source.

Speakers

Danny McClanahan

pants hacker + oss evangelist, Twitter

Stu Hood

Senior Staff Software Engineer, Twitter, Inc


Thursday November 14, 2019 9:40am - 10:10am
reactive

10:20am

How to Eliminate Surprises In Your Data
How do you know you can trust the accuracy of the data flowing through a pipeline, and the insights derived from it? At Spotify, we have an infrastructure team focused on data quality to address this problem. From the cultural changes we’re making to give data engineers a quality mindset, to the specific tools we’ve written, we’ll explain how we increase confidence and eliminate surprises in our data contents, and how we approach problems in the wide space of ‘data quality.’ You’ll learn about a few key moments in the pipeline lifecycle when data quality might be compromised, and the approach we took to improving them.

Speakers

Anne DeCusatis

Data Infrastructure Engineer, Spotify
Talk to me about data quality! My pronouns are they/them/theirs.

Idrees Khan

Senior Data Engineer, Spotify


Thursday November 14, 2019 10:20am - 10:50am
data

10:20am

Thank you, next: Iterators
Iterators are a powerful abstraction in programming languages that hides complex structures and operations. The pattern is used throughout big data and the Scala collection library. In this session, we will dive deep into iterators and ways to use them in your codebase.

Speakers

Umayah Abdennabi

Software Engineer, Grammarly
Umayah is a software engineer on the data team at Grammarly where he works on an internal data analytics platform.


Thursday November 14, 2019 10:20am - 10:50am
functional

10:20am

Run Like a Boss in Cloud: How Istio and Kubernetes are Changing the Microservices Completely
With cloud-native architectures, we face challenges of distributed systems in terms of integration, failures, discovery, and monitoring. Istio and Kubernetes together meet these challenges by providing an additional layer between services and the network, enabling you to control orchestration outside code. This revolutionizes the way services are connected, managed, and secured in cloud-native architectures. Through a series of quick demos and Java code snippets, this session showcases how you can start using Istio on Kubernetes for your own Java-based microservice architecture. In addition, we will try to showcase other CNCF fleet tools (Kiali, Jaeger, ServiceGraph, etc.) for implementing a better microservice architecture.

Speakers

Muktesh Mishra

Principal Engineer, Capital One LLC
Muktesh is currently working as a Senior Software Engineer for Capital One’s Developer Platform. He is an open source contributor to 20+ projects and enjoys polyglot programming. He is primarily interested in, and contributes to, Microservices, Cloud, Containerization, Architectures...


Thursday November 14, 2019 10:20am - 10:50am
reactive

11:00am

From datasets to tables in a multitenant data lake
Salesforce Einstein democratizes access to world class machine learning in the Salesforce ecosystem by making it easier to build trusted, scalable, and efficient ML powered apps. A major part of that effort is making tenant data available to those ML processes. This talk will cover our journey to change the major abstraction offered by the data micro-services in the Einstein platform, moving from the dataset to the table. In particular, we will explain why we think the new abstraction is more useful for consumers of the service, and the technology choices we have made.

Speakers

Thomas Gerber

Director of Engineering, Salesforce


Thursday November 14, 2019 11:00am - 11:30am
data

11:00am

Hacking F# in JS ecosystem
JavaScript has conquered the world - developers can use it in the browser, on the server, to write mobile apps, on the desktop with Electron, and even to create serverless services. Like the language or not, the truth is JS developers have built an incredible ecosystem with libraries and tools to do almost anything. During the talk I'll show how to bring the power of F# - the functional paradigm, static typing with type inference, pattern matching, and more modern language features - to this huge and rich JS world using Fable, an F# to JS compiler. Fable doesn't add any runtime overhead and generates clean JS code in conformance with new ES6 patterns, like modules or iterables, making it compatible with modern development tools, including GitHub Electron or React Native, to let you develop not only web but also cross-platform desktop and mobile apps. I'll demonstrate how to create different types of JS applications using F# - from a React-based frontend application, through a mobile app using React Native, to serverless services with the amazing webtask.io.

Speakers

Krzysztof Cieslak

CEO & Open Source Developer, Lambda Factory
Chris is a software developer, consultant, and founder of [Lambda Factory](http://lambdafactory.io). He's the author of [Ionide](http://ionide.io/), [Forge](http://forge.run), [Fornax](https://gitlab.com/Krzysztof-Cieslak/Fornax), project owner and maintainer of [VSCode-Elm](https://marketplace.visualstudio.com/items?itemName=sbrink.elm...


Thursday November 14, 2019 11:00am - 11:30am
functional

11:00am

Enabling real time querying of Data using Apache Druid, Flink and Kafka
In this talk, we'll learn more about how Apache Druid powers alerting against real time data at Lyft, which is useful for several use cases including validating A/B tests, accuracy of emails sent out to customers and for internal tools. We'll talk about the challenges we faced while setting up our real time ingestion pipeline into Druid using Apache Flink and Kafka, and how we went about solving them.

Speakers

Sharanya Santhanam

Software Engineer, Lyft
I'm a Software Engineer @ Lyft working on the Data Platform Infrastructure team. I work on Interactive Query Engines. Interested in chatting about Druid & Presto.

Shiv Toolsidass

Software Engineering - Data Infrastructure, Lyft


Thursday November 14, 2019 11:00am - 11:30am
reactive

11:40am

Vectorized Query Processing for CPUs and GPUs using Apache Arrow
Query processing technology has seen rapid development since the iconic C-Store paper was published in 2005. The focus has been on designing query processing algorithms and data structures that efficiently utilize the CPU and leverage changing trends in hardware to deliver optimal performance. In this talk we will explore different types of vectorized query processing in Dremio using Apache Arrow.

Columnar data has become the de facto format for building high performance query engines that run analytical workloads. Apache Arrow is an in-memory columnar data format that houses canonical in-memory representations for both flat and nested data structures. It is a natural complement to on-disk formats like Apache Parquet and Apache ORC.

Data stored in a columnar format is amenable to processing using vectorized instructions (SIMD) available on all modern architectures. Query processing algorithms can implement simple and efficient code that operates on the columnar values in a tight loop, providing fast and CPU cache-friendly access patterns. Operations like SUM, FILTER, COUNT, MIN, MAX, etc. on columnar data can be made more efficient by leveraging the data-level parallelism of SIMD instructions.

Columnar data can be encoded using lightweight algorithms like dictionary encoding, run length encoding, bit packing, and delta encoding that are far more CPU efficient than general purpose compression algorithms like LZO and ZLIB. Furthermore, vectorized query processing algorithms can be written in a manner that is aware of column level encoding and can, in some cases, operate directly on the compressed column values. This saves CPU-memory bandwidth since we need only decompress the necessary column values.

Columnar format allows us to efficiently utilize CPU and GPU cache by filling cache lines with related data (column values from an in-memory vector). With the increasing use of GPUs and FPGAs, efficient use of the smaller on-chip memory available in these architectures is especially important. In addition, Apache Arrow allows for zero-copy, shared access to buffers so that multiple processes can more efficiently operate on the same data.

On the storage side, columnar representation of on-disk data makes a good case for efficient utilization of disk I/O bandwidth for analytical queries. Dremio’s query processing engine leverages the columnar formats of Apache Arrow and Parquet for in-memory and on-disk representations respectively. We have vectorized implementations of operators like hash join and hash aggregation, to name a few.

Speakers

Jacques Nadeau

CTO & Co-founder, Dremio


Thursday November 14, 2019 11:40am - 12:10pm
data

11:40am

Recursion schemes with Higherkindness
Recursive structures appear in many problems, from databases to machine learning, and writing functions to operate over them is not always a simple task. As functional programming tries to abstract as many things as possible, it offers a way to decouple a recursion from the implementation of business rules. In this session, Andy and Oli will guide you through recursion schemes fundamentals and Droste, a recursion library for Scala. Along the way, we will explore how it can be utilized in practice, including examples of its usage in Skeuomorph, a library for transforming data protocols.

Speakers

Oli Makhasoeva

Solutions Architect, 47 Degrees
I'm hosting lovely podcasts about Scala

Andy Scott

Person, Stripe


Thursday November 14, 2019 11:40am - 12:10pm
functional

11:40am

Integrating Developer Experiences - Build Server Protocol and beyond
IDEs - Integrated Development Environments - traditionally provide out of the box support for many of the tasks that go into making working software out of source code. But increasingly, developers expect to be able to use any one of a variety of special-purpose tools for each task. This shifts the focus of the IDE from "Integrated" to "Integrating" external tools into a coherent experience. Especially in the Scala ecosystem, we have an increasing number of build tools to choose from. I have been focusing on integrating sbt and other new tools with the IntelliJ Scala plugin and will talk about challenges involved and how the Build Server Protocol makes it possible for IntelliJ to interface with any build tool.

Speakers

Justin Kaeser

Software Developer, JetBrains


Thursday November 14, 2019 11:40am - 12:10pm
reactive

12:10pm

Lunch
Thursday November 14, 2019 12:10pm - 1:00pm
commons

1:00pm

Solving the Scala Notebook Experience
Notebooks have become an essential tool for data science and machine learning research. We felt the existing tools weren't satisfactory for Scala, and lacked the features that developers need in order to productively create reproducible notebooks. So we set out to create a new notebook tool from scratch, which provides essential code editing features that other tools lack, as well as seamless interoperability between multiple languages – including Scala, Python, and SQL – and a host of other improvements that evolve the notebook experience. We'll demonstrate our open-source notebook solution, and talk about why and how we built it – and some of the great Scala ecosystem libraries that allowed us to go from zero to MVP in an unbelievably short time.

Speakers

Jeremy Smith

Sr. Software Engineer, Netflix

Jonathan Indig

Sr. Software Engineer, Netflix


Thursday November 14, 2019 1:00pm - 1:30pm
functional

1:00pm

High Performance Serverless Functions in Scala
I'll show you how to easily build serverless functions in Scala, including on AWS Lambda, that beat back "cold start" issues with extremely low response times.

Speakers

Jason A Swartz

Edge EM, Twitch


Thursday November 14, 2019 1:00pm - 1:30pm
reactive

1:40pm

Having fun with AutoML
Some people say that building Machine Learning applications is a tangled and laborious process. I'll prove them wrong. In this talk I will walk you through examples of applying AutoML techniques with TransmogrifAI and Apache Spark. If applied correctly, these recipes will help you ship your ideas from notebook to production 100x faster and have fun in the process!

Speakers

Matthew Tovbin

Software Architect, Salesforce
Matthew Tovbin is a Software Architect at Salesforce, engineering Salesforce Einstein AI Platform, which powers the world’s smartest CRM. He is a co-author of TransmogrifAI (https://transmogrif.ai), an open-source AutoML library for structured data on Apache Spark. Before joining...


Thursday November 14, 2019 1:40pm - 2:10pm
data

1:40pm

A brief introduction to systems programming, with Scala Native
With Scala Native's new unsafe API, Scala programmers have access to just as much power as C programmers have had for 50 years. But what does systems programming even look like in a modern language, with Scala's immensely expressive type system? We'll find out as we explore the fundamental concepts of systems programming: pointers, structs, arrays, and strings. As we proceed, we'll see how Scala can provide safer and more ergonomic patterns than C, and compare Scala Native's capabilities to languages like Rust and OCaml. And finally, we'll look at the ways hardware is changing, and the role systems programming (and Scala) can play in defining the patterns and architectures of the future.

Speakers

Richard Whaling

Lead Data Engineer, M1 Finance


Thursday November 14, 2019 1:40pm - 2:10pm
functional

2:20pm

End-to-End ML Pipelines with KubeFlow and TensorFlow Extended (TFX)
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + Airflow + Jupyter

In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow. Described in the 2017 paper, TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google. KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking. Airflow is the most widely used pipeline orchestration framework in machine learning.

Prerequisites: a modern browser, and that's it! Every attendee will receive a cloud instance. Nothing will be installed on your local laptop. Everything can be downloaded at the end of the workshop.

Agenda
1. Create a Kubernetes cluster
2. Install KubeFlow, Airflow, TFX, and Jupyter
3. Set Up ML Training Pipelines with KubeFlow and Airflow
4. Transform Data with TFX Transform
5. Validate Training Data with TFX Data Validation
6. Train Models with Jupyter, Keras, and TensorFlow 2.0
7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow Fairing
8. Analyze Models using TFX Model Analysis and Jupyter
9. Perform Hyper-Parameter Tuning with KubeFlow and Katib
10. Select the Best Model using KubeFlow Experiment Tracking
11. Reproduce Model Training with TFX Metadata Store
12. Deploy the Model to Production with TensorFlow Serving and Istio
13. Save and Download your Workspace

Key Takeaways: Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.

Speakers

Chris Fregly

Founder, PipelineAI
Chris Fregly is Founder and Research Engineer at PipelineIO, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup...


Thursday November 14, 2019 2:20pm - 2:50pm
data

2:20pm

Quill + Doobie = Better Together
The power of today’s Open Source libraries is integration, and the Cats ecosystem is a great case in point. Combining Http4s, fs2, and Doobie makes for a powerful recipe that is ridiculously easy to use. As of Doobie 0.7.0, Quill has joined the fray, bridging the gap to the database layer. You can now model your data layer, compose it into queries, transact it, and send it out to the world, without ever having to leave the functional paradigm. With this integration, Doobie and Quill together become more than the sum of their parts. Doobie, for instance, leverages fs2 to get a rich set of effect-level JDBC operations that support everything from parallel-streaming queries to reactive monitoring of asynchronous events from database change-listeners. Typically, the SQL queries for these varying use cases will be similar but also different in non-trivial ways. This is where Quill comes to the rescue, allowing the common parts of SQL queries to be abstracted away from the uncommon parts, ensuring a very DRY and maintainable solution, especially when the queries get very, very big. Join us on the next step of this wonderful journey!

Speakers

Alexander Ioffe

Senior Scala Evangelist, Nasdaq

Rob Norris

Programmer, Gemini Observatory
Software Engineer


Thursday November 14, 2019 2:20pm - 2:50pm
functional

2:20pm

Everything old is new: today's infrastructure as yesterday's Internet
In the age of cloud computing, what was once boring infrastructure has become incredibly exciting. From containers to service discovery to cluster schedulers, both industry and academia have been innovating at an alarming pace. With so many systems rapidly evolving in a domain fraught with trade-offs, it has become difficult to see the proverbial forest for the trees. In this talk we will try to see how we can evaluate this new landscape of systems by exploring classic networking papers and seeing what the design principles of yesterday's Internet have to say about the design decisions of today's infrastructure.

Speakers

Adelbert Chang

Lead AI Engineer, Target
Adelbert Chang is an engineer at Target where he works on deployment infrastructure for the AI Engineering team. Previously he worked at U.C. Santa Barbara doing research in large-scale graph querying and modeling, and in industry on machine learning systems, rule engines, and developer...


Thursday November 14, 2019 2:20pm - 2:50pm
reactive

3:00pm

Apache Flink 2.0: Unified Enterprise Data Processing System and Beyond
Apache Flink powers some of the world's largest stream processing use cases in companies like Netflix, Alibaba, and Uber.

Despite its success in the streaming field, Flink was born with the vision of unifying streaming and batch processing for unbounded and bounded datasets, and the community has rebuilt Flink's tech stack with battle-tested batch capabilities to bring a truly unified experience to enterprise users.

In addition, enterprises are building their machine learning platforms on data processing systems. Apache Flink has introduced a brand new machine learning pipeline and libraries to help ML developers and users get the most value out of their data. In this talk, I will discuss the data processing challenges that enterprises face today, and the key benefits of unifying their streaming, batch, and machine learning systems.

I will discuss in depth some new, critical Flink functions in the latest releases and how they enable users to solve the above problems elegantly from an overall perspective. I will highlight Flink's new capabilities such as full SQL DDL/DML, Flink-Hive integration, Machine Learning pipeline and libraries, Python Table API, etc.

Speakers

Bowen Li

Senior Software Engineer, Alibaba
Bowen is an Apache Flink committer and Senior Software Engineer at Alibaba. He has been helping to advance Flink as the next-generation, unified data processing system. Bowen frequently gives talks on Flink at conferences, and organizes Flink meetups and events in Seattle.


Thursday November 14, 2019 3:00pm - 3:30pm
data

3:00pm

Speedy Scala Builds at Databricks
Building Scala code in general can be really slow. To speed this up, Databricks' Developer Tools team has taken on a variety of projects to attack the problem from different angles - from JVM tuning to cloud infrastructure - resulting in build times that are significantly less infuriating. This talk will walk you through the details of each project, and attach concrete numbers to exactly how much of a difference each one made in this year-long effort.

Speakers

Li Haoyi

Software Engineer, Databricks

Ahir Reddy

Software Engineer, Databricks


Thursday November 14, 2019 3:00pm - 3:30pm
functional

3:00pm

To Spark or Not to Spark
Heard about the exciting new world of distributed Analytics with Spark but not sure if it's appropriate for your use case? In this talk, we'll walk through the basic use cases for Spark with distributed databases like Apache Cassandra. We'll outline the potential uses for any organization, even those not requiring generic analytics capabilities. Learn about how we can use Spark to load data, modify tables, and move data from cluster to cluster. Discover more advanced use cases, like working with streaming services and messaging queues. Find out about all the exciting things you can do with Spark and when you may be able to get away without it!

Speakers

Russell Spitzer

Software Engineer, DataStax
Spark, Cassandra, or Dogs.


Thursday November 14, 2019 3:00pm - 3:30pm
reactive

3:40pm

Swift for TensorFlow: Machine Learning with No Boundaries
Swift for TensorFlow is a platform for the next generation of machine learning that leverages innovations like first-class differentiable programming to seamlessly integrate deep neural networks with traditional software development. In this session, learn how Swift for TensorFlow can make advanced machine learning research easier and why Jeremy Howard’s fast.ai has chosen it for the latest iteration of their deep learning course.

Speakers

Paige Bailey

Developer Advocate (TensorFlow), Google


Thursday November 14, 2019 3:40pm - 4:10pm
data

3:40pm

Scoring ONNX ML Models with Scala
ONNX is an emerging standard format for serializing machine learning models. This talk will introduce Agate, Stripe's library for scoring ONNX models on the JVM in pure Scala. Stripe uses Agate to score deep learning models in batch in Spark and Scalding, and also in real time using our Scala-based scoring service. We'll talk about performance, how Agate was developed, and how we use binaries built with Graal native-image to interop with Python.

Speakers

Oscar Boykin

Machine Learning Infrastructure, Stripe
Oscar is the creator of Scalding, Summingbird, and Algebird, and is an overall professor and mathematician turned software magician.


Thursday November 14, 2019 3:40pm - 4:10pm
functional

3:40pm

Serverless Scala - Functions as SuperDuperMicroServices
Serverless is all the rage but what does it mean for Scala developers? Can we take a plain ol' Scala function and run it on the cloud with infinite scalability? This talk will explore how to build and deploy serverless Scala and how to avoid startup overhead. We will also explore how to build pure serverless functions to make programs more provably correct, easier to build, test, and run. We will use Google Cloud as a reference serverless implementation but the concepts are applicable with any provider.

Speakers

James Ward

Developer Advocate, Google
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals...

Josh Suereth

Engineer, Google


Thursday November 14, 2019 3:40pm - 4:10pm
reactive

4:20pm

Hack Weekend: ML models on mobile
ML models are increasingly deployed on phones, but what does it actually take to go from a state-of-the-art model in Python to running that model on a phone? Spoiler alert: a lot. Erik recaps his weekend of attempting to go from zero mobile development or on-device model experience to running GPT-2 in an iOS app, covering the problems he encountered using Core ML/ONNX/TFLite, how to solve those problems, and why Swift for TensorFlow has the potential to change everything.

Speakers

Erik Reppel

ML Platform Engineer, Coinbase
Erik Reppel is an engineer on the Machine Learning and Platform team at Coinbase where he primarily works on improving the quality of ML tooling and deploying ML models at scale.


Thursday November 14, 2019 4:20pm - 4:50pm
data

4:20pm

Re-programming the programmer, from Actors to FP
Over the last few years I have built a DNS management system. Initially started as an Event Sourcing application built in Akka, the system had to be re-architected multiple times to address unforeseen issues stemming from new requirements, operational issues, and developer pitfalls (mistakes). This talk will introduce concepts in the DNS domain and different architecture styles including Event Sourcing in Akka and Stream processing in FS2. The talk will describe the journey from inception through to the current system design, highlighting the key challenges encountered along the way and the evolution of the design to account for those challenges. I plan on using real code to demonstrate each architecture along the journey.

Speakers

Paul Cleary

Senior Principal Engineer, Comcast


Thursday November 14, 2019 4:20pm - 4:50pm
functional

4:20pm

Deploy an end-to-end ML pipeline using Apache Spark Streaming and Kubernetes
Distributed stream processing engines, like Apache Spark(TM) Structured Streaming, can help in various ways with performing machine learning in real time at large scale. A typical end-to-end streaming machine learning pipeline consists of:

1. Preprocessing the data based on the application, e.g. normalising or cleaning.
2. Hosting the model using microservices and Kubernetes, using IBM MAX (IBM Model Asset Exchange).
3. Scaling the entire pipeline using Apache Spark and Kubernetes.

This talk may include a live demo of applying the above technique to predict objects in an image using an object detection model. Since this is a streaming application, the predictions will be made in real time.

Key takeaways:

1. Learn about reusing ML models using IBM Model Asset Exchange.
2. Learn how to scale an online ML application end to end using Apache Spark Structured Streaming and Kubernetes.

Details of the associated code and data source used for the demo are available here: https://github.com/ScrapCodes/SS-on-kube

Speakers

Prashant Sharma

System Software Engineer, IBM

Nick Pentreath

Principal Engineer - Center for Open Source Data & AI Technologies (CODAIT), IBM
Nick Pentreath is a principal engineer in IBM's Center for Open Source Data & AI Technologies (CODAIT), where he works on machine learning. Previously, he cofounded Graphflow, a machine learning startup focused on recommendations. He has also worked at Goldman Sachs, Cognitive Match...


Thursday November 14, 2019 4:20pm - 4:50pm
reactive

5:00pm

Deep Learning Done Right (with Scala)
I'll introduce Nexus, a fully typesafe deep learning framework in Scala that offers unforeseen type safety (axes of tensors are typed statically) and succinctness to deep learning developers through extensive use of type-level computation via the popular library Shapeless. In this talk I'll introduce the design of a deep learning framework, and show how Scala's type-level computation abilities can make it safer, easier to write, and more expressive.

Speakers

Tongfei Chen

PhD candidate, Johns Hopkins University
Natural language processing researcher; programming language aficionado. Likes to talk about NLP/ML/AI/type systems/functional programming.


Thursday November 14, 2019 5:00pm - 5:30pm
data

5:00pm

TBD
Thursday November 14, 2019 5:00pm - 5:30pm
functional

5:00pm

Dagster: a Framework for Data Processing Applications

We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.

Data applications are graphs of functional computations that consume and produce data assets. Dagster provides abstractions and tools for modeling the semantics of these applications by providing a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions.  Computations themselves can be in the tools used by builders -- Spark jobs for data engineers, SQL statements for analysts, Python for data scientists -- and can be deployed to arbitrary orchestration engines -- such as Airflow, Dask, or Kubernetes-based execution.

The result is data systems that are more reliable, testable, and understandable, that leverage the existing tools that work, and that are deployable to your infrastructure.



Speakers

Nick Schrock

Founder, Elementl


Thursday November 14, 2019 5:00pm - 5:30pm
reactive

5:40pm

Panel: Who Needs Serverless?
Moderators

Vitaly Gordon

VP, Data Science and Engineering, Salesforce Einstein
VP, Data Science and Data Engineering, Salesforce Einstein

Speakers

James Ward

Developer Advocate, Google
James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals...

Rose Toomey

Software Engineer, Coatue Management
Big data, Spark, Scala, fintech, ETL pipelines, digital assets, and object allocation. I love a performance mystery.


Thursday November 14, 2019 5:40pm - 6:30pm
functional

6:30pm

Happy Hour
Great food, drinks, and company -- our legendary hallway track with everybody in it closing the day!

Thursday November 14, 2019 6:30pm - 8:00pm
commons
 
Friday, November 15
 

8:00am

9:00am

Kubernetes is a Platform Platform
The world of containers is moving to the next phase. We now have an evolving toolbox. The question now is: what do we do with this new toolbox?
We need to think beyond just running containers and instead about how we use these patterns and primitives to automate all parts of application development and operations. In this talk, Joe will cover the origins and history of Kubernetes, give a refresher on its inner workings, and outline how Kubernetes was built to be built upon. He will detail some of the innovative techniques and projects that are taking things to the next level.




Speakers

Joe Beda

Principal Engineer, VMware
Doing cloud native stuff at VMware


Friday November 15, 2019 9:00am - 9:30am
functional

9:40am

Human-Centric ML Infrastructure at Netflix
In this talk, we will share our experiences on building Metaflow, a Python library that empowers data scientists at Netflix to prototype, build, deploy, and operate end-to-end machine learning solutions. We started building Metaflow at Netflix to provide a solid foundation for hundreds of internal ML use cases, from classical statistical analysis to large-scale applications of deep learning. Metaflow is designed with a human-centric mindset: instead of reinventing the wheel for large-scale computing or machine learning, we integrate existing solutions into a delightfully consistent and easy-to-use package. This talk focuses on our philosophy towards Machine Learning infrastructure and dives into the internals of Metaflow; it will highlight lessons that we have learned in building a Python library that needs to be robust, performant, and flexible enough to solve a large set of complex real-world business problems related to machine learning. This talk is for you if you want to learn how to develop systems for big data and ML in Python.

Speakers

Savin Goyal

Senior Software Engineer, Netflix

Ville Tuulos

Architect, Netflix


Friday November 15, 2019 9:40am - 10:10am
data

9:40am

Unison, and why the codebase of the future is a purely functional data structure
Unison is an open source functional programming language with special support for building distributed, elastic systems. It began as an experiment: rethink all aspects of the programming experience, including the core language, runtime, tooling, as well as code versioning and publishing, and then do whatever is necessary to eliminate needless complexity and make building software once again delightful, or at the very least, reasonable. This talk zooms in on one aspect of Unison: it models the codebase not as a mutable bag of text files, but as a purely functional data structure. We'll explain what that means and show the benefits of the approach, which include:
  • Perfect incremental compilation and testing, with the compilation and test result caches shared among all collaborators
  • Refactoring of any size as a totally controlled experience where the codebase always typechecks and the code is always runnable
  • Instant, 100% accurate renames that never break downstream libraries or users
  • The ability to assign multiple names to the same definition, with all namings being fully compatible with one another
  • Simplified and more flexible dependency management; many causes of dependency hell simply cannot arise
  • The ability to serialize arbitrary Unison code, simply, without dependency management issues
  • And lots more...
Besides introducing the big ideas and theory, we'll also show how the ideas get used in practice by demoing the Unison codebase editing tool live during the talk. It should be a lot of fun!

Speakers

Paul Chiusano

Cofounder, Unison Computing

Rúnar Bjarnason

Cofounder


Friday November 15, 2019 9:40am - 10:10am
functional

9:40am

The Renaissance for Big Data and Parallelism with GraalVM
The Renaissance suite is a new benchmark suite focused on parallelism and concurrency, and provides workloads that exercise modern parallel programming abstractions and primitives provided by the JVM. Through these workloads, the suite aims to aid in understanding how modern applications and data processing frameworks use the features of the JVM, and to foster development of new optimizations that enable more efficient executions. The GraalVM team has used those benchmarks to improve and assess the performance of its compiler to make it one of the most efficient in the industry.
In this talk, we will discuss this new suite and how it is helping compiler, GC, VM, and tool implementers to fully support and optimize for the kinds of workloads developers really care about. We will then dive into the GraalVM use case by detailing what makes GraalVM such a unique ecosystem.

Speakers

Christian Wimmer

Consulting Researcher, Oracle

François Farquet

Senior Researcher, Oracle Labs


Friday November 15, 2019 9:40am - 10:10am
reactive

10:20am

machine learning and mobile
We will discuss different approaches for bringing machine learning to mobile devices, then build an end-to-end pipeline for training and deploying models on a phone.

Speakers

brett koonce

cto, quarkworks
brettkoonce.com


Friday November 15, 2019 10:20am - 10:50am
data

10:20am

Change Data Capture in Distributed Systems
Modern systems are usually designed as a collection of cooperating microservices. These services commonly have dedicated data stores for their individual needs. To support various requirements, the corresponding data are often stored in data stores with very different characteristics and use cases. A fundamental requirement emerging from these architectures is the need to reliably capture primary data changes. Change Data Capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data. In this talk, I’d like to discuss the advantages and disadvantages of various CDC approaches, offer guidance in this area, and share our experience, including various samples and recommendations.
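As a rough illustration of the idea, the sketch below shows one of the simplest CDC approaches: comparing two snapshots of a table keyed by primary key and emitting change events. It is only an assumption-laden example, not the approach presented in the talk; the function name `capture_changes`, the event layout, and the sample rows are all hypothetical.

```python
def capture_changes(before, after):
    """Compare two snapshots (dicts keyed by primary key) and return change events."""
    events = []
    # Walk keys in sorted order so the event stream is deterministic.
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if key not in before:
            events.append({"op": "insert", "key": key, "after": new})
        elif key not in after:
            events.append({"op": "delete", "key": key, "before": old})
        elif old != new:
            events.append({"op": "update", "key": key, "before": old, "after": new})
    return events


# Hypothetical snapshots of a small "users" table.
snapshot_v1 = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
snapshot_v2 = {1: {"name": "Ada L."}, 3: {"name": "Cy"}}

for event in capture_changes(snapshot_v1, snapshot_v2):
    print(event)
```

Snapshot comparison is easy to reason about but cannot see intermediate changes between snapshots, which is one reason log-based or trigger-based approaches are often considered instead.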

Speakers

Petr Zapletal

Tech Lead, Disney Streaming Services
My name is Petr and I work for Disney Streaming Services (ex. Bamtech Media ex. Cake Solutions). I'm interested in Reactive and Distributed Systems, Streaming and ofc Scala and JVM.


Friday November 15, 2019 10:20am - 10:50am
reactive

11:00am

TBD
Friday November 15, 2019 11:00am - 11:30am
data

11:00am

Rsc: Scala Outlining for Distributed Compilation
Compilation speed is a large pain point for many Scala developers. Twitter is one of the world’s largest Scala shops, and we continuously integrate all our projects at once in our monorepo. Lowering build times is crucial to help Twitter continue developing fast and safely. While Scala compilation is difficult to parallelize, the Language Tools team at Twitter has been working on a Scala outliner, Rsc, which produces the equivalent of C++ header files for Scala. Armed with these outlines, Scala compilation parallelism can be unlocked, allowing developers to take advantage of parallel, and even distributed, compilation to iterate ever faster. Learn how we've rolled out a Scala outliner into our continuous integration pipeline, while using open source APIs and implementations to compile and test millions of lines of code thousands of times a day to support low latency builds of Twitter's projects from source.

Speakers

Win Wang

Software Engineer, Twitter


Friday November 15, 2019 11:00am - 11:30am
functional

11:00am

Delta Lake: Open Source Reliability and Quality for Data Lakes
Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Speakers

Michael Paul Armbrust

Tech Lead for Delta Lake, Databricks
Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and the Delta Lake open source project. He received his PhD from UC Berkeley in 2013, and was...


Friday November 15, 2019 11:00am - 11:30am
reactive

11:40am

Weld: An Optimizing Runtime for High Performance Data Analytics
Developers write software by combining independently written libraries and functions. Even though individual functions in these libraries are optimized, the lack of end-to-end optimization can cause order-of-magnitude slowdowns in the whole workflow compared to a tuned implementation written in C. For example, even though TensorFlow uses highly tuned linear algebra functions for each of its operators, workflows that combine these operators can be 16× slower than hand-tuned code. Similarly, workflows that perform relational processing in Spark SQL or Pandas, numerical processing in NumPy, or a combination of these tasks spend much of their time in data movement across processing functions and could run up to 100× faster if optimized end to end. Weld is an ongoing open source project from Stanford to accelerate data-intensive applications by as much as 100×. It does so by JIT-compiling parallel code and optimizing across functions within a single library as well as across different libraries, so developers can write modular code and still get close to bare-metal performance without incurring expensive data movement costs. Weld's compiler uses a new, explicitly parallel functional intermediate representation to capture the structure of data-parallel workloads such as SQL, machine learning, and graph analytics, and then optimizes across them using an adaptive optimizer that takes hardware characteristics into account. We demonstrate how Weld can be incrementally integrated into these libraries by porting only the most impactful operators first, without breaking compatibility with other operators in the library and without changing the API of the libraries (so users do not need to change their application code). We also show how Weld speeds up existing workloads in these frameworks and enables speed-ups of two orders of magnitude in applications that combine them. The Weld library and Weld-enabled versions of the Pandas and NumPy libraries are available to download on PyPI.
Weld is open source at https://www.weld.rs.

Speakers

Shoumik Palkar

Ph.D. Student, Stanford University


Friday November 15, 2019 11:40am - 12:10pm
data

11:40am

Next-Level Diagnostics for Async & Concurrent Errors with ZIO
A strength of the Scala programming language is its powerful support for asynchronous and concurrent programming—historically with Future and Akka, and today, with next-generation effect systems like ZIO.

While ZIO features like fiber-based concurrency, software transactional memory, and async/concurrent resource safety may grab headlines, in everyday programming, we spend a lot of our time debugging our async/concurrent code.

In this presentation by John A. De Goes and Salar Rahmanian, you’ll see how newly-developed features in ZIO make it easier than ever to troubleshoot problems in modern applications.

You’ll discover how execution traces show exactly the line-by-line flow of your async/concurrent code (including where it would continue to if it did not error!), and how interactive debugging features let you identify and troubleshoot stalled async code.

Discover just how powerful async and concurrent programming has become in Scala!

Speakers

Salar Rahmanian

Software Developer, Mya Systems
I have been developing software since the age of eleven and have over 20 years of commercial experience. My passion and expertise is focused on functional programming and building concurrent and distributed systems using Scala. I am a core developer for the ZIO Scala Library for asynchronous...

John A. De Goes

Solution Architect, De Goes Consulting
John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take very...


Friday November 15, 2019 11:40am - 12:10pm
functional

11:40am

8 Keys for Successful Serverless Architectures
Serverless promises to make it easier to build and deploy applications but it presents a new set of challenges. These challenges often come at a high cost and make it difficult to use Serverless as a more efficient platform than traditional microservices. These include such things as dealing with cold starts, data, testing and avoiding vendor lock-in. In this talk we will look at the most common challenges and what are the keys to a successful Serverless architecture.

Speakers

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...


Friday November 15, 2019 11:40am - 12:10pm
reactive

12:10pm

Lunch
Friday November 15, 2019 12:10pm - 1:00pm
commons

1:00pm

Machine Learning's Missed Opportunity in Visual Data Management
ApertureData's platform accelerates AI applications through its Data Management solution that redefines how large visual data sets are stored, searched and processed. It exposes a unified interface that allows users to store and search both the data and metadata associated with visual artifacts (images or videos). ApertureData's platform provides several innovative features: the ability to evolve metadata easily without requiring costly schema change, first-class status for feature vectors and bounding boxes, the ability to perform similarity searches as well as the ability to perform common pre-processing operations close to the data. The platform will be pluggable in allowing data to be stored on different backends and serve any machine learning pipeline. Based on our current work with customers, our platform, when used for a medical imaging use case, provides up to 5X improvement over the range of queries executed commonly in the field and can save upwards of 2 months per data scientist per machine learning deployment for every new application that wants to exploit data to gather insights. What other makeshift solutions fail to address is that once AI is ready to be commercialized, managing the onslaught of real visual data is going to be a killer for real deployments. Our talk will explain how ApertureData Platform achieves the performance and functionality for a wide range of application domains as well as a demo to show how to use it.

Speakers

Vishakha Gupta-Cledat

Founder and CEO, ApertureData
I am the Founder and CEO of ApertureData. Prior to that, I was at Intel Labs for over 7 years where I led the design and development of VDMS (the Visual Data Management System) which forms the core of the ApertureData Platform. I have a Ph.D in Computer Science from the Georgia Institute...


Friday November 15, 2019 1:00pm - 1:30pm
data

1:00pm

Rust and Scala, Sitting in a Tree….
As a Scala developer of many years, I started getting into Rust out of frustration with Scala and the JVM, working on in-memory databases and high performance data manipulation code. Rust appealed due to its promise of safety, performance, AND high level abstractions. Does it really deliver, and how does it compare with Scala in those respects? In particular:
  • Safety: what does it mean to be a safe language by default? Let’s compare the two languages' approaches to safety
  • What are some similar and dissimilar functional features? 
  • Performance: A close look at how Rust delivers fast performance without sacrificing FP, or: Rust vs Scala functional transforms
  • Why Rust holds huge promise in data engineering
  • Is it possible to take advantage of some Rust while keeping your Scala codebase intact?

Speakers

Evan Chan

Senior Software Engineer, Apple
Evan loves to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies. He is the creator of the FiloDB open-source distributed time-series database, as well as the Spark Job Server. He has led the design and implementation...


Friday November 15, 2019 1:00pm - 1:30pm
functional

1:00pm

Moonshot Spark: serverless with GraalVM
Can Apache Spark slip its earthly bounds and go serverless, clusterless? Popular cloud services are becoming more capable. AWS Lambda now runs three times longer, Fargate has become less expensive. GraalVM can reduce resource usage while improving cold start times. Consider how to handle small bursts of work. Would a standalone container suit best? If only cold startup times weren't such an issue! What about a "mission control" model, where a long-running Spark driver dispatches work to ephemeral executors? What you gain in flexibility and convenience means concessions in performance. Chinning up with GraalVM native image helps. Shuffle is still problematic. Which experimental shuffle manager is best suited to the outer reaches of the cloud? There's not a practical use case for larger workflows - yet. But let's use this moonshot as a lens to magnify cloud performance issues. Explore how these solutions could apply to services you already use.

Speakers

Rose Toomey

Software Engineer, Coatue Management
Big data, Spark, Scala, fintech, ETL pipelines, digital assets, and object allocation. I love a performance mystery.


Friday November 15, 2019 1:00pm - 1:30pm
reactive

1:40pm

Introduction to geospatial analysis for uninitiated SQL Data Engineer
The talk will introduce GIS to uninitiated SQL data engineers. It will be most useful to someone who writes SQL queries and pipelines, knows nothing about GIS, and wants to enhance their analysis with geospatial data.

Speakers

Michael Entin

Senior Software Engineer, Google Inc
Senior Software Engineer at Google Dremel / BigQuery team. Before joining Dremel team, worked on various data processing projects at Microsoft: SQL Server Integration Services, Analysis Services, distributed platform for AdCenter Business Intelligence, etc.


Friday November 15, 2019 1:40pm - 2:10pm
data

1:40pm

TBD
Friday November 15, 2019 1:40pm - 2:10pm
functional

1:40pm

Reliable Machine Learning
Machine learning has been described as "Software 2.0" and holds the promise of totally changing how software systems are constructed. But if manually-written "Software 1.0" code is still plagued with bugs, downtime, and security vulnerabilities, how can we hope to achieve reliable behavior in systems with significant data-dependent machine learning components and their attendant complexities? This talk will survey academic research, industry best practices, and software tools spanning the end-to-end development of machine learning systems from data pipelines to tests and types all the way through to end user experience.

Speakers

David Andrzejewski

Engineering, Sumo Logic
David Andrzejewski is a Senior Engineering Manager at Sumo Logic, where he works on applying statistical modeling and analysis techniques to machine data such as logs and metrics. He also co-organizes the SF Bay Area Machine Learning meetup group. David holds a PhD in Computer Sciences...


Friday November 15, 2019 1:40pm - 2:10pm
reactive

2:20pm

Large Scale On-Demand Low-Latency Near Real-Time Predictions
Predictive machine learning is powering innovation across industries, including healthcare, finance, media & entertainment, and many more. This session presents the successful development process at Sony Interactive Entertainment for delivering scalable, real-time, low-latency predictive ML-based solutions on the cloud.

Speakers

Gabor Melli

Senior Director of Engineering (ML&AI), Sony Interactive Entertainment


Friday November 15, 2019 2:20pm - 2:50pm
data

2:20pm

Taming complex webapps with Scala and React
Sufficiently complex requirements require sufficiently sophisticated patterns and practices to tame the overall complexity! Only then do we have any hope of delivering a useful and high-quality product that meets those requirements. Scala.js combined with React presents a coherent combination, emphasizing both functional programming and immutability. In this talk, we will examine how we used this combination to deliver a complex set of requirements in a user-friendly application. We will explore several patterns we utilized: custom hooks to ensure reusable code and a consistent user experience, isomorphic implementation of algorithms to run the same code on servers and clients, and use of memoization to ensure a responsive UI.

Speakers

Kavita Laddad

Co-founder, Paya Labs, Inc.
React, Scala.js, Indian classical music


Friday November 15, 2019 2:20pm - 2:50pm
functional

2:20pm

Fast and scalable domain-specific knowledge graphs generation
Very recently, there has been a lot of interest in the construction of knowledge graphs. Large companies like Microsoft and Google operate large knowledge bases (KBs), and there are some open source examples like Yago. However, there are some scenarios where domain-specific KBs are needed and Wikipedia data sources may not work. In this talk, I’ll describe techniques to build this type of KB.

Speakers

Omar Alonso

Principal Applied Researcher, Microsoft
Omar is a Principal Data Scientist Lead at Microsoft in Silicon Valley where he works on the intersection of social media, temporal information, knowledge graphs, and human computation for the Bing search engine. He holds a PhD from the University of California at Davis. @elunca


Friday November 15, 2019 2:20pm - 2:50pm
reactive

3:00pm

Lessons Learnt Building Domain Specific NLP Pipelines
At Indix (acquired by Avalara), our goal was to build the "Google of Products". The product catalog currently has 3+ billion products, amassed by crawling 5000+ retailer and brand web sites. Naturally, we needed a robust NLP pipeline to make sense of the unstructured text data at this scale. The first part of the talk will cover the evolution of the architecture, building blocks, and algorithms of the NLP pipeline. The building blocks I will cover are Language Models, Word Embeddings, and Knowledge Graph. The algorithms I will cover are classification, entity extraction, document similarity, and query understanding (for the e-commerce domain). After the acquisition by Avalara, the team was tasked with making sense of the unstructured text data in the Tax Compliance domain with limited data. The second part of the talk will focus on how we fine-tuned the e-commerce NLP pipeline and transferred our learnings from the e-commerce domain to the Tax Compliance domain.

Speakers

Rajesh Muppalla

Senior Director of Engineering, Avalara


Friday November 15, 2019 3:00pm - 3:30pm
data

3:00pm

In Types We Trust
Scala ensures that types are used consistently with their declaration, but checks only the name and structure of the types. A type also implies a semantic contract, which is typically expressed in human-language documentation and checked by tests. Can we do better? In this talk I will propose that we formalize the specification of semantic contracts as statements of predicate logic. I will show how these statements of logic can be used in both property-based unit tests and proofs. I will show you new features of ScalaTest that support this approach.
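To make the idea of a semantic contract expressed as a predicate a little more concrete, here is a minimal sketch of checking such a predicate against randomly generated inputs. It is written in plain Python with only the standard library and does not use or represent ScalaTest's API; the names `sort_contract` and `check_property` are hypothetical, and the contract chosen (a sort returns an ordered permutation of its input) is just an example.

```python
import random
from collections import Counter


def is_sorted(xs):
    """True when every element is <= its successor."""
    return all(x <= y for x, y in zip(xs, xs[1:]))


def sort_contract(inp, out):
    """Semantic contract as a predicate: out is ordered and is a permutation of inp."""
    return is_sorted(out) and Counter(inp) == Counter(out)


def check_property(fn, contract, trials=200, seed=0):
    """Try the contract on random integer lists; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        inp = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not contract(inp, fn(list(inp))):
            return inp
    return None


# The built-in sort satisfies the contract, so no counterexample is found.
print(check_property(sorted, sort_contract))
# An implementation that returns its input unchanged should be caught by the check.
print(check_property(lambda xs: xs, sort_contract))
```

A random check like this can only find counterexamples; establishing that the predicate holds for all inputs is what a proof would add.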

Speakers

Bill Venners

President, Artima, Inc.
Bill Venners is president of Artima, Inc., publisher of Scala consulting, training, books, and developer tools. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality...


Friday November 15, 2019 3:00pm - 3:30pm
functional

3:00pm

Integrating React with Scala: delivering incremental value in a legacy Play web application
Do you hate JavaScript? Is your UI written in Scala but you can't hire any frontend developers to work on it or backend developers that want to? 
Come hear about how my team and I faced these exact problems: a frontend that's difficult to hire for, a monolithic web application with extremely slow build times, a smattering of untested and disorganized UI code.
Through trial and error we developed a way to inject React code into Scala Play that maintains type safety and takes advantage of JavaScript libraries, letting us build the frontend faster, hire more easily, and work incrementally towards a split frontend and backend web application.
For UI business logic, we will discuss how you can use tools like TypeScript, React, and Redux to incrementally improve your web application. Then we will also discuss how tools like webpack and reverse proxying can be used to build the application for production and enable a smooth development experience for your engineers.

Speakers

Niole Nelson

Software Engineer, Domino Data Lab


Friday November 15, 2019 3:00pm - 3:30pm
reactive

3:40pm

Reliable, High Scale Tensorflow Inference Pipelines at Twitter
Twitter is near real time - it lets users see what's happening now. In this talk we discuss how we keep machine learning models fresh and relevant when the world around us changes fast.

Speakers

Briac Marcatté

Staff ML Engineer, Twitter

Shajan Dasan

Staff ML Engineer, Twitter


Friday November 15, 2019 3:40pm - 4:10pm
data

3:40pm

Running Amok to Ignite a Documentation Revolution!
Why is our ecosystem littered with so much incomplete, out-of-date and inadequate documentation? Why can't we check a library's v1.3.5 docs and read about how it contains a bug that's fixed in v1.3.6? Why can't a humble user contribute an improvement to the docs without involving the project maintainer, and having a new release made? Why does the documentation contain examples which don't compile? Why must we wait longer for a release of a library when only its docs are missing? Documentation needs to evolve faster, and to continue improving even after the software it describes has stopped. We need a documentation revolution! This talk will take a philosophical analysis of the causes of our industry's bad documentation culture and how our incumbent tooling and practises aren't helping. I will introduce Amok, a revolutionary new documentation management tool built upon Fury, for creating, maintaining, evolving, linking, versioning and checking documentation. Amok will take advantage of static build information which is now available thanks to Fury, and provide solutions to all the awkward questions above.

Speakers

Jon Pretty

Developer, Propensive
Jon Pretty is an international man of Scala mystery.


Friday November 15, 2019 3:40pm - 4:10pm
functional

3:40pm

Serverless Event-Driven Data Pipeline Platform
Airflow has become the de facto data pipeline platform in many companies. Airflow was designed to run static, slow-moving workflows on a fixed schedule, and it is a great tool for that purpose. However, users often get into trouble by forcing their use cases to fit into Airflow’s model. A few examples that Airflow cannot satisfy in a first-class way include:
  • Complex DAGs that leak application code into the pipeline
  • DAGs which need to be run off-schedule or with no schedule at all
  • DAGs that run concurrently with the same start time
  • DAGs with complicated branching logic
  • DAGs with many fast tasks
  • DAGs which rely on the exchange of data
  • Parametrized DAGs
In this talk we present a brand new Serverless Event-Driven Pipeline Platform written in Scala that addresses all the problems above.

Speakers
avatar for Rahul Chitturi

Rahul Chitturi

Principal Software Engineer, Coatue


Friday November 15, 2019 3:40pm - 4:10pm
reactive

4:20pm

Build Your Own ML Data Feedback Loop
Machine learning models should learn from their history. Data collection and labeling are often the rate-limiting step of AI research. At Curai, our AI tools are deployed in a real-world healthcare setting, giving us the opportunity to learn from their usage. This talk will focus on how to build a semi-automated data feedback loop for ML model retraining, highlighting the specific use case at Curai. A data feedback loop consists of several key components. First, model output is presented to the user (in our case, a doctor or health professional), who can choose to accept or reject a medical suggestion. This usage data is then sent to data sinks and forwarded to a data store, where post-processing and additional calculations can happen (for example, calculating the edit distance between two strings). Processed data can then be sent down (most simply, through a CSV) to a model for retraining or fine-tuning, and the resulting v2 model can then be tested for accuracy and re-deployed into the product. In short, the semi-automated data feedback loop allows for rapid iteration and continuous learning for AI/ML models. This talk will focus on specific technologies my teammates and I have used, including, but not limited to, Pandas, integration with StackDriver and BigQuery, and Airflow. Attendees will learn how to build a semi-automated data feedback loop, see practical code examples, hear anecdotes of my own failures and successes in this domain, and consider the ethical implications of using user-generated data for model retraining. There is tremendous potential for AI in healthcare, and closing the data loop for model retraining can help solve one of the key challenges in this domain and continuously improve machine learning models.

Speakers
avatar for Sophia Sanchez

Sophia Sanchez

Machine Learning Engineer, Curai


Friday November 15, 2019 4:20pm - 4:50pm
data

4:20pm

GDPR Data Cleaner: Mutating Immutable Data
Remember when data engineers and data scientists used to say things like:
* “Log everything”
* “Never throw away data”
* “All data is important”
* “What is useless data today is tomorrow’s data of gold”
And then that four-letter acronym came into our vernacular… *G-D-P-R*
Now, you hear statements like this:
* “Do we really need this data?”
* “Is this data used at all?”
* “What does the GDPR say about this type of data?”
Another change that came with the GDPR is the right for a user to request the deletion of their personal data. This is a tricky proposition for those dealing with big data, since all big data technologies were based on the concept of immutable data. Big data systems, such as Hadoop and Spark, scaled so well because there were no updates of data, only appends, and the data was written out in large blocks, not conducive to small updates or deletes. In this talk, we discuss how personal data can be cleansed from existing big data storage systems, such as columnar-oriented Hive tables and key-value stores, and we will introduce a new open source project that implements these ideas.

Speakers
avatar for David Winters

David Winters

Big Data Architect, GoPro
David is an Architect in the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka streaming data ingestion pipeline. He has been developing scalable data processing pipelines and eCommerce systems for over 20 years in Silicon Valley. David's current big... Read More →


Friday November 15, 2019 4:20pm - 4:50pm
functional

4:20pm

Maximizing Throughput and Scalability for Akka Streams
The Akka Streams API offers a robust, reliable, and expressive means for executing streaming workloads. For applications that demand high throughput, low latency, or increasing scalability, it is critical to understand how to maximize the throughput of Akka Streams. I will examine the structure of Akka Streams and explore techniques for maximizing the throughput of individual streams. I will describe how Akka Streams can be partitioned in order to provide scalability, as well as high availability. Finally, I will review techniques for profiling and instrumenting Akka Streams to find the bottlenecks. Using the practical techniques from this talk, you will be able to improve the throughput, reliability, and scalability of your streaming applications.

Speakers
avatar for Colin Breck

Colin Breck

Sr. Staff Software Engineer, Tesla
Colin Breck has experience developing software infrastructures for the near real-time monitoring and control of industrial applications. At Tesla, he works on distributed systems for the monitoring, aggregation, optimization, and control of distributed-energy assets, including solar... Read More →


Friday November 15, 2019 4:20pm - 4:50pm
reactive

5:00pm

TBD
Speakers
avatar for Anima Anandkumar

Anima Anandkumar

Professor, Caltech
Anima Anandkumar holds dual positions in academia and industry. She is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generation AI algorithms. At Caltech, she is t... Read More →


Friday November 15, 2019 5:00pm - 5:30pm
data

5:00pm

Growing the Scala Community
The Scala community has grown significantly over the past 15 years. As a community, we wrote millions of lines of code and developed hundreds of projects. While the language is thriving, there is still room to contribute to the community. Unlike other tech talks, this talk focuses on contributing to the diversity aspect of the community. It explains the significance and benefits of diversity, and it proposes solutions to diversify and improve the community. One of the best ways to grow the community and to bring diversity into it is to organize ScalaBridge workshops, which are intended to provide resources for people from underrepresented populations to learn Scala. (Diversity comes in many forms: race, gender, age, religion, culture, sexual orientation, socioeconomic background, etc.) While the workshops have positive and lasting impacts, they cannot be run by one individual or by a single organization. In order for the Scala community to become more diverse, we need your help to scale up! Attend this talk to learn how to contribute to our community!

Speakers
avatar for Yifan Xing

Yifan Xing

Software Developer
Yifan is a software engineer, ScalaBridge organizer, and open-source contributor. Her work involves many distributed systems related topics, including network protocols, consensus, network security, etc. Yifan contributed to the message queue systems and asynchronous APIs for a Scala... Read More →


Friday November 15, 2019 5:00pm - 5:30pm
functional

5:00pm

Netty 5: Lessons Learned
Netty is one of the most used network frameworks on the JVM (if not the most used), providing its users not only with great flexibility but also with superb performance. While Netty 4.x was a great success and is used literally everywhere in production, it became clear over the “years” that a few design choices that were made produced various limitations. As work has started on Netty 5, it’s time to fix these limitations and incorporate all the feedback we received from the community and core maintainers. This talk will focus on multiple core changes that are scheduled for Netty 5 by explaining what “real-world issues” these solve and how these changes will help to operate Netty in high-scale production environments. It will also give a brief overview of the general planned timeline and roadmap of Netty 5.

Speakers
avatar for Norman Maurer

Norman Maurer

Software Engineer


Friday November 15, 2019 5:00pm - 5:30pm
reactive

5:40pm

Panel: AI Product
Moderators
avatar for Pete Skomoroch

Pete Skomoroch

Head of Data Products, Workday
Peter is Co-Founder and CEO of SkipFlag, which was acquired by Workday in 2018. SkipFlag's technology uses your existing conversations, support tickets, and other communication to automatically build and update an enterprise knowledge base. It understands the people, topics, and facts... Read More →

Speakers
avatar for Anima Anandkumar

Anima Anandkumar

Professor, Caltech
Anima Anandkumar holds dual positions in academia and industry. She is a Bren professor at Caltech CMS department and a director of machine learning research at NVIDIA. At NVIDIA, she is leading the research group that develops next-generation AI algorithms. At Caltech, she is t... Read More →

Thanh Nguyen

Principal Data Scientist, Alibaba


Friday November 15, 2019 5:40pm - 6:30pm
functional