Scale By the Bay 2019

9:40am PST

Finding Needles In Big Data Haystacks using Finite State Machine

Working with data often means trying to locate data that fits patterns, akin to "finding a needle in a haystack". When we add big data from non-homogeneous sources to the mix, this problem becomes exponentially more complex. One of the use cases at Netflix is improving the Sign Up experience through experimentation. Being able to find user journeys that follow certain patterns across billions of events is key to simplifying the sign-up process. This gave us the idea to build a framework for expressing these user journey patterns in a form that can be translated into a Non-Deterministic Finite State Machine. One of the ideas that we adapted from Ken Thompson's 1968 CACM paper was to create a Non-Deterministic Finite Automaton from patterns defined using regex. The next step was applying the state machine across billions of events at scale using Spark. The final piece of the puzzle was to make it easily usable by Data Engineers, Scientists and Analysts alike. In this talk, we will cover how we built this framework (dubbed "Conduit") and the design decisions resulting from challenges along the way. We will also talk about how this can be adapted to real-time applications in the future.
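
To make the idea in this abstract concrete, here is a minimal sketch of matching a user-journey pattern with a non-deterministic finite automaton, simulated Thompson-style by tracking every reachable state at once. It is not Conduit's implementation: the pattern format (a list of event name and quantifier pairs), the event names, and the function names are all assumptions made for illustration, and it runs over an in-memory dict rather than Spark.

```python
from typing import Dict, List, Set, Tuple

ANY = "."  # hypothetical wildcard: matches any event name


class NFA:
    """A small NFA over event names with epsilon transitions."""

    def __init__(self) -> None:
        self.eps: Dict[int, Set[int]] = {}                # state -> epsilon targets
        self.step: Dict[int, List[Tuple[str, int]]] = {}  # state -> (symbol, target)
        self.start = self.new_state()
        self.accept = self.start

    def new_state(self) -> int:
        s = len(self.eps)
        self.eps[s] = set()
        self.step[s] = []
        return s


def compile_pattern(pattern: List[Tuple[str, str]]) -> NFA:
    """Build an NFA from (event, quantifier) pairs; quantifier is '', '?', '*' or '+'."""
    nfa = NFA()
    cur = nfa.start
    for symbol, quant in pattern:
        entry, exit_ = nfa.new_state(), nfa.new_state()
        nfa.eps[cur].add(entry)
        nfa.step[entry].append((symbol, exit_))
        if quant in ("?", "*"):
            nfa.eps[entry].add(exit_)   # the event may be skipped
        if quant in ("*", "+"):
            nfa.eps[exit_].add(entry)   # the event may repeat
        cur = exit_
    nfa.accept = cur
    return nfa


def closure(nfa: NFA, states: Set[int]) -> Set[int]:
    """All states reachable from `states` through epsilon transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for t in nfa.eps[stack.pop()]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def matches(nfa: NFA, events: List[str]) -> bool:
    """True if the whole event sequence is accepted by the NFA."""
    current = closure(nfa, {nfa.start})
    for ev in events:
        moved = {t for s in current for sym, t in nfa.step[s] if sym in (ANY, ev)}
        current = closure(nfa, moved)
        if not current:
            return False
    return nfa.accept in current


# Hypothetical journey: landing, one or more plan selections, anything, then completion.
pattern = [("landing", ""), ("plan_select", "+"), (ANY, "*"), ("signup_complete", "")]
nfa = compile_pattern(pattern)
journeys = {
    "u1": ["landing", "plan_select", "payment", "signup_complete"],
    "u2": ["landing", "payment"],
    "u3": ["landing", "plan_select", "plan_select", "signup_complete"],
}
matched = sorted(user for user, evs in journeys.items() if matches(nfa, evs))
print(matched)
```

Because the simulation carries a set of states rather than backtracking, each journey is scanned once, which is the property that makes this style of matcher attractive to run per user over very large event logs.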

Speakers

Rashmi Shamprasad

Senior Data Engineer, Netflix

Passionate about all things data, Rashmi Shamprasad is a Senior Data Engineer on the Growth Data Engineering team at Netflix, building data products that enable Non Member Acquisition & Experimentation. With over 9 years of experience working in Big Data, her previous stints include...

Ajit Koti

Senior Engineer, Netflix

Ajit Koti is a Senior Engineer on the Growth Data Engineering team at Netflix, building and architecting large-scale distributed systems and real-time data processing engines. Ajit has worked previously at Fanatics, IBM Labs, and J P Morgan, and has extensive experience in building distributed...

Thursday November 14, 2019 9:40am - 10:10am PST
reactive

10:20am PST

Run Like a Boss in Cloud: How Istio and Kubernetes are Changing the Microservices Completely

With cloud-native architectures, we face the challenges of distributed systems in terms of integration, failures, discovery, and monitoring. Istio and Kubernetes together meet these challenges by providing an additional layer between services and the network, enabling you to control orchestration outside code. This revolutionizes the way services are connected, managed, and secured in cloud-native architectures. Through a series of quick demos and Java code snippets, this session showcases how you can start utilizing Istio on Kubernetes for your own Java-based microservice architecture. In addition, we will try to showcase other CNCF tools (Kiali, Jaeger, ServiceGraph, etc.) for implementing a better microservice architecture.

Speakers

Muktesh Mishra

Technical Lead, Adobe

Muktesh is currently working as a Sr. Staff Software Engineer for Adobe's Developer Productivity Group. He is an open-source contributor to 20+ projects and enjoys programming in multiple languages. He is primarily interested in, and contributes to, Microservices, Cloud Computing, Containerization...

Thursday November 14, 2019 10:20am - 10:50am PST
reactive

11:00am PST

Next-Level Diagnostics for Async & Concurrent Errors with ZIO

A strength of the Scala programming language is its powerful support for asynchronous and concurrent programming—historically with Future and Akka, and today, with next-generation effect systems like ZIO.

While ZIO features like fiber-based concurrency, software transactional memory, and async/concurrent resource safety may grab headlines, in everyday programming, we spend a lot of our time debugging our async/concurrent code.

In this presentation by John A. De Goes and Salar Rahmanian, you’ll see how newly-developed features in ZIO make it easier than ever to troubleshoot problems in modern applications.

You’ll discover how execution traces show exactly the line-by-line flow of your async/concurrent code (including where it would continue to if it did not error!), and how interactive debugging features let you identify and troubleshoot stalled async code.

Discover just how powerful async and concurrent programming has become in Scala!

Speakers

Salar Rahmanian

Software Engineer, Collective Health

I have been developing software since the age of eleven and have over 20 years of commercial experience. My passion and expertise are focused on functional programming and building concurrent and distributed systems using Scala. I am a core developer for the ZIO Scala Library for asynchronous...

John A. De Goes

Solution Architect, De Goes Consulting

John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take very...

Thursday November 14, 2019 11:00am - 11:30am PST
reactive

11:40am PST

Integrating Developer Experiences - Build Server Protocol and beyond

IDEs - Integrated Development Environments - traditionally provide out of the box support for many of the tasks that go into making working software out of source code. But increasingly, developers expect to be able to use any one of a variety of special-purpose tools for each task. This shifts the focus of the IDE from "Integrated" to "Integrating" external tools into a coherent experience. Especially in the Scala ecosystem, we have an increasing number of build tools to choose from. I have been focusing on integrating sbt and other new tools with the IntelliJ Scala plugin and will talk about challenges involved and how the Build Server Protocol makes it possible for IntelliJ to interface with any build tool.

Speakers

Justin Kaeser

Software Developer, JetBrains

Thursday November 14, 2019 11:40am - 12:10pm PST
reactive

1:00pm PST

High Performance Serverless Functions in Scala

I'll show you how to easily build serverless functions in Scala, including on AWS Lambda, that beat back "cold start" issues with extremely low response times.

Speakers

Jason Swartz

Edge EM, Twitch

Thursday November 14, 2019 1:00pm - 1:30pm PST
reactive

1:40pm PST

Growing the Scala Community

The Scala community has grown significantly over the past 15 years. As a community, we wrote millions of lines of code and developed hundreds of projects. While the language is thriving, there is still room to contribute to the community. Unlike other tech talks, this talk focuses on contributing to the diversity of the community. It explains the significance and benefits of diversity, and it proposes solutions to diversify and improve the community. One of the best ways to grow the community and to bring diversity into it is to organize ScalaBridge workshops, which are intended to provide resources for people from underrepresented populations to learn Scala. (Diversity comes in many forms: race, gender, age, religion, culture, sexual orientation, socioeconomic background, etc.) While the workshops have positive and lasting impacts, they cannot be run by one individual or by a single organization. In order for the Scala community to become more diverse, we need your help to scale up! Attend this talk to learn how to contribute to our community!

Speakers

Yifan Xing

Software Developer

Yifan is a software engineer, ScalaBridge organizer, and open-source contributor. Her work involves many distributed systems related topics, including network protocols, consensus, network security, etc. Yifan contributed to the message queue systems and asynchronous APIs for a Scala...

Thursday November 14, 2019 1:40pm - 2:10pm PST
reactive

2:20pm PST

Everything old is new: today's infrastructure as yesterday's Internet

In the age of cloud computing, what was once boring infrastructure has become incredibly exciting. From containers to service discovery to cluster schedulers, both industry and academia have been innovating at an alarming pace. With so many systems rapidly evolving in a domain fraught with trade-offs, it has become difficult to see the proverbial forest for the trees. In this talk we will try to see how we can evaluate this new landscape of systems by exploring classic networking papers and seeing what the design principles of yesterday's Internet have to say about the design decisions of today's infrastructure.

Speakers

Adelbert Chang

Lead AI Engineer, Target

Adelbert Chang is an engineer at Target where he works on deployment infrastructure for the AI Engineering team. Previously he worked at U.C. Santa Barbara doing research in large-scale graph querying and modeling, and in industry on machine learning systems, rule engines, and developer...

Thursday November 14, 2019 2:20pm - 2:50pm PST
reactive

3:00pm PST

To Spark or Not to Spark

Heard about the exciting new world of distributed Analytics with Spark but not sure if it's appropriate for your use case? In this talk, we'll walk through the basic use cases for Spark with distributed databases like Apache Cassandra. We'll outline the potential uses for any organization, even those not requiring generic analytics capabilities. Learn about how we can use Spark to load data, modify tables, and move data from cluster to cluster. Discover more advanced use cases, like working with streaming services and messaging queues. Find out about all the exciting things you can do with Spark and when you may be able to get away without it!

Speakers

Russell Spitzer

Software Engineer, DataStax

Spark, Cassandra, or Dogs.

Thursday November 14, 2019 3:00pm - 3:30pm PST
reactive

3:40pm PST

Serverless Scala - Functions as SuperDuperMicroServices

Serverless is all the rage but what does it mean for Scala developers? Can we take a plain ol' Scala function and run it on the cloud with infinite scalability? This talk will explore how to build and deploy serverless Scala and how to avoid startup overhead. We will also explore how to build pure serverless functions to make programs more provably correct, easier to build, test, and run. We will use Google Cloud as a reference serverless implementation but the concepts are applicable with any provider.

Speakers

James Ward

Developer Advocate, Google Cloud

James Ward is a nerd / software developer who shares what he learns with others through presentations, blogs, demos, and code. After over two decades of professional programming, he is now a self-proclaimed Typed Pure Functional Programming zealot but often compromises on his ideals...

Josh Suereth

Engineer, Google

Thursday November 14, 2019 3:40pm - 4:10pm PST
reactive

4:20pm PST

Deploy an End-to-End ML Pipeline Using Apache Spark Streaming and Kubernetes

Distributed stream processing engines like Apache Spark(TM) Structured Streaming can help in various ways with performing machine learning in real time at large scale. A typical end-to-end streaming machine learning pipeline consists of:
- Preprocessing the data based on the application, e.g. normalising or cleaning.
- Hosting the model using microservices and Kubernetes, with IBM MAX (IBM Model Asset Exchange).
- Scaling the entire pipeline using Apache Spark and Kubernetes.

This talk may include a live demo applying the above technique to predict objects in an image using an object detection model. Since this is a streaming application, the predictions will be made in real time.

Key takeaways:
- Learn about reusing ML models using the IBM Model Asset Exchange.
- Learn how to scale an online ML application end to end using Apache Spark Structured Streaming and Kubernetes.

Details of the associated code and data source used for the demo are available here: https://github.com/ScrapCodes/SS-on-kube

Speakers

Prashant Sharma

System Software Engineer, IBM

Open source contributor, part of the CODAIT (Center for Open Source Data and AI Technologies) group at IBM. Apache Spark committer and PMC member.

Nick Pentreath

Principal Engineer, IBM

Nick Pentreath is a principal engineer in IBM's Center for Open-source Data & AI Technology (CODAIT), where he works on machine learning. Previously, he cofounded Graphflow, a machine learning startup focused on recommendations. He has also worked at Goldman Sachs, Cognitive Match...

Thursday November 14, 2019 4:20pm - 4:50pm PST
reactive

5:00pm PST

Dagster: a Framework for Data Processing Applications

We introduce Dagster, an open source Python library for building ETL processes, ML pipelines, and similar software systems, all of which we call data applications.

Data applications are graphs of functional computations that consume and produce data assets. Dagster provides abstractions and tools for modeling the semantics of these applications by providing a unified type system, a data dependency graph, a configuration system, a structured API for emitting events such as data quality tests and materializations, and high-quality developer tools built on those abstractions. Computations themselves can be in the tools used by builders -- Spark jobs for data engineers, SQL statements for analysts, Python for data scientists -- and can be deployed to arbitrary orchestration engines -- such as Airflow, Dask, or Kubernetes-based execution.

The result is more reliable, testable, understandable data systems, that leverage the existing tools that work and that are deployable to your infrastructure.

Speakers

Nick Schrock

Founder, Elementl

Nick is the founder/CEO of Elementl and the creator of Dagster (http://dagster.io) the data orchestrator for machine learning, analytics, and ETL. Prior to founding Elementl Nick was a principal engineer and director at Facebook and created GraphQL.

Thursday November 14, 2019 5:00pm - 5:30pm PST
reactive

9:40am PST

The Renaissance for Big Data and Parallelism with GraalVM

The Renaissance suite is a new benchmark suite focused on parallelism and concurrency, and provides workloads that exercise modern parallel programming abstractions and primitives provided by the JVM. Through these workloads, the suite aims to aid in understanding how modern applications and data processing frameworks use the features of the JVM, and to foster development of new optimizations that enable more efficient executions. The GraalVM team has used those benchmarks to improve and assess the performance of its compiler to make it one of the most efficient in the industry.
In this talk, we will discuss this new suite and how it is helping compiler, GC, VM and tool implementers to fully support and optimize for the kind of workloads developers really care about. We will then dive into the GraalVM use case by detailing what makes GraalVM such a unique ecosystem.

Speakers

Christian Wimmer

Consulting Researcher, Oracle

François Farquet

Senior Researcher, Oracle Labs

Friday November 15, 2019 9:40am - 10:10am PST
reactive

10:20am PST

Change Data Capture in Distributed Systems

Modern systems are usually designed as a collection of cooperating microservices. These services commonly have dedicated data stores for their individual needs. To support various requirements, the corresponding data is often stored in data stores with very different characteristics and use cases. A fundamental requirement emerging from these architectures is the need to reliably capture primary data changes. Change Data Capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data. In this talk, I’d like to discuss the advantages and disadvantages of various CDC approaches, provide guidance in this area, and share our experience, including various samples and recommendations.
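
As a small illustration of what "determining and tracking the data that has changed" can look like, the sketch below implements one of the simplest CDC approaches, comparing two snapshots of a table keyed by primary key. It is not taken from the talk: the `ChangeEvent` type, the `diff_snapshots` function, and the sample rows are hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class ChangeEvent:
    op: str                 # "insert", "update", or "delete"
    key: str                # primary key of the affected row
    before: Optional[Row]   # row state in the old snapshot, if any
    after: Optional[Row]    # row state in the new snapshot, if any


def diff_snapshots(old: Dict[str, Row], new: Dict[str, Row]) -> Iterator[ChangeEvent]:
    """Yield one change event per key whose row differs between the snapshots."""
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before is None:
            yield ChangeEvent("insert", key, None, after)
        elif after is None:
            yield ChangeEvent("delete", key, before, None)
        elif before != after:
            yield ChangeEvent("update", key, before, after)


# Hypothetical subscription table captured at two points in time.
old_snapshot = {"1": {"plan": "basic"}, "2": {"plan": "premium"}}
new_snapshot = {"1": {"plan": "standard"}, "3": {"plan": "basic"}}

events = list(diff_snapshots(old_snapshot, new_snapshot))
ops = [(e.op, e.key) for e in events]
print(ops)
```

Snapshot diffing is easy to add to an existing store, but it only sees the net difference between two points in time, so intermediate updates are lost and every run rescans the data. Log-based approaches, which read the database's own transaction log, avoid both limitations at the cost of tighter coupling to the store.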

Speakers

Petr Zapletal

Tech Lead, Disney Streaming Services

My name is Petr and I work for Disney Streaming Services (ex. Bamtech Media ex. Cake Solutions). I'm interested in Reactive and Distributed Systems, Streaming and ofc Scala and JVM.

Friday November 15, 2019 10:20am - 10:50am PST
reactive

11:00am PST

Delta Lake: Open Source Reliability and Quality for Data Lakes

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unified streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.

Speakers

Michael Paul Armbrust

Tech Lead for Delta Lake, Databricks

Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and the Delta Lake open source project. He received his PhD from UC Berkeley in 2013, and was...

Friday November 15, 2019 11:00am - 11:30am PST
reactive

11:40am PST

8 Keys for Successful Serverless Architectures

Serverless promises to make it easier to build and deploy applications, but it presents a new set of challenges. These challenges often come at a high cost and make it difficult to use Serverless as a more efficient platform than traditional microservices. They include dealing with cold starts, data, testing, and avoiding vendor lock-in. In this talk we will look at the most common challenges and the keys to a successful Serverless architecture.

Speakers

Ryan Knight

Principal Software Architect / CEO, Grand Cloud

Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...

Friday November 15, 2019 11:40am - 12:10pm PST
reactive

1:00pm PST

Moonshot Spark: serverless with GraalVM

Can Apache Spark slip its earthly bounds and go serverless, clusterless? Popular cloud services are becoming more capable. AWS Lambda now runs three times longer, Fargate has become less expensive. GraalVM can reduce resource usage while improving cold start times. Consider how to handle small bursts of work. Would a standalone container suit best? If cold startup times weren't such an issue! What about a "mission control" model, where a long-running Spark driver dispatches work to ephemeral executors? What you gain in flexibility and convenience means concessions in performance. Chinning up with GraalVM native image helps. Shuffle is still problematic. Which experimental shuffle manager is best suited to the outer reaches of the cloud? There's not a practical use case for larger workflows - yet. But let's use this moonshot as a lens to magnify cloud performance issues. Explore how these solutions could apply to services you already use.

Speakers

Rose Toomey

Software Engineer, Coatue Management

Big data, Spark, Scala, fintech, ETL pipelines, digital assets, and object allocation. I love a performance mystery.

Friday November 15, 2019 1:00pm - 1:30pm PST
reactive

1:40pm PST

Reliable Machine Learning

Machine learning has been described as "Software 2.0" and holds the promise of totally changing how software systems are constructed. But if manually-written "Software 1.0" code is still plagued with bugs, downtime, and security vulnerabilities, how can we hope to achieve reliable behavior in systems with significant data-dependent machine learning components and their attendant complexities? This talk will survey academic research, industry best practices, and software tools spanning the end-to-end development of machine learning systems from data pipelines to tests and types all the way through to end user experience.

Speakers

David Andrzejewski

Engineering, Sumo Logic

David Andrzejewski is a Senior Engineering Manager at Sumo Logic, where he works on applying statistical modeling and analysis techniques to machine data such as logs and metrics. He also co-organizes the SF Bay Area Machine Learning meetup group. David holds a PhD in Computer Sciences...

Friday November 15, 2019 1:40pm - 2:10pm PST
reactive

2:20pm PST

Fast and scalable domain-specific knowledge graphs generation

Very recently, there has been a lot of interest in the construction of knowledge graphs. Large companies like Microsoft and Google operate large knowledge bases (KBs), and there are some open source examples like Yago. However, there are some scenarios where domain-specific KBs are needed and Wikipedia data sources may not work. In this talk, I’ll describe techniques to build this type of KB.

Speakers

Omar Alonso

Tech Lead, Instacart

Omar is a Tech Lead at Instacart where he works on the intersection of information retrieval, knowledge graphs, and human computation.

Friday November 15, 2019 2:20pm - 2:50pm PST
reactive

3:00pm PST

Integrating React with Scala: delivering incremental value in a legacy Play web application

Do you hate JavaScript? Is your UI written in Scala but you can't hire any frontend developers to work on it or backend developers that want to?
Come hear about how my team and I faced these exact problems: a frontend that's difficult to hire for, a monolithic web application with extremely slow build times, a smattering of untested and disorganized UI code.
Through trial and error we developed a way to inject React code into Scala Play that maintains type safety and takes advantage of JavaScript libraries, letting us build the frontend faster, hire more easily, and work incrementally towards a split frontend and backend web application.
For UI business logic, we will discuss how you can use tools like TypeScript, React, and Redux to incrementally improve your web application. Then we will also discuss how tools like webpack and reverse proxying can be used to build the application for production and enable a smooth development experience for your engineers.

Speakers

Niole Nelson

Software Engineer, Domino Data Lab

Inject React into Scala Play pdf

Friday November 15, 2019 3:00pm - 3:30pm PST
reactive

3:40pm PST

Serverless Event-Driven Data Pipeline Platform

Airflow has become the de facto data pipeline platform in many companies.
Airflow was designed to run static, slow-moving workflows on a fixed schedule, and it is a great tool for that purpose. However, users often get into trouble by forcing their use cases to fit into Airflow’s model.
A few examples of cases that Airflow cannot satisfy in a first-class way include:
- Complex DAGs that leak application code into the pipeline
- DAGs which need to be run off-schedule or with no schedule at all
- DAGs that run concurrently with the same start time
- DAGs with complicated branching logic
- DAGs with many fast tasks
- DAGs which rely on the exchange of data
- Parametrized DAGs

In this talk we present a brand new Serverless Event-Driven Pipeline Platform written in Scala that addresses all the problems above.

Speakers

Rahul Chitturi

Principal Software Engineer, Coatue

Neelabh Gupta

Software Engineer, Coatue Management

Full-stack web development, TypeScript, React, Scala, Python

Friday November 15, 2019 3:40pm - 4:10pm PST
reactive

4:20pm PST

Maximizing Throughput and Scalability for Akka Streams

The Akka Streams API offers a robust, reliable, and expressive means for executing streaming workloads. For applications that demand high throughput, low latency, or increasing scalability, it is critical to understand how to maximize the throughput for Akka Streams. I will examine the structure of Akka Streams and explore techniques for maximizing the throughput of individual streams. I will describe how Akka Streams can be partitioned in order to provide scalability, as well as high-availability. Finally, I will review techniques for profiling and instrumenting Akka Streams to find the bottlenecks. Using the practical techniques from this talk, you will be able to improve the throughput, reliability, and scalability of your streaming applications.

Speakers

Colin Breck

Sr. Staff Software Engineer, Tesla

Colin Breck has experience developing software infrastructures for the near real-time monitoring and control of industrial applications. At Tesla, he works on distributed systems for the monitoring, aggregation, optimization, and control of distributed-energy assets, including solar...

Friday November 15, 2019 4:20pm - 4:50pm PST
reactive

5:00pm PST

Linkerd and the Service Mesh

In this talk, Charles Pretzer will present an overview of Linkerd, a "service mesh" for Kubernetes, and describe Linkerd's evolution over the years from a 1.x branch in Scala, Finagle, Netty, and the JVM, to its modern 2.x incarnation in Go and Rust. Charles will cover the service mesh model and how Linkerd implements it, as well as lessons learned over almost four years of production experience at companies around the world.

Speakers

Charles Pretzer

Field Engineer, Buoyant, Inc.

Charles Pretzer is a field engineer at Buoyant, where he spends his time collaborating and engaging with the open source community of the CNCF service mesh, Linkerd. He also enables production level adoption by helping companies integrate Linkerd into their Kubernetes based applications...

Friday November 15, 2019 5:00pm - 5:30pm PST
reactive