Bay Area Apache Spark Meetup, 8/22/17
Tech-Talk 1: Large-scale batch processing at Pinterest with Apache Spark

Abstract: Pinterest is a data product, and we rely heavily on processing large amounts of data for use cases ranging from discovery products to business metric computation. Spark has been present at Pinterest since 2014, but only in the last year did it start to attract large-scale use cases, and those use cases have been growing ever since. We are going to talk about Pinterest's journey with Spark so far and the technical challenges we have faced running a large-scale data infrastructure in the cloud.

One of the many use cases for large-scale batch processing at Pinterest is our experiment framework. At Pinterest, we rely heavily on A/B experiments to make decisions about products and features. Every day we aim to have experiment results ready by 10 a.m. so we can make fast, well-grounded decisions. With more than 1,000 experiments running daily, crunching billions of records for more than 175 million Pinners, we need a reliable pipeline to support our growth and meet our service-level agreement. We will discuss how we built the experiment framework to speed up the computation and make it more scalable and performant.

Tech-Talk 2: Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark

Abstract: Deep Learning has shown tremendous success, yet leveraging its power often takes a lot of effort. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone in a distributed manner. In this talk, we'll survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This package simplifies Deep Learning in three major ways:
• It has a simple API that integrates well with enterprise Machine Learning pipelines.
• It automatically scales out common Deep Learning patterns, thanks to Spark.
• It enables exposing Deep Learning models through familiar Spark APIs, such as MLlib and Spark SQL.

In this talk, we will look at the complex problem of image classification, using Deep Learning and Spark. Using Deep Learning Pipelines, we will show:
• how to build deep learning models in a few lines of code;
• how to scale common tasks like transfer learning and prediction; and
• how to publish models in Spark SQL.
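To give a flavor of the "few lines of code" claim, here is a minimal transfer-learning sketch using the open-source sparkdl package (the one introduced in the talk). The image directory, label column, and hyperparameter values are illustrative assumptions, not from the abstract, and running it requires a Spark cluster with sparkdl and its dependencies installed:

```python
# Transfer-learning sketch with Deep Learning Pipelines (sparkdl).
# Assumes a running SparkSession with sparkdl installed; the image
# directory path and the dummy label are illustrative placeholders.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.sql.functions import lit
from sparkdl import DeepImageFeaturizer, readImages

# Load images into a DataFrame; a real job would derive labels from
# the data, but here we attach a dummy label column for illustration.
train_df = readImages("/data/train_images").withColumn("label", lit(1.0))

# Use a pre-trained InceptionV3 network as a fixed feature extractor,
# then train a simple logistic-regression classifier on top of it.
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features",
                                 modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, labelCol="label")
model = Pipeline(stages=[featurizer, lr]).fit(train_df)
```

The heavy lifting (distributed image decoding and featurization) happens inside the pipeline stages, which is what lets the user-facing code stay this short.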