Developer
Wednesday, August 31


Yammer Datacenter Story - OnPremise, GFS, Azure - Tobias Haag & Kyle Gordon, Microsoft

"All things Docker" seems to be the defining slogan for Yammer these days. And what better way to run Docker at scale than Mesos?
Yammer wrote an in-house datacenter automation platform, which was fraught with problems and restrictions that the Mesos landscape promised to solve, and in 2015 engineers at Yammer started exploring Mesos as a possible replacement.
This talk will focus on two core points of the implementation run at Yammer: Docker and security.

The big gotcha: Docker. Everything has to run inside Docker, from Zookeeper to the Mesos agents and even the executors. Dedication, mixed with some despair, eventually resulted in the platform that runs in production today.

Security is the second biggest piece to honor the commitment made to Yammer customers as part of the Office365 suite. The path to provide a good solution to security aspects like compartmentalization, trust bootstrap, access control and transport encryption proved a balancing act between customer commitment and developer friendliness.

In the end Mesos helped us deliver Docker's promise of Build, Ship, Run as the foundation of environments Yammer can trust to run from Dev all the way to Production at scale and anywhere on the planet.

Kyle Gordon

Software Engineer 2, Yammer / Microsoft
I have been an engineer on the operations team for Yammer for almost 3 years. Currently I am in the process of creating a stable environment to migrate Yammer to Azure. We are primarily a microservice company and are an ideal fit to live on a Mesos-architected environment. We’ve...

Tobias Haag

Senior Software Engineer, Microsoft
4 years at Yammer from Junior to Senior Software Engineer at Yammer / Microsoft. Since late 2013 part of the Infrastructure team with focus on Datacenter automation and software lifecycle support. As of mid 2015 leading developer on Azure migration effort with focus on Docker and...

Wednesday August 31, 2016 10:30am - 11:20am
Ballroom B


Marathon and Chronos - State of the Art - Matthias Eichstedt & Matthias Veit, Mesosphere
Marathon and Chronos are the core and essential frameworks for most Apache Mesos clusters.
Marathon ensures long running tasks are running in the datacenter and is commonly used to maintain high availability of other frameworks. Chronos is the fault tolerant cron of the datacenter.
There have been significant advances in both of these frameworks to leverage the latest features of Apache Mesos including persistent storage in addition to extending the capabilities of the system such as networking and security.
This session will provide an overview of all the new features along with experience from the field and best practices for using these features in production. The session will conclude with a look at the roadmap for each of these projects providing a glimpse into what to expect over the next year.

Matthias Eichstedt

Engineering Manager, Mesosphere
I'm Engineering Manager of the Orchestration team at Mesosphere, based in Hamburg, Germany.

Matthias Veit

Engineering Manager, Mesosphere

Wednesday August 31, 2016 11:30am - 12:20pm
Ballroom B


Fault Tolerance in Mesos - Vinod Kone & Anand Mazumdar, Mesosphere
Mesos is a widely used cluster resource manager that has been used in production at scale for some time. At its core, Mesos is a sophisticated distributed system with lots of components and failure scenarios. In datacenters, hardware/network failures are the norm rather than the exception. Being a distributed systems kernel, Mesos needs to be resilient to all of these failures, degrading gracefully wherever applicable. We believe that everyone running a Mesos cluster in production/building distributed systems would benefit from learning how Mesos does fault tolerance to reason better about failures in their cluster/adopt these battle-hardened best practices in their project.

This talk focuses on how various Mesos components, such as the Master, Agent, Scheduler and Executor, react to common failures like node failures, process crashes, network partitions and Zookeeper failures.
This talk will cover the following specific topics:
- How Mesos masters/agents react to node failures and network partitions.
- Agent recovery after a process restart.
- Master health checks.
- Executor health checks.
- Scheduler disconnections/partitions.
- Zookeeper failures.

Vinod Kone

Apache Mesos PMC, Mesosphere
Vinod Kone is a committer and PMC member of the Apache Mesos project. He is currently a Tech Lead and Engineering Manager @ Mesosphere. Previously, he was a Tech Lead and Manager of the Mesos team @Twitter. Vinod completed his PhD in Computer Science from UC Santa Barbara.

Anand Mazumdar

Distributed System Engineer, Mesosphere
Anand Mazumdar is a software engineer at Mesosphere where he works on the Apache Mesos project. Prior to that, he used to work at a Quantitative Hedge Fund and Amazon Web Services on scalable data stream processing. He holds a Masters in Computer Science from The University of Texas...

Wednesday August 31, 2016 2:00pm - 2:50pm
Ballroom B


JEE on DC/OS 101 and Fun - Josef Adersberger, Qaware
Cloud native applications are popular these days – applications that run in the cloud reliably and scale almost arbitrarily. They follow three key principles: They are built and composed as microservices, they are packaged and distributed in containers and the containers are executed dynamically in the cloud. In this hands-on session we will show how to build, package and deploy cloud native Java EE applications on top of DC/OS - fully automated with Gradle. And for the fun of it we will be using an off-the-shelf DJ pad, programmed with nothing other than the Java Sound API, to demonstrate the core concepts and to visualize and remote control DC/OS.

Josef Adersberger

CTO, QAware
Josef Adersberger is co-founder & CTO of QAware, a German custom software development company and CNCF silver member. He studied computer science in Rosenheim and Munich and holds a doctoral degree in software engineering. He is currently responsible for a large-scale cloud migration...

Wednesday August 31, 2016 3:00pm - 3:50pm
Ballroom B


DC/OS: Considerations and Best Practices for Production Deployments - Ben Lin, Mesosphere
DC/OS makes it easy to quickly set up and run a Mesos cluster, but the default configuration may not be as battle-hardened as one would like or uniquely tailored to meet your requirements and constraints. In this session, Mesosphere consultants will discuss key customer requirements and the architecture decisions that lead to a production-ready, resilient, and scalable DC/OS design. This session will cover Capacity Planning, Monitoring, Physical Host Considerations, Security, Container Data Management, and Backup and Recovery. The session will also cover the unique elements that need to be considered when deploying DC/OS in Production vs. Dev/Test configurations.


Ben Lin

Solutions Architect, Mesosphere
Ben Lin is a Solutions Architect at Mesosphere. Ben works closely with customers to develop strategies for architecting, deploying, and operationalizing DC/OS environments. Previously he worked in the VMware NSBU (Networking & Security Business Unit), focusing on advanced solutions...

Wednesday August 31, 2016 4:20pm - 5:10pm
Ballroom B


Building Highly Available Mesos Frameworks, 2.0 - Neil Conway, Mesosphere
Production-quality Mesos frameworks must be able to continue managing tasks despite unreliable networks and faulty computers. Mesos provides tools to help developers do fault-tolerant task management, but putting these tools together effectively remains something of a black art. This talk will offer practical guidance to framework developers to help them understand how Mesos deals with failures and the tools it provides to enable fault tolerant frameworks. The talk will also cover new Mesos features that allow framework developers to control how partitioned tasks should be handled. Mesos operators will also benefit from a discussion of exactly how Mesos behaves during network partitions and other failure scenarios.


Neil Conway

Neil Conway is an engineer at Mesosphere, and an Apache Mesos Committer and PMC Member. At Mesosphere, he has worked on a variety of projects, including partition-aware Mesos frameworks and improving support for stateful Mesos frameworks. Prior to joining Mesosphere, he completed...

Wednesday August 31, 2016 5:20pm - 6:10pm
Ballroom A


Taming Apache Mesos: Tools to Make Scalable Software Easier, Quicker and More Resilient - Pini Reznik, Container Solutions
Cloud-based infrastructure is providing businesses with enhanced resiliency, improved performance and reduced costs. Development against such infrastructure, however, has become turgid and fragile. Even with the revolution of “infrastructure as code”, we still see hardcoded IP addresses, sleeps and randomly failing unit tests. Tests themselves are bloated with boilerplate connection code and massive delays as test infrastructure is provisioned. The rapid feedback loops promoted in the previous decade are fast becoming impossible to achieve.

The presentation will introduce the audience to Apache Mesos and several Mesos-related projects that we hope solve some of these problems. Minimesos is an in-memory development environment to allow engineers to develop scalable applications on their own machine. Mesos-Starter is a Spring component that enables users to create a Mesos framework in a matter of minutes. And our bleeding edge project, Mesos-Monkey, provides a tool to harden, test and verify an application's resiliency to failure.

The audience will also be able to try online versions of MiniMesos throughout and after the presentation.

Pini Reznik

CTO, Container Solutions
Pini Reznik is a Co-founder and CTO of Container Solutions. Pini oversees the architecture and implementation of Cloud Native migrations for organisations from every sector. His work spans two decades in the configuration management field, with current emphasis on DevOps, automation...

Wednesday August 31, 2016 5:20pm - 6:10pm
Ballroom B
Thursday, September 1


Ambrose: Automatic Metric Collection and Correlation in DC/OS - Nick Parker, Mesosphere
DC/OS provides easy deployment and management of clustered applications. In both production and test environments, a deep understanding of the state of those applications is equally if not more important. This talk describes a new suite of components for collecting and tagging metrics in DC/OS systems: both system-level information at the agent nodes and arbitrary metrics from individual applications. Applications are presented with a simple interface for passing metrics upstream, and Ambrose does the rest, tagging all data with its origin as it's processed.

Nicholas Parker


Thursday September 1, 2016 10:15am - 11:05am
Ballroom B


Designing and Evaluating a Distributed Computing Language Runtime - Christopher Meiklejohn, Université catholique de Louvain
Consistency is hard and coordination is expensive. As we move into the world of connected 'Internet of Things' style applications, or large-scale mobile applications, devices have less power, periods of limited connectivity, and operate over unreliable asynchronous networks. This poses a problem with shared state: how do we handle concurrent operations over shared state, while clients are offline, and ensure that values converge to a desirable result without making the system unavailable?

We look at a new programming model, called Lasp. This programming model combines distributed convergent data structures with a dataflow execution model designed for distribution over large-scale applications. This model supports arbitrary placement of processing nodes: this enables the user to author applications that can be distributed across data centers and pushed to the edge.

In this talk, we will focus on the design and evaluation of the Lasp runtime system: a system written in Erlang with a target scale of 10,000 - 20,000 nodes. We will look at the supporting algorithms that assist in achieving this scale, and how we’ve leveraged Mesos and DC/OS to build a system for performing experiments and identifying bottlenecks as we develop the language.

Christopher Meiklejohn

Ph.D. candidate, Université catholique de Louvain

Thursday September 1, 2016 11:15am - 12:05pm
Ballroom B


One is not enough - Using Multiple Disk Support - Joris Van Remoortere, Jörg Schad & Felix Hupfeld, Mesosphere
Multiple Disk (MESOS-191) is an age-old feature request that was implemented at last year's MesosCon EU hackathon. It allows operators to expose multiple disk resources. This enables frameworks to a) utilize more than one disk per agent and b) gain exclusive access to a complete disk device (including disk errors when they attempt to exceed the capacity of the volume instead of having Mesos kill the Task when exceeding its disk resource limit).
This talk will give an overview of the different types of disk resources now available (Root, Path, and Mount Disk) and guidelines on when to use which. Secondly, we will discuss how multiple disk resources can be utilized by users who are running their workloads using Marathon. Lastly, we will provide best practices for framework developers who are looking to utilize this feature.

This will be illustrated using the example of the Mesos Quobyte framework, which was one of the first frameworks to take advantage of multiple disk support.


Felix Hupfeld

Founder, Quobyte
Felix manages Quobyte’s technology and pushes development forward. Before that, he worked as a tech lead and capacity planner in Google’s infrastructure team (2009–2013). He was the architect and project manager for XtreemFS (2006–2009). Felix’s PhD was on distributed s...

Joris Van Remoortere

Joris Van Remoortere is a Senior Software Engineer at Mesosphere. He focuses on large-scale, low-latency, high-throughput systems. He alleviates scaling issues and makes performance and reliability improvements on many projects, including Orly, messaging and payment systems, web tiers...

Jörg Schad

Head of Machine Learning, ArangoDB
Jörg Schad is Head of Machine Learning at ArangoDB. In a previous life, he built machine learning pipelines in healthcare, worked on distributed systems at Mesosphere and on in-memory databases, and conducted research in the Hadoop and Cloud area. He’s a frequent speaker at meetups...

Thursday September 1, 2016 1:30pm - 2:20pm
Ballroom B


Practical, Resilient Software Defined Networking: A Field Report - Sargun Dhillon & Avinash Sridharan, Mesosphere
Mesos, as it comes out of the box, has some limitations in its network infrastructure. The out-of-the-box mechanism does not work with legacy applications that expect stable ports and IPs. We built a general-purpose solution using off-the-shelf components to provide overlay networking for Mesos clusters. This overlay network is meant to become the default way of doing networking in Mesosphere's DC/OS. In this talk, Avinash, Jie, and Sargun will talk about the design for the Mesos modules, isolators, and integration into Docker. This mechanism sits alongside an external distributed system that orchestrates it and glues it all together. This system is meant to be flexible and highly available, and to run in the stable state without any masters. We'll talk about why we decided to go with this design and how it works as deployed in real DC/OS installations.

Avinash Sridharan

Software Engineer, Mesosphere
Love containers, especially networking them! Have worked (and studied) in the field of networking for the past decade or so. Very interested in designing and developing new network technologies. Want everything to be fast, distributed, and extremely efficient.

Thursday September 1, 2016 2:30pm - 3:20pm
Ballroom B


Supporting GPUs in Docker Containers on Apache Mesos - Kevin Klues, Mesosphere & Yubo Li, IBM
Gaining access to GPUs inside a Docker container is a non-trivial task. Indeed, Nvidia provides both a Docker volume plugin and a full-blown wrapper around Docker itself in order to allow seamless access to GPUs inside a running Docker container. With the advent of the new unified containerizer in Mesos (which runs Docker containers natively in Mesos) as well as continued support for the Docker containerizer (which shells out the task of launching Docker containers to Docker itself), Mesos needs a way to seamlessly support GPUs inside running Docker containers as well.

This talk covers the details of why gaining access to GPUs inside a running Docker container is hard, how Nvidia has solved this problem for standalone Docker itself, and how Mesos has now solved this problem in both its unified containerizer and its Docker containerizer.

Kevin Klues

Tech Lead Manager, Mesosphere
Kevin Klues is a Tech Lead Manager at Mesosphere running the DC/OS ClusterOps team. Since joining Mesosphere, Kevin has been involved in the design and implementation of a number of Mesos’s core subsystems, including GPU isolation, Pods, the Mesos CLI and Attach/Exec support. He...

Yubo Li

Research Staff Member, IBM Research
Dr. Yubo Li is from IBM Research China, Beijing. He is working with GPU enablement on Kubernetes/Mesos/OpenStack cloud and architecture design for deep learning (DL). He is the chief architect for GPU acceleration service on SuperVessel, an open-access cloud with OpenStack and OpenPOWER...

Thursday September 1, 2016 4:00pm - 4:50pm
Ballroom B