Operator
Wednesday, August 31


Mesos 1.0 - Vinod Kone & Anand Mazumdar, Mesosphere
Mesos will soon reach the 1.0 milestone. In addition to new features, this will mean a more stable user-facing API and stricter support and release guarantees for operators and framework developers. This talk will introduce operators, framework developers, and users to the new API and discuss the support and compatibility guarantees Mesos will offer going forward.

This talk is a sequel to last year’s MesosCon Seattle talk on “Mesos HTTP API” and continues from where it left off.

This talk will cover the following specific topics:
- Discuss the newly introduced Operator API.
- Update on the recent improvements to the Framework API.
- Update on client libraries for the new Framework API.
- Release cadence for Mesos going forward.
- Support/compatibility guarantees for operators/framework developers (e.g., backporting of patches).
- Master->Agent renaming in the 1.0 API.

Vinod Kone

Apache Mesos PMC, Mesosphere
Vinod Kone is a committer and PMC member of the Apache Mesos project. He is currently a Tech Lead and Engineering Manager at Mesosphere. Previously, he was a Tech Lead and Manager of the Mesos team at Twitter. Vinod completed his PhD in Computer Science at UC Santa Barbara.

Anand Mazumdar

Distributed System Engineer, Mesosphere
Anand Mazumdar is a software engineer at Mesosphere, where he works on the Apache Mesos project. Prior to that, he worked on scalable data stream processing at a quantitative hedge fund and at Amazon Web Services. He holds a Master's in Computer Science from The University of Texas...

Wednesday August 31, 2016 10:30am - 11:20am
Ballroom C


The Evolution of Deploy Tooling at Twitter - David McLaughlin, Twitter
Apache Mesos and Apache Aurora have been crucial in growing a microservices culture at Twitter. The initial phase of Aurora adoption at Twitter saw many teams abandon their own deploy tooling and frameworks in favor of concise Aurora configuration files.

But over time a whole new class of bespoke deploy tooling has emerged as service owners had to target large deployment matrices consisting of multiple pre-production environments (for things like performance testing, integration testing, regression testing, canary testing, etc.) across multiple availability zones (i.e. multiple Mesos clusters). As the matrix grows, manual orchestration quickly becomes untenable.

The custom tooling also had a strong emphasis on supporting rolling back services to a previous good state, which only becomes more complicated as your deployment matrix grows.

In this talk I’ll outline the ways in which Mesos and Aurora helped our engineers to implement more sophisticated DevOps processes by making it easy to grow their deployment matrix. I’ll outline how that process surfaced some of the holes in currently available tooling and how it led to a huge amount of duplicate effort for our service owners. Finally I’ll describe a system we’ve built at Twitter to support our CI/CD pipeline that we feel closes those gaps.


Wednesday August 31, 2016 11:30am - 12:20pm
Ballroom C


The Mushroom Cloud Effect or What Happens When Containers Fail? - Alois Mayr, Dynatrace
Microservice architectures result in environments up to 20 times larger than their monolithic counterparts. In such big and interconnected environments, container metrics will tell you about infrastructure health but not service health. Even if you have implemented service health checks to react quickly to service failures, in a resilient system (such as one built on top of Mesos/Marathon or DC/OS) you will see intermediary mushroom cloud effects in which a large number of services are temporarily affected. The mushroom cloud shows you all services, containers and hosts affected by a failing component. How do you find out what really caused the problem, and how do you distinguish cause from effect?

In this session Alois will perform post-mortem analysis by walking through different failure cases we've observed in a large, real-world e-commerce production environment running on Apache Mesos, and show you how to figure out what actually caused the failures.

Alois Mayr

Technology Lead Cloud & Containers, Dynatrace
Alois is technology lead for Cloud Foundry at Dynatrace and is working on bringing full-stack monitoring to cloud native platforms.

Wednesday August 31, 2016 2:00pm - 2:50pm
Ballroom C


Flexibility Across the Cloud - Managing and Scaling your High Availability DC/OS Cluster using Juju - Thomas Barber, Spicule Ltd
DC/OS, the recently open-sourced, Mesos-based operating system, allows system administrators and DevOps departments to run entire data centers as a single compute unit. But what about managing your servers and scaling your infrastructure?

With the advent of cloud computing and vastly reduced infrastructure costs, the compute resource available to businesses is virtually limitless, and Juju, created by Canonical, allows us to manage our applications flexibly across both cloud and physical infrastructure as if they were the same thing. In this presentation Tom Barber will take you into the new world of application modelling.

Juju is already used to drive many OpenStack deployments. We will take a look at how it can help you model your infrastructure in a way that vastly simplifies managing your DC/OS or Mesos installation and networking, whilst avoiding single-cloud lock-in or having to manage various services across different vendors and APIs.

We'll also delve into testing failure, enterprise monitoring and logging of your cloud agnostic DC/OS stack.

Tom Barber

Technical Director, Spicule LTD
Tom Barber is the director of Meteorite BI and Spicule BI. A member of the Apache Software Foundation and regular speaker at ApacheCon, Tom has a passion for simplifying technology. The creator of Saiku Analytics and open source stalwart, when not working for NASA, Tom currently deals...

Wednesday August 31, 2016 3:00pm - 3:50pm
Ballroom C


Bringing JFrog Artifactory to Apache Mesos - Alexis Tual, JFrog
A CI/CD pipeline running on Mesos needs a dedicated component to store and serve artefacts and their metadata. It is now possible to run JFrog's Artifactory, a universal binary repository manager, in a highly available configuration on Apache Mesos that can support many hundreds of users. Once usage of Artifactory scales beyond the demands of a single node, a highly available configuration uses a primary/secondary architecture to scale to several nodes. These nodes also require access to a relational database and a shared filesystem.

In this presentation, Alexis Tual, a Solution Engineer at JFrog, will review the challenges faced when adapting the existing highly available architecture of JFrog to the world of Mesos, from storage to scheduling. He will also show how you can use Artifactory to push and pull Docker images with Marathon to create CI/CD pipelines for containerized projects. Finally, he will discuss future work for both Mesos and Artifactory that will improve this integration.

Alexis Tual

Solution Engineer, JFrog
Alexis is a versatile Solution Engineer working at JFrog in Toulouse (France). He has a strong Java (Groovy!) / Web Developer background and has dedicated the past few years to automation and CI/CD.

Wednesday August 31, 2016 4:20pm - 5:10pm
Ballroom C


It’s Complicated, Okay (or Let’s Talk Openly about Mesos’ OSS Neighbors, Friends, and Rivals) - Aaron Williams, Mesosphere
Mesos is never the only OSS you need to run your production datacenter. And just like all of us hanging out together at MesosCon, if you surround Mesos with its OSS friends, you get a happy, highly productive Mesos. But you have to be careful: not all OSS plays well with others.

In this talk, we’ll start by looking at a handful of production Mesos datacenters from major users. We’ll use these real-world examples to abstract a standard Mesos datacenter architecture that contains all of the components needed to run today’s modern, containerized apps with big data and analytics frameworks.

With that harmonized Mesos datacenter architecture, we’ll look at each of the abstract components and discuss the leading OSS projects that fit each piece. We’ll discuss the characteristics of what makes some solutions work well with Mesos, and call out the projects that don’t meet the standard.

We’ll wrap up the talk by showing the complete architecture diagram, and show the single-command trick for bringing all of the best OSS components together and getting them up and running in a production cluster.

Aaron Williams

Engineering Leader, Mesosphere
Accomplished engineering manager with a passion and drive for building and scaling infrastructure, and utilizing data to solve complex issues. Strong believer in collaborative teamwork -- the sum is greater than its parts. Demonstrated track record of directing fast-paced, high-performing...

Wednesday August 31, 2016 5:20pm - 6:10pm
Ballroom C
Thursday, September 1


Performing Process Migration with Mesos - Kapil Arya & Isabel Jimenez, Mesosphere
Process migration (also known as process checkpointing) is the ability to move a group of related running processes from one set of nodes to another. It involves three phases: take a snapshot of all processes, copy the snapshot data over to the target nodes, and restart processes from that snapshot. Process migration has traditionally been used for fault-tolerance in the context of long-running stateful applications. Without it, the application developers need to modify the stateful application to periodically save the state to disk in order to restart in case of a failure. This is inefficient and error-prone!

In this talk, we demonstrate process migration within a Mesos cluster for a more enjoyable way to schedule your containers. Apart from fault-tolerance, process migration within Mesos opens up new avenues for implementing better scheduling policies. Some of the other use cases include improved maintenance primitives, debugging, speculative execution and managing “tails” for multi-threaded applications.
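
The three phases named in the abstract (snapshot, copy, restart) can be illustrated with a toy, in-memory sketch. This is not how Mesos or any checkpointing tool implements migration: `Counter`, `Node`, and `migrate` are hypothetical names invented for this illustration, and a JSON dump of an object stands in for a real process snapshot.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Counter:
    """Hypothetical stand-in for a stateful running process."""
    name: str
    ticks: int = 0

    def step(self) -> None:
        self.ticks += 1


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node."""
    snapshots: dict = field(default_factory=dict)
    running: dict = field(default_factory=dict)


def migrate(group, source: Node, target: Node) -> None:
    """Move the named processes from source to target in three phases."""
    # Phase 1: take a snapshot of every process in the group.
    for name in group:
        state = asdict(source.running[name])
        source.snapshots[name] = json.dumps(state).encode("utf-8")
    # Phase 2: copy the snapshot data over to the target node.
    for name in group:
        target.snapshots[name] = bytes(source.snapshots[name])
    # Phase 3: restart each process from its snapshot, then stop the original.
    for name in group:
        state = json.loads(target.snapshots[name].decode("utf-8"))
        target.running[name] = Counter(**state)
        del source.running[name]


# Usage: the counter keeps its state across the move.
node_a, node_b = Node(), Node()
node_a.running["worker"] = Counter("worker")
node_a.running["worker"].step()
migrate(["worker"], node_a, node_b)
node_b.running["worker"].step()
```

The application code (`Counter.step`) never saves its own state; the snapshot is taken from outside, which is the property the abstract contrasts with periodic save-to-disk logic written by application developers.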

Kapil Arya

Kapil Arya is an Engineer at Mesosphere focussing on the core team. He recently finished his PhD at Northeastern University where he was the lead developer of the open source distributed checkpointing project DMTCP, and contributed to the reversible debugger FReD. Interning at VMware...

Isabel Jimenez

Distributed systems engineer, Mesosphere
Isabel Jimenez is a Software Engineer at Mesosphere.

Thursday September 1, 2016 10:15am - 11:05am
Ballroom C


Docker At Scale - Pitfalls And How To Debug Them - Jan Schlicht, Mesosphere
Using Mesos to run Docker containers at scale is a common practice for many users. In this talk we will give an overview of the different ways to run Docker on top of Mesos, including the differences between running Docker with the different containerizers (i.e., Mesos, Docker, or universal).

As running Docker at large scale poses its own challenges (e.g., how to start up 1000 containers as quickly as possible), we present best practices and common pitfalls we have encountered over the last few years. We also discuss approaches for debugging Docker-related problems.

Jan Schlicht

Software Engineer, Mesosphere
Jan Schlicht is a Software Engineer at Mesosphere, working on storage features. His work included adding CSI support for Apache Mesos.

Thursday September 1, 2016 11:15am - 12:05pm
Ballroom C


Your Containers did WHAT??? Securing and Monitoring Containerized Apps - Alessandro Gallotta, Sysdig
The next generation distributed data center architecture is making applications more powerful and more responsive. But as many teams are starting to find out, the complexity of securing these applications and monitoring their behavior can be impractical, painful, and sometimes plain impossible.

In this demo-driven presentation, Luca Marturana will take you through the underlying challenges of container operations, cover the current state of the art of container and microservice monitoring, and discuss new techniques such as behavioral monitoring to secure your infrastructure. Using open source tools running in live environments, he will demonstrate how to effectively monitor, troubleshoot, and secure Mesos deployments.

The presentation will feature live interaction with container environments and live demos of all tools and techniques discussed. Special emphasis will be put on using the Mesos portfolio of scheduling and management tools as well as sysdig, an open source container and system troubleshooting tool developed by the presenter, and the open source behavioral security monitor falco.

GitHub link: https://github.com/draios/sysdig

Specific topics will include:
* visualizing the physical vs logical architecture of Mesos & DC/OS deployments
* monitoring performance at the holistic microservice/application level for orchestrated systems
* leveraging Mesos metadata such as Master, Slave, Marathon and labels for more intelligent troubleshooting
* identifying and surfacing anomalous system activity of individual Docker containers

Alessandro Gallotta

Software Engineer, Sysdig
Alessandro Gallotta is a software engineer at Sysdig. He is a core developer who focuses on backend services dealing with big data and high availability issues. He holds an M.Sc. in Computer Engineering from the University of Catania, Italy. Prior to Sysdig he worked as web developer...

Thursday September 1, 2016 1:30pm - 2:20pm
Ballroom C


Securing Marathon Installation - Gastón Kleiman, Mesosphere
- Walk through Marathon’s architecture.
- Describe how to secure the communication with Mesos.
- Describe the authentication/authorization plugin interface.
- Show an example plugin.

Gastón Kleiman

Staff Software Engineer, Mesosphere
Gastón Kleiman, Apache Mesos PMC/Committer, is a Staff Software Engineer at Mesosphere. He fell in love with distributed systems and infrastructure automation while contracting for Google, where he got to use Borg, MapReduce and other cool technology. That led him to work at Amazon...

Thursday September 1, 2016 2:30pm - 3:20pm
Ballroom C


Continuous Delivery without Stress: Open Source Canary-Releasing with DC/OS and VAMP - Olaf Molenveld & Dragoslav Pavkovic, Magnetic.io/VAMP
DC/OS is a powerful platform to run containers and resilient microservices architectures at scale. But releasing or upgrading software in production is often a stressful moment due to the risk of performance issues or even downtime. Applying canary patterns to Continuous Delivery pipelines provides a safety net that makes releasing containers less risky and stressful. By publishing new software versions to only a small percentage of visitors who match specific criteria, it enables you to test, optimise and scale in a controlled and gradual way, without negatively impacting the majority of users. In this presentation we’re going to talk about how VAMP adds powerful open source canary-releasing features to the DC/OS stack, and how to set up a smart Continuous Delivery pipeline.
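
The core idea of a canary release, sending only a small and stable share of matching visitors to the new version, can be sketched in a few lines. This is a generic illustration and not VAMP's API or configuration: `bucket`, `choose_version`, and the `required_agent` criterion are hypothetical names chosen for this example.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_version(user_id: str, user_agent: str, canary_percent: int,
                   required_agent: str = "") -> str:
    """Return "canary" for a small share of visitors who match the criterion.

    An empty required_agent matches every visitor (assumed convention).
    """
    matches = required_agent in user_agent
    if matches and bucket(user_id) < canary_percent:
        return "canary"
    return "stable"


# Usage: route roughly 5% of Chrome visitors to the new version.
versions = [choose_version(f"user-{i}", "Chrome", 5, required_agent="Chrome")
            for i in range(10_000)]
canary_share = versions.count("canary") / len(versions)
```

Hashing the user ID rather than picking at random keeps each visitor on the same version across requests, so raising `canary_percent` gradually only ever moves visitors from the stable version to the canary.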

Olaf Molenveld

CTO/founder, Magnetic.io/Vamp
Founder and CTO of Vamp.io (formerly Magnetic.io), builders of Vamp. Vamp is a modern cloud-native solution to continuously release new microservices into production without downtime using advanced AI-based canary testing and releasing features, and delivering smart right-scaling...

Thursday September 1, 2016 4:00pm - 5:00pm
Ballroom C