Wednesday, August 31 • 2:00pm - 2:50pm
Fault Tolerance in Mesos - Vinod Kone & Anand Mazumdar, Mesosphere

Mesos is a widely used cluster resource manager that has run in production at scale for some time. At its core, Mesos is a sophisticated distributed system with many components and failure scenarios. In datacenters, hardware and network failures are the norm rather than the exception. As a distributed systems kernel, Mesos needs to be resilient to all of these failures, degrading gracefully wherever applicable. We believe that everyone running a Mesos cluster in production or building distributed systems would benefit from learning how Mesos handles fault tolerance, both to reason better about failures in their cluster and to adopt these battle-hardened best practices in their own projects.

This talk focuses on how Mesos components such as the Master, Agent, Scheduler, and Executor react to common failures such as node failures, process crashes, network partitions, and ZooKeeper failures.
This talk will cover the following specific topics:
- How Mesos masters and agents react to node failures and network partitions.
- Agent recovery after a process restart.
- Master health checks.
- Executor health checks.
- Scheduler disconnections and partitions.
- ZooKeeper failures.

Speakers
Vinod Kone

Apache Mesos PMC, Mesosphere
Vinod Kone is a committer and PMC member of the Apache Mesos project. He is currently a Tech Lead and Engineering Manager at Mesosphere. Previously, he was a Tech Lead and Manager of the Mesos team at Twitter. Vinod completed his PhD in Computer Science at UC Santa Barbara.
Anand Mazumdar

Distributed System Engineer, Mesosphere
Anand Mazumdar is a software engineer at Mesosphere, where he works on the Apache Mesos project. Before that, he worked at a quantitative hedge fund and at Amazon Web Services on scalable data stream processing. He holds a Master's in Computer Science from The University of Texas...



Wednesday August 31, 2016 2:00pm - 2:50pm PDT
Ballroom B