Wednesday, August 31 • 2:00pm - 2:50pm
Fault Tolerance in Mesos - Vinod Kone & Anand Mazumdar, Mesosphere

Mesos is a widely used cluster resource manager that has run in production at scale for some time. At its core, Mesos is a sophisticated distributed system with many components and failure scenarios. In datacenters, hardware and network failures are the norm rather than the exception. As a distributed systems kernel, Mesos needs to be resilient to all of these failures, degrading gracefully wherever applicable. We believe that anyone running a Mesos cluster in production or building distributed systems would benefit from learning how Mesos handles fault tolerance, both to reason better about failures in their own cluster and to adopt these battle-hardened practices in their own projects.

This talk focuses on how Mesos components such as the Master, Agent, Scheduler, and Executor react to common failures such as node failures, process crashes, network partitions, and ZooKeeper failures.
This talk will cover the following specific topics:
- How Mesos masters and agents react to node failures and network partitions.
- Agent recovery after a process restart.
- Master health checks.
- Executor health checks.
- Scheduler disconnections and partitions.
- ZooKeeper failures.

Speakers
Vinod Kone

Distributed Systems Engineer, Mesosphere
Vinod Kone is an engineer at Mesosphere working on the Mesos core team. He is also a committer and on the PMC for Apache Mesos and the Mesos Jenkins plugin. Previously, he was a Tech Lead for the Mesos team at Twitter. In a past life, he did his PhD at UC Santa Barbara, researching how to make cars talk to each other. His interests lie in distributed systems and outdoor adventures.
Anand Mazumdar

Distributed Systems Engineer, Mesosphere
Anand Mazumdar is a software engineer at Mesosphere, where he works on the Apache Mesos project. Before that, he worked on scalable data stream processing at a quantitative hedge fund and at Amazon Web Services. He holds a Master's in Computer Science from The University of Texas at Austin. On weekends, you can find him playing poker at a local casino.



Wednesday August 31, 2016 2:00pm - 2:50pm
Ballroom B