Thursday, September 1 • 10:15am - 11:05am
Performing Process Migration with Mesos - Kapil Arya & Isabel Jimenez, Mesosphere

Process migration (also known as process checkpointing) is the ability to move a group of related running processes from one set of nodes to another. It involves three phases: take a snapshot of all processes, copy the snapshot data over to the target nodes, and restart the processes from that snapshot. Process migration has traditionally been used for fault tolerance in long-running stateful applications. Without it, application developers must modify the stateful application to periodically save its state to disk so that it can restart after a failure. This is inefficient and error-prone!
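
The three phases can be illustrated with a toy sketch. This is not how Mesos or DMTCP implement migration; it only mirrors the snapshot, copy, and restart steps on a trivial in-memory object, with two temporary directories standing in for the source and target nodes. All names here (Counter, snapshot, copy_snapshot, restart, migrate_demo) are hypothetical and chosen for illustration.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


class Counter:
    """Toy stand-in for a stateful process: it only counts steps."""

    def __init__(self, steps_done=0):
        self.steps_done = steps_done

    def run(self, steps):
        self.steps_done += steps


def snapshot(proc, path):
    # Phase 1: write the in-memory state to a checkpoint file.
    path.write_bytes(pickle.dumps(proc.__dict__))


def copy_snapshot(src, target_dir):
    # Phase 2: copy the checkpoint to the target "node" (a directory here).
    return Path(shutil.copy(src, target_dir))


def restart(path):
    # Phase 3: rebuild the process from the copied checkpoint.
    return Counter(**pickle.loads(path.read_bytes()))


def migrate_demo():
    # Two temporary directories play the role of the source and target nodes.
    with tempfile.TemporaryDirectory() as node_a, \
            tempfile.TemporaryDirectory() as node_b:
        proc = Counter()
        proc.run(3)                         # work done before migration
        ckpt = Path(node_a) / "counter.ckpt"
        snapshot(proc, ckpt)
        moved = copy_snapshot(ckpt, node_b)
        resumed = restart(moved)
        resumed.run(2)                      # work continues on the target
        return resumed.steps_done
```

Calling migrate_demo() resumes the counter on the second directory with its earlier progress intact. A real checkpointing system has to capture far more than a single object's fields, which is why doing this by hand inside each application is described above as error-prone.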

In this talk, we demonstrate process migration within a Mesos cluster for a more enjoyable way to schedule your containers. Apart from fault tolerance, process migration within Mesos opens up new avenues for implementing better scheduling policies. Other use cases include improved maintenance primitives, debugging, speculative execution, and managing “tails” for multi-threaded applications.

Kapil Arya

Kapil Arya is an engineer at Mesosphere working on the core team. He recently finished his PhD at Northeastern University, where he was the lead developer of the open source distributed checkpointing project DMTCP and contributed to the reversible debugger FReD. During several summer internships at VMware, Kapil contributed to solving the double-paging problem.

Isabel Jimenez

Distributed systems engineer, Mesosphere
Isabel Jimenez is a Software Engineer at Mesosphere.

Thursday September 1, 2016 10:15am - 11:05am
Ballroom C

