I've built and operated a variety of stateful distributed systems, cutting across the consistency-availability, latency-throughput, and price-performance spectra. I get maniacal pleasure from invalidating my assumptions, 303/808/909s and exploring.


Some things I have built or contributed to.


A framework for managing very large MySQL systems. We created this at Tumblr to manage O(hundreds of TB) of primary business data, accessible at sub-millisecond latencies :)


Fully distributed load balancer written in Erlang for helping services on Mesos communicate seamlessly with each other. Uses a CRDT-based overlay network built on HyParView for failure detection.


A terminal-based personal organizer, mind-mapper, task tracker, and time-series visualizer written in Rust.


(work in progress) a persistent lock-free B-link tree written in Rust, verified by TLA+.


(work in progress) a horizontally scalable linearizable KV/object/log store written in Rust, verified by TLA+.


(work in progress) verification of lock-free and distributed algorithms comprising a highly reliable, high performance distributed stateful system


CRDT (eventual consistency) library for distributed systems. Thoroughly tested using quickcheck.
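
For readers unfamiliar with CRDTs, here is a minimal sketch of one of the simplest, a grow-only counter. It is an illustration only, not the library's API: the `GCounter` type and its method names are hypothetical. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order in which merges arrive.

```rust
use std::collections::HashMap;

/// Hypothetical grow-only counter (G-Counter); an illustration,
/// not the API of the library described above.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GCounter {
    // One monotonically increasing count per replica id.
    counts: HashMap<String, u64>,
}

impl GCounter {
    /// Record one increment on behalf of `replica`.
    pub fn increment(&mut self, replica: &str) {
        *self.counts.entry(replica.to_string()).or_insert(0) += 1;
    }

    /// The counter's value is the sum over all replicas.
    pub fn value(&self) -> u64 {
        self.counts.values().sum()
    }

    /// Merge another replica's state by taking the per-replica maximum.
    /// This operation is commutative, associative, and idempotent.
    pub fn merge(&mut self, other: &GCounter) {
        for (replica, &n) in &other.counts {
            let slot = self.counts.entry(replica.clone()).or_insert(0);
            *slot = (*slot).max(n);
        }
    }
}
```

Commutativity and idempotence of `merge` are exactly the kind of properties a randomized property-testing tool can check over many generated states.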


Rust bindings for RocksDB, a highly configurable LSM embedded database. Currently used in production at several large internet services. My favorite user is TiKV, a horizontally scalable linearizable KV store written in Rust, which powers TiDB, a MySQL-compatible, horizontally scalable database.


Horizontally scalable linearizable KV store based on Raft, for which I've performed fault injection.


A self-healing distributed etcd supervisor, running on Mesos. Built when I was on the Kubernetes team at Mesosphere, allowing us to deploy full Kubernetes clusters on top of Mesos with the click of a mouse.


In the process of fault injecting systems I've built on top of etcd, I've found several interesting bugs in etcd itself. I've provided minor architectural guidance that was incorporated in etcd v3.


A monolithic, horizontally scalable, Postgres-compatible database written in Go, with configurable isolation levels (snapshot and serializable snapshot). I was an early contributor, and wrote a high-performance histogram-based metric system for measuring interesting operational statistics, which I later extracted into loghisto.

event gateway

Dataflow for serverless systems, containers, and services. Plug your things into each other easily. Make any service event-driven. I built a prototype and implemented the distributed bits of the initial system.


A high-performance histogram implementation for understanding the latency tail of a system, either in production or development. Does not rely on sampling methods, which break down in real systems. Uses logarithmically bucketed histograms.
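
To make the idea of logarithmic bucketing concrete, here is a minimal sketch in Rust. It is not the library's implementation or API: `LogHistogram`, its methods, and the `precision` parameter are hypothetical names chosen for illustration. Every measurement is counted (nothing is sampled away), and each one lands in a bucket indexed by its logarithm, so the relative error stays roughly constant from microseconds up to seconds.

```rust
use std::collections::BTreeMap;

/// Hypothetical log-bucketed histogram; a sketch of the technique,
/// not the API of the library described above.
pub struct LogHistogram {
    // Bucket width in log space: smaller means more accuracy, more buckets.
    precision: f64,
    // Bucket index -> number of recorded values, kept in sorted order.
    buckets: BTreeMap<i64, u64>,
    count: u64,
}

impl LogHistogram {
    pub fn new(precision: f64) -> Self {
        assert!(precision > 0.0, "precision must be positive");
        LogHistogram { precision, buckets: BTreeMap::new(), count: 0 }
    }

    // ln(1 + v) is used so that a value of zero maps to bucket 0.
    fn bucket_of(&self, value: f64) -> i64 {
        (value.max(0.0).ln_1p() / self.precision).round() as i64
    }

    // Inverse of `bucket_of`: the representative value of a bucket.
    fn value_of(&self, bucket: i64) -> f64 {
        (bucket as f64 * self.precision).exp_m1()
    }

    /// Count one measurement; negative inputs are clamped to zero.
    pub fn record(&mut self, value: f64) {
        let bucket = self.bucket_of(value);
        *self.buckets.entry(bucket).or_insert(0) += 1;
        self.count += 1;
    }

    /// Approximate the p-th percentile (0 < p <= 100) by walking the
    /// buckets in ascending order until enough values have been seen.
    pub fn percentile(&self, p: f64) -> Option<f64> {
        if self.count == 0 {
            return None;
        }
        let target = ((p / 100.0) * self.count as f64).ceil().max(1.0) as u64;
        let mut seen = 0;
        for (&bucket, &n) in &self.buckets {
            seen += n;
            if seen >= target {
                return Some(self.value_of(bucket));
            }
        }
        None
    }
}
```

With `precision = 0.01`, a recorded value `v` is reported with `1 + v` off by at most about half a percent, and the number of buckets grows only with the logarithm of the value range.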


Hardening Kubernetes on the DCOS with etcd-mesos

A brief article I wrote describing some of the work that went into etcd-mesos.