I've built and operated a variety of stateful distributed systems, cutting across the consistency-availability, latency-throughput, and price-performance spectra. I get maniacal pleasure from invalidating my assumptions, 303/808/909s and exploring.

projects

Some things I have built or contributed to.

sled

(work in progress) a persistent lock-free B-link tree written in Rust, with its lock-free algorithms verified by TLA+.

minuteman

Fully distributed load balancer written in Erlang for helping services on Mesos communicate seamlessly with each other. Uses a CRDT-based overlay network built on HyParView for failure detection.

jetpants

A framework for managing very large MySQL systems. We created this at Tumblr to manage O(hundreds of TB) of primary business data, accessible at sub-millisecond latencies :)

void

A terminal-based personal organizer, mind-mapper, task tracker, and time-series visualizer written in Rust.

rasputin

(work in progress) a horizontally scalable linearizable KV/object/log store written in Rust, with its replication algorithm verified by TLA+.

tla-rust

(work in progress) verification of lock-free and distributed algorithms comprising a highly reliable, high-performance distributed stateful system.

rust-crdt

CRDT (eventual consistency) library for distributed systems. Thoroughly tested using quickcheck.
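To make the idea concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is not rust-crdt's API: the `GCounter` type and its methods are hypothetical names chosen for illustration. Each replica only ever increases its own slot, and merging takes the per-replica maximum, so merges can arrive in any order, any number of times, and replicas still converge.

```rust
use std::collections::HashMap;

/// Hypothetical grow-only counter (G-Counter): one monotonically
/// increasing slot per replica. Illustrative only, not rust-crdt's API.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct GCounter {
    counts: HashMap<String, u64>,
}

impl GCounter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record one increment observed at `replica`.
    pub fn increment(&mut self, replica: &str) {
        *self.counts.entry(replica.to_string()).or_insert(0) += 1;
    }

    /// The counter's value is the sum over all replica slots.
    pub fn value(&self) -> u64 {
        self.counts.values().sum()
    }

    /// Merge by taking the per-replica maximum. Because max is
    /// commutative, associative, and idempotent, so is this merge.
    pub fn merge(&mut self, other: &GCounter) {
        for (replica, &count) in &other.counts {
            let slot = self.counts.entry(replica.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }
}
```

A property-based test in the spirit of quickcheck would generate arbitrary increment and merge sequences and assert that merging in either order yields the same state.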

rust-rocksdb

Rust bindings for RocksDB, a highly configurable embedded LSM database. Currently used in production at several large internet services. My favorite user is TiKV, a horizontally scalable linearizable KV store written in Rust, which powers TiDB, a horizontally scalable MySQL-compatible database.

tikv

Horizontally scalable linearizable KV store based on Raft, on which I've performed fault injection.

etcd-mesos

A self-healing distributed etcd supervisor, running on Mesos. Built when I was on the Kubernetes team at Mesosphere, allowing us to deploy full Kubernetes clusters on top of Mesos with the click of a mouse.

etcd

In the process of fault injecting systems I've built on top of etcd, I've found several interesting bugs in etcd itself. I've provided minor architectural guidance that was incorporated in etcd v3.

cockroachdb

A monolithic horizontally scalable Postgres-compatible database written in Go with both snapshot and serializable snapshot configurable isolation levels. I was an early contributor, and wrote a high-performance histogram-based metric system for measuring interesting operational statistics, which I later extracted into loghisto.

event gateway

Dataflow for serverless systems, containers, and services. Plug your things into each other easily. Make any service event-driven. I built a prototype and implemented the distributed bits of the initial system.

loghisto

A high-performance histogram implementation for understanding the latency tail of a system, either in production or development. Does not rely on sampling methods, which break down in real-world systems. Uses logarithmically bucketed histograms.
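As a rough illustration of logarithmic bucketing, the sketch below maps each value to a bucket index proportional to `ln(1 + value)`, so bucket widths grow with the value and relative error stays roughly constant across the range. This is not loghisto's actual code: the `LogHistogram` type, its method names, and the precision of 100 buckets per unit of natural log are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical log-bucketed histogram; illustrative only.
pub struct LogHistogram {
    buckets: BTreeMap<i32, u64>, // bucket index -> number of samples
    total: u64,
    precision: f64, // buckets per unit of ln(1 + value); an assumed constant
}

impl LogHistogram {
    pub fn new() -> Self {
        LogHistogram { buckets: BTreeMap::new(), total: 0, precision: 100.0 }
    }

    /// Record a non-negative sample. Every sample is counted; none are dropped.
    pub fn record(&mut self, value: f64) {
        let bucket = (self.precision * (1.0 + value).ln()).round() as i32;
        *self.buckets.entry(bucket).or_insert(0) += 1;
        self.total += 1;
    }

    /// Approximate the value at quantile `q` (0.0..=1.0) by walking the
    /// buckets in ascending order until enough samples are covered.
    pub fn percentile(&self, q: f64) -> Option<f64> {
        if self.total == 0 {
            return None;
        }
        let target = ((q * self.total as f64).ceil() as u64).max(1);
        let mut seen = 0;
        for (&bucket, &count) in &self.buckets {
            seen += count;
            if seen >= target {
                // Invert the bucket mapping to recover a representative value.
                return Some((bucket as f64 / self.precision).exp() - 1.0);
            }
        }
        None
    }
}
```

With this precision, each reported percentile lands within about half a percent of the true sample, while memory use depends on the range of values rather than the number of samples recorded.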

open source coin

Aims to be a decentralized GitHub, bolted to a token. I built the first versions of the consensus protocol atop a simulator that rapidly teased out race conditions. Implemented in Haskell.

writing

Fear and Loathing in Lock-Free Programming

An introduction to lock-free programming, with tongue-in-cheek warnings about cognitive complexity traps.

Reliable Systems Series: Model-Based Testing

An introduction to model-based testing, which applies generative testing techniques to complex systems.
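A minimal sketch of the technique, under stated assumptions: the system under test here is a toy sorted-`Vec` map (`VecMap`, a hypothetical stand-in for something far more complex), the model is the standard library's `BTreeMap`, and operations come from a small hand-rolled xorshift generator rather than a real generative testing library. The same random operation is applied to both, and any divergence in results is reported along with the step at which it occurred.

```rust
use std::collections::BTreeMap;

/// Hypothetical system under test: a map stored as a sorted Vec.
#[derive(Default)]
pub struct VecMap {
    entries: Vec<(u8, u8)>,
}

impl VecMap {
    pub fn insert(&mut self, k: u8, v: u8) -> Option<u8> {
        match self.entries.binary_search_by_key(&k, |e| e.0) {
            Ok(i) => Some(std::mem::replace(&mut self.entries[i].1, v)),
            Err(i) => {
                self.entries.insert(i, (k, v));
                None
            }
        }
    }

    pub fn remove(&mut self, k: u8) -> Option<u8> {
        match self.entries.binary_search_by_key(&k, |e| e.0) {
            Ok(i) => Some(self.entries.remove(i).1),
            Err(_) => None,
        }
    }

    pub fn get(&self, k: u8) -> Option<u8> {
        let found = self.entries.binary_search_by_key(&k, |e| e.0);
        found.ok().map(|i| self.entries[i].1)
    }
}

/// Tiny deterministic xorshift generator, so failures replay from a seed.
pub struct XorShift(u64);

impl XorShift {
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Apply `steps` random operations to both the model and the system
/// under test, returning a description of the first divergence, if any.
pub fn check_against_model(seed: u64, steps: usize) -> Result<(), String> {
    let mut rng = XorShift(seed | 1); // xorshift state must be nonzero
    let mut model: BTreeMap<u8, u8> = BTreeMap::new();
    let mut sut = VecMap::default();
    for step in 0..steps {
        let r = rng.next_u64();
        let k = (r >> 8) as u8 % 16; // small key space forces collisions
        let v = (r >> 16) as u8;
        let (expected, actual) = match r % 3 {
            0 => (model.insert(k, v), sut.insert(k, v)),
            1 => (model.remove(&k), sut.remove(k)),
            _ => (model.get(&k).copied(), sut.get(k)),
        };
        if expected != actual {
            return Err(format!(
                "step {step}: key {k}: model {expected:?}, got {actual:?}"
            ));
        }
    }
    Ok(())
}
```

Running `check_against_model` over many seeds explores many operation interleavings, and a failing seed reproduces the exact sequence that triggered the mismatch.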

Hardening Kubernetes on the DCOS with etcd-mesos

A brief article I wrote describing some of the work that went into etcd-mesos.

talks

RustFest Paris 2018: Building Reliable Infrastructure in Rust

The wild success of testing tools like Jepsen is a wake-up call that we’re approaching systems engineering from a fundamentally bug-prone perspective. Why don’t we find these devastating bugs on our laptops before opening pull requests? Rust’s compiler gives us wonderful guarantees about memory safety, but as soon as we open files or sockets, all hell seems to break loose. This talk will show you how to apply techniques from the distributed systems and database worlds in a way that maximizes the number of bugs found per CPU cycle and reduces the amount of bias that we hardcode into our tests.