Writings

February 02, 2026 27 min read Interactive

From Problem to Taxonomy

Content-Defined Chunking, Part 1

An introduction to content-defined chunking: why fixed-size splitting fails, how content-aware boundaries solve the deduplication problem, and a taxonomy of three CDC algorithm families.

February 09, 2026 20 min read Interactive

A Deep Dive into FastCDC

Content-Defined Chunking, Part 2

An exploration of FastCDC's Gear hash, normalized chunking with dual masks, and the 2020 two-byte-per-iteration optimization, with implementations in pseudocode, Rust, and TypeScript.

February 16, 2026 19 min read Interactive

Deduplication in Action

Content-Defined Chunking, Part 3

See CDC-based deduplication in action, learn where CDC is deployed today, and explore the frontier of structure-aware chunking for source code.

February 23, 2026 20 min read Interactive

CDC in the Cloud

Content-Defined Chunking, Part 4

CDC chunks are the right logical unit for deduplication, but storing them as individual objects is prohibitively expensive. This post explores containers, the storage abstraction that makes CDC viable at scale, and the fragmentation, garbage collection, and restore challenges they introduce.

February 26, 2026 13 min read Interactive

CDC at Scale on a Budget

Content-Defined Chunking, Part 5

Cloud object storage can be expensive for CDC at scale. This post explores two cost-saving alternatives: challenger storage providers with radically different pricing, and caching, which drives costs down further when access follows a Zipf distribution.