Bloom Filter Lab · Educational Microsite

Probabilistic membership, visualized and hands-on

Explore classic, counting, and partitioned Bloom filters with a runnable CLI demo. Learn the mechanics, tradeoffs, and roadmap while keeping artifacts portable via the evergreen contract.

Slug: bloom · Repo: stainlessray/bloom · Host: stainlessray.com/public/site · Version: v0.1.1 demo

What is a Bloom filter?

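A Bloom filter is a probabilistic checklist that answers “have I seen this item?” with minimal memory. It can report an item as present when it is not (a false positive), but it never reports an added item as absent.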

Core mechanics
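  • A bit array of m bits, all initialized to 0.
  • k hash functions map each item to k bit positions.
  • Add: set those k bits to 1.
  • Check: if any mapped bit is 0, the item is definitely absent; if all are 1, the item is probably present (false positives are possible).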
Literal view
{ apple, banana, cherry }

bits = 0 0 0 0 0 0 0 0 0 0   (10 bits, positions 0-9)
apple  -> set 2,4,8
banana -> set 1,4,7
cherry -> set 3,6,9

bits = 0 1 1 1 1 0 1 1 1 1
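
The mechanics above can be sketched in a few lines of Java. This is a minimal illustration, not the code from the repository: the class name SimpleBloomFilter, the method names, and the double-hashing scheme built on an FNV-1a-style hash are all assumptions chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical minimal classic Bloom filter; not the project's implementation.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits in the array
    private final int k; // number of hash positions per item

    SimpleBloomFilter(int m, int k) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // FNV-1a-style hash with a caller-supplied starting value (assumed scheme).
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Derive the i-th position by double hashing: (h1 + i * h2) mod m.
    private int position(String item, int i) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // odd, so the stride is never zero
        return Math.floorMod(h1 + i * h2, m);
    }

    // Add: set the k mapped bits to 1.
    void add(String item) {
        for (int i = 0; i < k; i++) {
            bits.set(position(item, i));
        }
    }

    // Check: any 0 bit means definitely absent; all 1s means probably present.
    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that mightContain can return true for an item that was never added when other items happen to have set the same bits, which is the false-positive case.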

Variants and tradeoffs

Variant        Why use it                                Tradeoff
Classic        Fast membership checks for static sets    No deletions; saturates over time
Counting       Supports deletion via counters            Uses more memory
Partitioned    Distributes load across partitions        Slightly higher compute overhead
Scalable       Expands with new filters as load grows    More complex storage/management
Cuckoo filter  Low FPR with deletions baked in           More complex insertion logic
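
To make the counting variant concrete, here is a hedged sketch of how replacing each bit with a counter allows deletion. The class SimpleCountingBloomFilter, its method names, and the hashing scheme are illustrative assumptions, not the repository's code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical counting Bloom filter: one int counter per slot instead of one bit.
class SimpleCountingBloomFilter {
    private final int[] counters;
    private final int k; // number of hash positions per item

    SimpleCountingBloomFilter(int m, int k) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.counters = new int[m];
        this.k = k;
    }

    // FNV-1a-style hash with a caller-supplied starting value (assumed scheme).
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: (h1 + i * h2) mod m.
    private int position(String item, int i) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1;
        return Math.floorMod(h1 + i * h2, counters.length);
    }

    // Add: increment the k mapped counters.
    void add(String item) {
        for (int i = 0; i < k; i++) {
            counters[position(item, i)]++;
        }
    }

    // Check: any zero counter means definitely absent.
    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (counters[position(item, i)] == 0) {
                return false;
            }
        }
        return true;
    }

    // Remove: decrement the k mapped counters, but only if the item looks present.
    // Returns false (and changes nothing) when the item is definitely absent.
    boolean remove(String item) {
        if (!mightContain(item)) {
            return false;
        }
        for (int i = 0; i < k; i++) {
            counters[position(item, i)]--;
        }
        return true;
    }
}
```

The memory tradeoff in the table follows directly: each slot now holds a counter rather than a single bit. Removing an item that was never added but passes the check as a false positive would decrement counters owned by other items, so callers should only remove items they know were inserted.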

Learn by doing

Launch the interactive demo
mvn clean package
java -cp target/classes com.bloomfilter.demo.InteractiveBloomDemo

Switch modes: mode classic | mode counting | mode partitioned

Standardized ingestion
mode classic
ingestlist src/main/resources/data/fruit.txt filters/
loadstd filters/fruit_Classic_m64_k3_vYYYYMMDDHHMMSS.bin

Creates a portable .bin with metadata (algorithm, bits, hashes, source, timestamp).
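
As an illustration of what "a .bin with metadata" could look like, the sketch below writes the five listed fields ahead of the bit payload and reads them back. The byte layout, the class name FilterSnapshot, and the field names are hypothetical; the project's actual on-disk format is not described here and may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical snapshot layout: metadata header followed by the raw bit payload.
class FilterSnapshot {
    final String algorithm;     // e.g. "Classic"
    final int bits;             // m
    final int hashes;           // k
    final String source;        // path of the ingested list
    final long timestampMillis; // creation time
    final BitSet payload;       // the filter's bit array

    FilterSnapshot(String algorithm, int bits, int hashes,
                   String source, long timestampMillis, BitSet payload) {
        this.algorithm = algorithm;
        this.bits = bits;
        this.hashes = hashes;
        this.source = source;
        this.timestampMillis = timestampMillis;
        this.payload = payload;
    }

    // Serialize header fields in a fixed order, then the length-prefixed payload.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(algorithm);
            out.writeInt(bits);
            out.writeInt(hashes);
            out.writeUTF(source);
            out.writeLong(timestampMillis);
            byte[] raw = payload.toByteArray();
            out.writeInt(raw.length);
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the fields back in the same order they were written.
    static FilterSnapshot fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String algorithm = in.readUTF();
            int bits = in.readInt();
            int hashes = in.readInt();
            String source = in.readUTF();
            long timestampMillis = in.readLong();
            byte[] raw = new byte[in.readInt()];
            in.readFully(raw);
            return new FilterSnapshot(algorithm, bits, hashes, source,
                    timestampMillis, BitSet.valueOf(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the parameters (bits, hashes) in the file means a loader can rebuild the same hash positions without relying on the file name.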

Sample session
mode counting
ingestlist src/main/resources/data/fruit.txt filters/
check cherry
remove cherry
save filters/fruit_Counting_m64_k3.bin

Educational path

Concepts covered so far:

  • Bit-level tracking
  • Hash multi-mapping
  • False positives
  • Deletion via counters
  • Partitioning
  • Persistence + metadata
  • Saturation
What comes next
  • Capacity planning (m, k) for target false-positive rates.
  • Scalable and distributed Bloom filters.
  • Compressed filters and hybrid designs.
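
As a preview of the capacity-planning item, the standard sizing formulas for a classic Bloom filter can be sketched as follows. For n expected items and a target false-positive rate p, the usual choices are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, and the expected rate after n insertions is approximately (1 - e^(-kn/m))^k. The class and method names below are hypothetical helpers, not part of the demo CLI.

```java
// Hypothetical sizing helpers based on the standard Bloom filter approximations.
class BloomSizing {
    // Bits needed for n items at target false-positive rate p:
    // m = ceil(-n * ln(p) / (ln 2)^2)
    static long optimalBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Hash count that minimizes the false-positive rate for given m and n:
    // k = round((m / n) * ln 2), never less than 1
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Approximate false-positive rate after n insertions:
    // p = (1 - e^(-k * n / m))^k
    static double expectedFpr(long m, int k, long n) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }
}
```

For example, optimalBits(1000, 0.01) gives 9586 bits and optimalHashes(9586, 1000) gives 7. The same approximation shows saturation: with the m = 64, k = 3 parameters seen in the sample file names, expectedFpr(64, 3, 10) is about 0.05, and the rate keeps climbing as more items are added.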

Roadmap

Resources