Bloom Filter Lab · Educational Microsite

Probabilistic membership, visualized and hands-on

Explore classic, counting, and partitioned Bloom filters with a runnable CLI demo. Learn the mechanics, tradeoffs, and roadmap while keeping artifacts portable via the evergreen contract.

Slug: bloom · Repo: stainlessray/bloom · Host: stainlessray.com/public/site · Version: v0.1.1 demo

What is a Bloom filter?

A Bloom filter is a probabilistic checklist that answers “have I seen this item?” using far less memory than storing the items themselves.

If it says no, the item is definitely absent.
If it says yes, the item is probably present (false positives are possible).
Invented by Burton Howard Bloom (1970) to save memory in dictionary lookups.

Core mechanics

Bit array initialized to 0.
Multiple hash functions map an item to k bit positions.
Add: set those bits to 1. Check: if any mapped bit is 0, the item is definitely absent; if all mapped bits are 1, the item is probably present.
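
The mechanics above can be sketched in a few lines of Java. This is a minimal illustration, not the demo's actual implementation: the class name `ClassicBloomSketch`, the method names, and the CRC32-plus-`hashCode` double-hashing scheme are assumptions made for this example, and the m = 64, k = 3 sizing is simply borrowed from the sample file names on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

/** Illustrative classic Bloom filter: m bits, k positions per item. */
final class ClassicBloomSketch {
    private final BitSet bits;
    private final int m; // number of bits in the array
    private final int k; // number of bit positions derived per item

    ClassicBloomSketch(int m, int k) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.bits = new BitSet(m); // every bit starts at 0
        this.m = m;
        this.k = k;
    }

    /** Derives k positions with double hashing: (h1 + i * h2) mod m. */
    int[] positions(String item) {
        CRC32 crc = new CRC32();
        crc.update(item.getBytes(StandardCharsets.UTF_8));
        long h1 = crc.getValue();                        // unsigned 32-bit value
        long h2 = (item.hashCode() & 0xffffffffL) | 1L;  // forced odd, never zero
        int[] out = new int[k];
        for (int i = 0; i < k; i++) {
            out[i] = (int) ((h1 + (long) i * h2) % m);
        }
        return out;
    }

    /** Add: set every mapped bit to 1. */
    void add(String item) {
        for (int p : positions(item)) {
            bits.set(p);
        }
    }

    /** Check: any 0 bit means definitely absent; all 1s means probably present. */
    boolean mightContain(String item) {
        for (int p : positions(item)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ClassicBloomSketch filter = new ClassicBloomSketch(64, 3);
        for (String fruit : new String[] {"apple", "banana", "cherry"}) {
            filter.add(fruit);
        }
        System.out.println("cherry: " + filter.mightContain("cherry"));
        System.out.println("durian: " + filter.mightContain("durian"));
    }
}
```

An added item always reports `true`. An item that was never added usually reports `false`, but may report `true` when its positions happen to collide with bits set by other items.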

Literal view

{ apple, banana, cherry }

bits = 0 0 0 0 0 0 0 0 0 0
apple  -> set 2,4,8
banana -> set 1,4,7
cherry -> set 3,6,9

bits = 0 1 1 1 1 0 1 1 1 1
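
The walkthrough above can be replayed in code to show where false positives come from. In this sketch the bit positions for apple, banana, and cherry are copied from the listing (read as 0-indexed), while `grape`, `kiwi`, and their positions are hypothetical items invented for illustration. A lookup table stands in for real hash functions so the result is easy to follow by hand.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Replays the 10-bit walkthrough with fixed positions instead of real hashes. */
final class LiteralViewReplay {
    static final int M = 10;

    static final Map<String, int[]> POSITIONS = new LinkedHashMap<>();
    static {
        POSITIONS.put("apple", new int[] {2, 4, 8});
        POSITIONS.put("banana", new int[] {1, 4, 7});
        POSITIONS.put("cherry", new int[] {3, 6, 9});
        POSITIONS.put("grape", new int[] {1, 3, 9}); // hypothetical, never added
        POSITIONS.put("kiwi", new int[] {0, 2, 5});  // hypothetical, never added
    }

    /** Sets the mapped bits of every listed item. */
    static BitSet build(String... items) {
        BitSet bits = new BitSet(M);
        for (String item : items) {
            for (int p : POSITIONS.get(item)) {
                bits.set(p);
            }
        }
        return bits;
    }

    /** True only if every mapped bit of the item is set. */
    static boolean mightContain(BitSet bits, String item) {
        for (int p : POSITIONS.get(item)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    /** Renders the bit array in the same style as the walkthrough. */
    static String render(BitSet bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < M; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = build("apple", "banana", "cherry");
        System.out.println("bits = " + render(bits));                // bits = 0 1 1 1 1 0 1 1 1 1
        System.out.println("grape: " + mightContain(bits, "grape")); // grape: true (false positive)
        System.out.println("kiwi: " + mightContain(bits, "kiwi"));   // kiwi: false (bit 0 is still 0)
    }
}
```

The hypothetical `grape` was never added, yet bits 1, 3, and 9 were each set by a different fruit, so the filter answers "probably present". That overlap is exactly what a false positive is.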

Variants and tradeoffs

Variant	Why use it	Tradeoff
Classic	Fast membership checks for static sets	No deletions; saturates over time
Counting	Supports deletion via counters	Uses more memory
Partitioned	Distributes load across partitions	Slightly higher compute overhead
Scalable	Expands with new filters as load grows	More complex storage/management
Cuckoo filter	Low FPR with deletions baked in	More complex insertion logic
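
To make the Counting row concrete, here is a minimal sketch of deletion via counters. It is an illustration under stated assumptions rather than the demo's code: the class name `CountingBloomSketch`, the `int` counter width, and the hashing scheme are all chosen for this example. Each bit is replaced by a small counter, which is where the extra memory goes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative counting Bloom filter: one counter per slot instead of one bit. */
final class CountingBloomSketch {
    private final int[] counters; // assumption: int counters; real designs often use 4 bits
    private final int k;

    CountingBloomSketch(int m, int k) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.counters = new int[m];
        this.k = k;
    }

    /** Same double-hashing idea as a classic filter: (h1 + i * h2) mod m. */
    private int[] positions(String item) {
        CRC32 crc = new CRC32();
        crc.update(item.getBytes(StandardCharsets.UTF_8));
        long h1 = crc.getValue();
        long h2 = (item.hashCode() & 0xffffffffL) | 1L;
        int[] out = new int[k];
        for (int i = 0; i < k; i++) {
            out[i] = (int) ((h1 + (long) i * h2) % counters.length);
        }
        return out;
    }

    /** Add: increment every mapped counter. */
    void add(String item) {
        for (int p : positions(item)) {
            counters[p]++;
        }
    }

    /** Check: a zero counter means definitely absent. */
    boolean mightContain(String item) {
        for (int p : positions(item)) {
            if (counters[p] == 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Remove: decrement the mapped counters if the item looks present.
     * Removing an item that was never added (a false positive) can corrupt
     * the filter, so callers should only remove items they know they added.
     */
    boolean remove(String item) {
        if (!mightContain(item)) {
            return false;
        }
        for (int p : positions(item)) {
            if (counters[p] > 0) {
                counters[p]--;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CountingBloomSketch filter = new CountingBloomSketch(64, 3);
        filter.add("cherry");
        System.out.println("before remove: " + filter.mightContain("cherry"));
        filter.remove("cherry");
        System.out.println("after remove:  " + filter.mightContain("cherry"));
    }
}
```

Because shared slots are decremented rather than cleared, removing one item leaves the counts contributed by other items intact, which a plain bit array cannot do.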

Learn by doing

Launch the interactive demo

mvn clean package
java -cp target/classes com.bloomfilter.demo.InteractiveBloomDemo

Switch modes: mode classic | mode counting | mode partitioned

Standardized ingestion

mode classic
ingestlist src/main/resources/data/fruit.txt filters/
loadstd filters/fruit_Classic_m64_k3_vYYYYMMDDHHMMSS.bin

Creates a portable .bin with metadata (algorithm, bits, hashes, source, timestamp).
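
One way such a file could be laid out is a small metadata header followed by the raw bits. The sketch below is purely hypothetical and is not the demo's actual .bin format: the class name `PortableFilterSketch`, the field order, the use of `DataOutputStream`, and the epoch-millisecond timestamp are all assumptions. Only the five metadata names come from the line above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

/** Hypothetical snapshot layout: metadata header, then the bit array. */
final class PortableFilterSketch {
    final String algorithm;
    final int bits;
    final int hashes;
    final String source;
    final long timestamp; // assumption: epoch milliseconds
    final BitSet data;

    PortableFilterSketch(String algorithm, int bits, int hashes,
                         String source, long timestamp, BitSet data) {
        this.algorithm = algorithm;
        this.bits = bits;
        this.hashes = hashes;
        this.source = source;
        this.timestamp = timestamp;
        this.data = data;
    }

    /** Serializes the header fields in a fixed order, then a length-prefixed payload. */
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(algorithm);
            out.writeInt(bits);
            out.writeInt(hashes);
            out.writeUTF(source);
            out.writeLong(timestamp);
            byte[] payload = data.toByteArray();
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    static PortableFilterSketch fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String algorithm = in.readUTF();
            int bits = in.readInt();
            int hashes = in.readInt();
            String source = in.readUTF();
            long timestamp = in.readLong();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new PortableFilterSketch(algorithm, bits, hashes, source,
                    timestamp, BitSet.valueOf(payload));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BitSet data = new BitSet(64);
        data.set(2);
        data.set(4);
        data.set(8);
        PortableFilterSketch saved = new PortableFilterSketch(
                "Classic", 64, 3, "fruit.txt", System.currentTimeMillis(), data);
        PortableFilterSketch loaded = fromBytes(saved.toBytes());
        System.out.println(loaded.algorithm + " m=" + loaded.bits + " k=" + loaded.hashes);
    }
}
```

Keeping the parameters next to the bits matters because a bit array is only meaningful when read back with the same size and hash count it was built with.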

Sample session

mode counting
ingestlist src/main/resources/data/fruit.txt filters/
check cherry
remove cherry
save filters/fruit_Counting_m64_k3.bin

Educational path

Concepts covered so far:

Bit-level tracking · Hash multi-mapping · False positives · Deletion via counters · Partitioning · Persistence + metadata · Saturation

What comes next

Capacity planning (m, k) for target false-positive rates.
Scalable and distributed Bloom filters.
Compressed filters and hybrid designs.
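
As a preview of the capacity-planning topic, the standard sizing formulas for a classic Bloom filter are m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, with an approximate false-positive rate of (1 - e^(-kn/m))^k after n insertions. Here n (expected item count) and p (target false-positive rate) are symbols introduced for this example, and the class name `CapacityPlanner` is hypothetical. This is a sketch of the textbook math, not a feature of the current demo.

```java
/** Textbook sizing formulas for a classic Bloom filter. */
final class CapacityPlanner {

    /** Bits needed for n items at target false-positive rate p. */
    static long optimalBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Hash count that minimizes the false-positive rate for given m and n. */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Approximate false-positive rate after n insertions. */
    static double expectedFpr(long m, int k, long n) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1000;   // expected number of items
        double p = 0.01; // target false-positive rate (1%)
        long m = optimalBits(n, p);
        int k = optimalHashes(m, n);
        System.out.println("m = " + m); // m = 9586
        System.out.println("k = " + k); // k = 7
        System.out.println("expected FPR = " + expectedFpr(m, k, n));
    }
}
```

For 1,000 items at a 1% target this works out to roughly 9.6 bits per item and 7 hash functions. The same `expectedFpr` formula also shows saturation: holding m and k fixed while n grows pushes the rate toward 1.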

Roadmap

v0.2: loadmeta command, enhanced CLI help.
v0.3: Docker sandbox for browser-based testing.
Future: optimization and educational refinements.

Resources

Repository · Evergreen index.json · Details directory · ABOUT_BLOOM_FILTERS.md · EDUCATIVE.md