Skip to main content

Command Palette

Search for a command to run...

How Does Instagram Check If a Username Is Taken Instantly? (Bloom Filters Explained)

Updated
12 min read
How Does Instagram Check If a Username Is Taken Instantly? (Bloom Filters Explained)

You're signing up for a new account. You type a username — john — and before your finger even leaves the keyboard, a little red ✗ appears: "Username already taken." You try john_dev_2025, and instantly a green ✓ pops up.

It feels instant. It feels like magic.

But stop and think about the naive version of this. Instagram has over 2 billion users. Gmail has even more. Are they really running a database query like this against a billion-row table on every single keystroke?

SELECT 1 FROM users WHERE username = 'john';

If they were, every keystroke from every signup happening anywhere on Earth would hammer the database. It would be slow, expensive, and would fall over instantly at that scale.

So they don't. They use something much cleverer. Let me show you the data structure at the heart of it: the Bloom filter.


First, the honest disclaimer

Instagram and Gmail don't publish their exact internal stack, so I won't claim "Instagram uses a Bloom filter" as gospel. What I can tell you is that this is the textbook class of technique for solving exactly this problem — "does this item probably exist in a massive set?" — at scale. So think of this post as: here's how you'd build instant username checks for a billion users, and here's the data structure everyone reaches for.


The naive approach (and why it dies at scale)

The obvious solution is a database lookup on every check. With a proper unique index, a single lookup is fast — O(log n). So what's the problem?

The problem is volume and location.

  • Millions of people are signing up. The check fires on every keystroke (or near it).

  • Each check is a round trip: app server → database → back. That's a network hop plus disk I/O.

  • The vast majority of usernames people try are available — and you're spending a full database query just to confirm "nope, that one's free."

You're doing your most expensive operation for your most common, least interesting case. That's backwards. What we want is a way to answer "is this username definitely free?" in microseconds, from memory, without touching the database at all.

That's exactly what a Bloom filter gives you.


What is a Bloom filter?

A Bloom filter is a space-efficient probabilistic data structure that answers one question:

"Is this element in the set?"

And it answers with one of two responses:

  • "No, definitely not." ← 100% certain.

  • "Maybe yes." ← probably, but not guaranteed.

That "maybe" is the whole trick. It trades a little bit of accuracy for an enormous saving in memory and speed. And as you'll see, the "maybe" is completely safe for our use case.

Under the hood, a Bloom filter is shockingly simple. It's just two things:

  1. A bit array — a row of bits, all starting at 0.

  2. A few hash functions — each one takes an input and spits out a position in that array.

That's it. No stored strings, no records, no rows. Just bits.


Let's trace it by hand (the part that makes it click)

Forget billions for a second. Let's use a tiny Bloom filter: a bit array of just 10 bits, and 2 hash functions (h1 and h2).

We start with everything at zero:

Index:   0  1  2  3  4  5  6  7  8  9
Bits:  [ 0  0  0  0  0  0  0  0  0  0 ]

Inserting "alice"

We run "alice" through both hash functions. Say:

  • h1("alice")2

  • h2("alice")5

We set bits 2 and 5 to 1:

Index:   0  1  2  3  4  5  6  7  8  9
Bits:  [ 0  0  1  0  0  1  0  0  0  0 ]
              ↑        ↑
           alice    alice

Inserting "bob"

  • h1("bob")0

  • h2("bob")7

Set bits 0 and 7:

Index:   0  1  2  3  4  5  6  7  8  9
Bits:  [ 1  0  1  0  0  1  0  1  0  0 ]
         ↑     ↑        ↑     ↑
        bob  alice   alice  bob

Our set is now {alice, bob}. Notice we never stored the strings "alice" or "bob" anywhere — just flipped some bits. Now let's query.

Query 1: Is "alice" taken? → Correct "yes"

Run "alice"h1=2, h2=5. Check those bits:

  • Bit 2 = 1

  • Bit 5 = 1

Both are set → "possibly taken." Correct! Alice really is in there.

Query 2: Is "dave" taken? → Instant, correct "no"

Run "dave" → say h1=3, h2=9. Check:

  • Bit 3 = 0

Stop right there. One bit is zero, which is impossible if "dave" had ever been inserted (we always set bits on insert, and bits never go back to zero). So:

"dave" is DEFINITELY available.

Instant green checkmark. Zero database queries. This is the magic moment — and it's the common case, because most usernames people try are free.

Query 3: Is "carol" taken? → The famous false positive

Run "carol" → say h1=2, h2=7. Check:

  • Bit 2 = 1 ✓ (but that was set by alice!)

  • Bit 7 = 1 ✓ (but that was set by bob!)

Both bits are 1, so the filter says "possibly taken." But carol was never inserted. Her two bits just happened to collide with bits set by other people.

This is a false positive — and understanding why it's totally fine is the key insight of this whole post.


The two guarantees (memorize these)

Everything about Bloom filters comes down to two rules:

1. No false negatives. If a username is taken, the filter will never say "available." Why? Because inserting it set its bits to 1, and bits never flip back to 0. So a real entry's bits are always there.

2. False positives are possible. The filter might say "maybe taken" for a name that's actually free (like carol), because of bit collisions.

Now — why is this acceptable for username checks? Because of how we use each answer:

Filter says What we do Cost
"Definitely available" Trust it. Show green ✓ instantly. ~microseconds, no DB
"Maybe taken" Don't trust it — go ask the real database. One DB query, only when needed

A false positive just means we occasionally do a database lookup that confirms "actually, that one's free." We were going to query the DB anyway for taken names — we've just eliminated the database hit for the entire common case of available names. The filter never lies in a way that could let two people grab the same username, because the "maybe" always falls through to the source of truth.

The expensive operation now only runs when it might matter.


"But how does it fit in memory for a BILLION users?"

Here's the payoff that makes people share this post.

Suppose you want to store 1 billion usernames with a 1% false-positive rate. The size of the bit array you need comes from this formula:

m = -(n × ln p) / (ln 2)²

Where n = 1,000,000,000 and p = 0.01. Plug it in and you get:

  • Bit array size: ~9.6 billion bits1.2 GB

  • Optimal number of hash functions: ~7

1.2 GB. For a billion usernames. That fits comfortably in the RAM of a single server.

Compare that to storing the actual strings: a billion usernames averaging ~12 characters each is 15+ GB before index overhead — and it lives on disk, behind the database. The Bloom filter is over 10× smaller and lives entirely in memory.

And a lookup? Just hash the input ~7 times and read ~7 bits out of an in-memory array. No disk. No network. Microseconds.


The full picture: it's a funnel, not a single check

Here's the part most blog posts skip. The "instant" check isn't one clever trick — it's a layered funnel, where each layer is faster and cheaper than the one below it, and most requests never reach the bottom.

Notice that half of "so fast" isn't even the Bloom filter — it's the frontend debounce. The browser simply doesn't send a request until you pause typing, so you never feel the lag of one-request-per-character.


The nuance that earns you respect in the comments

Here's something subtle but important: the username-check endpoint is just UX. It is not a guarantee.

Think about the race condition. The check says "cooldev is available," you smile, you spend three seconds filling in your password... and in those three seconds, someone else submits the form with cooldev first. The check was honest when it ran — it just went stale.

So what actually stops two people from owning the same username?

The database's UNIQUE constraint on the INSERT. When you finally submit, the database enforces uniqueness atomically. The second person to insert cooldev gets a rejection, and the app shows "sorry, just taken."

So the mental model is:

  • Bloom filter = fast and friendly. Makes the UI feel instant.

  • UNIQUE constraint = slow and correct. Makes it actually right.

You need both. The filter optimizes the happy path; the constraint guarantees correctness. Never rely on a "check" endpoint as your source of truth.


Leveling up: beyond the basic Bloom filter

Once you understand the basic version, a few natural "what about…" questions come up:

"What happens when someone deletes their account or changes their handle?" Ah — here's a real limitation. A standard Bloom filter can't delete. You can't just un-set a bit, because that bit might be shared with another username (remember carol borrowing alice's and bob's bits). Un-setting it would create a false negative, which breaks the one guarantee you can't break.

The fixes:

  • Counting Bloom filters — replace each bit with a small counter, so you can increment on insert and decrement on delete.

  • Cuckoo filters — a more modern alternative that supports deletion and often has faster lookups and better space efficiency at low false-positive rates. If you're building this today, this is the one to read about next.

Negative caching — you can also cache "this is available" results in Redis so repeated checks of the same trendy-but-free name don't recompute.

Tries — a cousin data structure worth knowing, used for the autocomplete version of this problem (suggesting john_dev, john_codes, etc.).


Show me the code

Because none of this should feel like magic, here's a tiny Bloom filter in plain JavaScript — no libraries, ~25 lines. It's literally an array and two hash functions.

class BloomFilter {
  constructor(size = 1000) {
    this.size = size;
    this.bits = new Uint8Array(size); // our bit array, all zeros
  }

  // Two simple, different hash functions
  _hash1(str) {
    let h = 0;
    for (let i = 0; i < str.length; i++) {
      h = (h * 31 + str.charCodeAt(i)) % this.size;
    }
    return h;
  }

  _hash2(str) {
    let h = 0;
    for (let i = 0; i < str.length; i++) {
      h = (h * 17 + str.charCodeAt(i) + 7) % this.size;
    }
    return h;
  }

  add(str) {
    this.bits[this._hash1(str)] = 1;
    this.bits[this._hash2(str)] = 1;
  }

  // false  = DEFINITELY not present
  // true   = MAYBE present (go check the DB)
  mightContain(str) {
    return this.bits[this._hash1(str)] === 1 &&
           this.bits[this._hash2(str)] === 1;
  }
}

// --- Try it ---
const filter = new BloomFilter();
filter.add("alice");
filter.add("bob");

console.log(filter.mightContain("alice")); // true  → maybe taken → check DB
console.log(filter.mightContain("dave"));  // false → DEFINITELY available ✓
console.log(filter.mightContain("carol")); // could be a false positive!

That mightContain returning false is your instant green checkmark. Everything else is just scaling this idea up to a billion entries and seven hash functions.


TL;DR

  • Checking a billion-row table on every keystroke would be slow and expensive — so big platforms don't.

  • A Bloom filter is a tiny bit array + a few hash functions that answers "is this taken?" in microseconds from memory.

  • It guarantees no false negatives (if it says "available," it's truly available) but allows false positives ("maybe taken" → just fall through to a real DB check).

  • ~1.2 GB of RAM covers 1 billion usernames at a 1% error rate — over 10× smaller than storing the strings, and it never touches disk.

  • "So fast" is really a funnel: frontend debounce → Bloom filter → cache → DB index → and the UNIQUE constraint is what actually keeps usernames unique.

  • Want deletion support? Look at Counting Bloom filters and Cuckoo filters next.

The next time a green checkmark appears the instant you finish typing — you'll know there's a humble bit array doing the heavy lifting behind it.

If this helped the idea click, drop a comment with what you'd like me to break down next — I'm thinking Cuckoo filters or how rate limiters work under the hood.