Sent to you by jeffye via Google Reader:
via My Biased Coin by Michael Mitzenmacher on 2/27/09There's a useful hash table/Bloom filter trick I've been seeing across a number of papers. (If you don't know what a Bloom filter is, then I haven't been doing my job...) For the record, I'm not claiming this trick is mine; maybe it originated in the paper Longest Prefix Matching Using Bloom Filters, where it plays a key role. But I also can't recall a paper that clearly highlights the idea, so I'm doing it here.
Suppose you'll be doing lookups of (large) keys in a (slow) hash table, and many of your lookups will be for items that actually are not in the table. For example, you may be storing URLs on blacklist or dangerous packet signatures or prefixes of addresses or whatever else. Then it makes sense to keep a Bloom filter for the items in your hash table in (fast) memory. Before you do a lookup into the table, you check the Bloom filter. A false positive in the Bloom filter means you do a lookup when you didn't have to, but this allows you to avoid a lookup for a key that's not in your hash table the large majority of the time. The approach works for any type of hash table.
This should be considered a standard approach in hash table design, so I suggest we come up with a catchy name for it, so it doesn't have to be continually repeated in future papers. I'll suggest it be called a "hash table with preemptive Bloom filter", but feel free to suggest a cooler sounding name for it in the comments.
This example highlights the need for coming up with names for nice ideas that generalize to many situations. I've seen this idea described repeatedly in multiple papers precisely because nobody has thought to give it a good name.