News Feed
Sections




News Archive
feed this:

Looking for more information on how to do PHP the right way? Check out PHP: The Right Way

Joshua Thijssen's Blog:
Bloom Filters
April 09, 2012 @ 11:13:32

In this new post to his blog Joshua Thijssen describes something that can help when processing large amounts of data (like, in his example, the text of a book) to search through the information and find if a certain piece of data is in the set - a bloom filter.

Most of my co-workers never really heard of bloom filters, and I'm continuously need to explain what they are, what their purpose is and why it's a better solution than other ones. So let's do an introduction on bloom filters. [...] Bloom filters have the property of being exceptionally fast AND exceptionally small compared to other structures but it comes with a price: it MIGHT be possible that our bloom filter thinks that an element is inside our set, when it really isn't. Luckily, the reverse is not possible: when a bloom filter says something is NOT in the set, you are 100% sure that it isn't part of the set.

He explains how the filter works, noting how it's better for memory consumption and how it's possible for it to give a "maybe" response instead of ab absolute "yes" or "no". He also points out a PHP extension, bloomy that takes the hard work out of it for you.

0 comments voice your opinion now!
bloom filter search memory consumption speed


Andrei Zmievski's Blog:
Bloom Filters Quickie
April 07, 2009 @ 11:13:01

Andrei Zmievski has written a new post about a new extension he's worked up (out of curiosity for the technology) - the pecl/bloomy extension.

A Bloom filter is a probabilistic data structure that can be used to answer a simple question, is the given element a member of a set? Now, this question can be answered via other means, such as hash table or binary search trees. But the thing about Bloom filters is that they are incredibly space-efficient when the number of potential elements in the set is large.

The filters allow false positives with a defined error rate - it gives the "yes" or "no" answer based on the content and you, the developer, decide if that answer falls within a rate that's okay for you and your app. The filters also take the same amount of time to look up items no matter how many are in the set.

He includes an example of the extension in use - defining the number of elements, the false positive allowance and adding/searching data and how the responses would come back from the checks.

0 comments voice your opinion now!
bloom filter pecl extension example false positive rate data structure



Community Events





Don't see your event here?
Let us know!


configure podcast community developer series threedevsandamaybe release framework laravel unittest code language introduction opinion interview list refactor testing install symfony2

All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework