makinacorpus/php-bloom

Name: php-bloom

Owner: Makina Corpus

Description: Simple PHP Bloom filter

Created: 2016-08-30 09:23:22.0

Updated: 2017-08-03 01:03:30.0

Pushed: 2017-11-30 17:51:15.0

Size: 9

Language: PHP

README

PHP Bloom filter

This is a simple PHP Bloom filter based on Sherif Ramadan's implementation.

The original code and a really great explanation can be found here: http://phpden.info/Bloom-filters-in-PHP

It is slightly modified to correct some coding standard issues, to allow a more flexible runtime configuration, and to fix a few performance issues.

Usage

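You must first choose a target maximum number of elements that your filter will contain and an acceptable false positive probability. The smaller the maximum number of elements, the smaller and faster the filter. A lower false positive probability, on the other hand, requires a larger bit string and more hash functions, so it costs more.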
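The trade-off between these two parameters can be sketched with the standard Bloom filter sizing formulas. This is only an illustration: the function name `bloomSize()` is hypothetical, and whether php-bloom computes its sizes exactly this way is an assumption.

```php
<?php
// Standard Bloom filter sizing formulas. That php-bloom uses exactly these
// is an assumption; bloomSize() is a hypothetical helper, not a library call.
function bloomSize(int $maxSize, float $probability): array
{
    // Number of bits: m = -n * ln(p) / (ln 2)^2
    $bits = (int) ceil(-$maxSize * log($probability) / (M_LN2 ** 2));

    // Number of hash functions: k = (m / n) * ln 2
    $hashes = max(1, (int) round($bits / $maxSize * M_LN2));

    return ['bits' => $bits, 'hashes' => $hashes];
}

$params = bloomSize(10000, 0.0001);
printf("%d bits, %d hash functions\n", $params['bits'], $params['hashes']);
```

With 10000 elements and a probability of 0.0001, this gives a bit string of roughly 23 KiB and 13 hash functions. Relaxing the probability to 0.01 shrinks both numbers considerably.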

// You may cache this value, and fetch it back, it's the whole goal of this
// API. Beware that the stored string might contain ``\0`` characters, ensure
// your storage API deals with those strings in a safe way.
$value = null;

// Configure your Bloom filter, if you store the value, you should store the
// configuration along since selected hash algorithms and string size would
// change otherwise.
$probability = 0.0001;
$maxSize = 10000;

$filter = new \MakinaCorpus\Bloom\BloomFilter();

// You may add as many elements as you wish, elements can be any type, really,
// if not scalar they will be serialized prior to being hashed.
$filter->set('some_string');
$filter->set(123456);
$filter->set(['some' => 'array']);
$filter->set(new \stdClass());

// And the whole goal of it:
if ($filter->check('some_value')) {
    do_something();
}
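Because the stored string may contain NUL bytes, one way to keep it safe in text-oriented storage is to encode it and keep the configuration next to it. The following is a minimal sketch under stated assumptions: `$raw` is a hypothetical stand-in for the filter value (how the value is read back from the filter is not shown in this README), and the JSON layout is an arbitrary choice.

```php
<?php
// Hypothetical raw filter value: a binary string that contains NUL bytes.
$raw = "\x00\x81\x00\xff";

// Keep the configuration next to the value, and base64-encode the value so
// that text-only storage backends cannot truncate or mangle it.
$stored = json_encode([
    'maxSize'     => 10000,
    'probability' => 0.0001,
    'value'       => base64_encode($raw),
]);

// Later: decode, then check that the binary string survived the round trip.
$loaded   = json_decode($stored, true);
$restored = base64_decode($loaded['value'], true);

var_dump($restored === $raw);
```

Base64 grows the stored string by about a third. A storage backend that is binary-safe can store the raw string directly.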

Notes

Please carefully read the original author's blog post, since it explains everything you need to know about Bloom filters: http://phpden.info/Bloom-filters-in-PHP

Please also use it wisely: the hashing algorithms are quite fast, but every call hashes the element several times, so heavy use will noticeably increase your CPU usage.
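To see where the CPU time goes, here is a deliberately tiny Bloom filter sketch. It is not the php-bloom implementation: the class `TinyBloom`, the use of `md5()`, and the one-character-per-bit storage are assumptions chosen for readability. Each `set()` or `check()` call computes one hash per configured hash function.

```php
<?php
// Illustrative only: NOT the php-bloom implementation.
final class TinyBloom
{
    private $bits;   // one "0"/"1" character per bit, for readability
    private $size;   // number of bits
    private $hashes; // number of hash functions

    public function __construct(int $size, int $hashes)
    {
        $this->size = $size;
        $this->hashes = $hashes;
        $this->bits = str_repeat('0', $size);
    }

    private function positions($element): array
    {
        // Non-scalar elements are serialized before being hashed.
        $key = is_scalar($element) ? (string) $element : serialize($element);
        $positions = [];
        for ($i = 0; $i < $this->hashes; $i++) {
            // One hash computation per hash function: this is the CPU cost.
            $positions[] = hexdec(substr(md5($i . ':' . $key), 0, 7)) % $this->size;
        }
        return $positions;
    }

    public function set($element): void
    {
        foreach ($this->positions($element) as $position) {
            $this->bits[$position] = '1';
        }
    }

    public function check($element): bool
    {
        foreach ($this->positions($element) as $position) {
            if ($this->bits[$position] !== '1') {
                return false; // definitely never added
            }
        }
        return true; // probably added, false positives are possible
    }
}

$tiny = new TinyBloom(1024, 3);
$tiny->set('some_string');
$tiny->set(['some' => 'array']);
var_dump($tiny->check('some_string')); // bool(true): added elements are always found
```

An element that was never added usually yields `false`, but may occasionally yield `true`. That is the false positive probability chosen at configuration time.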

There are numerous other competing implementations. Take a look around before choosing, and use whichever suits you best.


This work is supported by the National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306. This work is solely the responsibility of the creators and does not necessarily represent the official views of the National Institutes of Health.