Welcome to ftp.nluug.nl
Current directory: /ibiblio/distributions/CPAN/authors/id/P/PA/PALVARO/
Contents of README:

NAME
    Bloom::Faster - Perl extension for the C library libbloom.

INSTALLATION
    See INSTALL.

SYNOPSIS
      use Bloom::Faster;

      # m = ideal vector size.
      # k = # of hash functions to use.

      my $bloom = new Bloom::Faster({m => 1000000, k => 5});

      # This gives us very tight control of memory usage (a function of m)
      # and performance (a function of k). But in most applications, we won't
      # know the optimal values of either of these. For these cases, it is
      # much easier to supply:
      #
      # n = number of expected elements to check for duplicates,
      # e = acceptable error rate (probability of false positive)
      #
      # my $bloom = new Bloom::Faster({n => 1000000, e => 0.00001});

      while (<>) {
          chomp;
          # Bloom::Faster->add() returns true when the value is a duplicate.
          if ($bloom->add($_)) {
              print "DUP: $_\n";
          }
      }

DESCRIPTION
    Bloom filters are a lightweight duplicate detection algorithm proposed
    by Burton Bloom
    (http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal), with
    applications in stream data processing, among others. Bloom filters are
    a very cool thing. Where occasional false positives are acceptable,
    Bloom filters give us the ability to detect duplicates in a fast and
    resource-friendly manner.

    The allocation of memory for the bit vector is handled in the C layer,
    but Perl's OO capability handles the garbage collection. When a
    Bloom::Faster object goes out of scope, the vector pointed to by the C
    structure will be free()d. To do this manually, the DESTROY builtin
    method can be called.

    A Bloom filter Perl module is currently available on CPAN, but it is
    profoundly slow and cannot handle large vectors. This alternative uses a
    more efficient C library which can handle arbitrarily large vectors (up
    to the maximum value of a "long long" datatype, at least
    9223372036854775807 on supported systems).

EXPORT
    None by default.

Exportable constants
    HASHCNT
    PRIME_SIZ
    SIZ

SEE ALSO
    libbbloom.so

AUTHOR
    Peter Alvaro and Dmitriy Ryaboy, <palvaro@ask.com>

COPYRIGHT AND LICENSE
    Copyright (C) 2006 by Peter Alvaro and Dmitriy Ryaboy

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself, either Perl version 5.8.5 or, at
    your option, any later version of Perl 5 you may have available.
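
The SYNOPSIS offers two ways to size the filter: m and k directly, or n and e. The README does not say how Bloom::Faster converts n and e into m and k, so the following is only a sketch using the standard textbook formulas, together with a toy pure-Perl filter whose add() mimics the documented "returns true on duplicate" behaviour. The names bloom_params, toy_new and toy_add are hypothetical and are not part of Bloom::Faster or libbloom; the MD5-based double hashing is likewise an assumption made for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use POSIX qw(ceil);

# Textbook Bloom filter sizing (assumed, not taken from Bloom::Faster):
#   m = -n * ln(e) / (ln 2)^2   bits in the vector
#   k = (m / n) * ln 2          number of hash functions
sub bloom_params {
    my ($n, $e) = @_;
    my $m = ceil(-$n * log($e) / (log(2) ** 2));
    my $k = int(($m / $n) * log(2) + 0.5);   # round to nearest integer
    $k = 1 if $k < 1;
    return ($m, $k);
}

# Hypothetical toy filter: a hash ref holding a bit string of m bits.
sub toy_new {
    my ($n, $e) = @_;
    my ($m, $k) = bloom_params($n, $e);
    my $bits = '';
    vec($bits, $m - 1, 1) = 0;               # pre-extend the string to m bits
    return { m => $m, k => $k, bits => $bits };
}

# Sets the k bits for $item. Returns true if all k bits were already set,
# i.e. the item is a probable duplicate (false positives are possible).
sub toy_add {
    my ($f, $item) = @_;
    # Double hashing: derive k positions from two 32-bit words of an MD5 digest.
    my ($h1, $h2) = unpack('N2', md5($item));
    my $seen = 1;
    for my $i (0 .. $f->{k} - 1) {
        my $pos = ($h1 + $i * $h2) % $f->{m};
        $seen = 0 unless vec($f->{bits}, $pos, 1);
        vec($f->{bits}, $pos, 1) = 1;
    }
    return $seen;
}

# The second "apple" is reported as a duplicate.
my $f = toy_new(1000, 0.01);
for my $word (qw(apple pear apple)) {
    print "DUP: $word\n" if toy_add($f, $word);
}
```

Under these formulas, the SYNOPSIS values n => 1000000 and e => 0.00001 work out to roughly 24 million bits (about 3 MB) and 17 hash functions, which illustrates why supplying n and e is usually easier than guessing m and k by hand.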
Name                        Last modified      Size
Parent Directory                               -
Bloom-Faster-1.3.1.meta     16-Mar-2007 20:08  312
Bloom-Faster-1.3.1.readme   10-Mar-2007 00:16  2.5K
Bloom-Faster-1.3.1.tar.gz   17-Mar-2007 07:25  8.5M
Bloom-Faster-1.3.meta       23-Feb-2007 06:48  310
Bloom-Faster-1.3.readme     23-Feb-2007 06:45  2.5K
Bloom-Faster-1.3.tar.gz     23-Feb-2007 06:51  477K
Bloom-Faster-1.4.meta       17-Mar-2007 08:44  310
Bloom-Faster-1.4.readme     10-Mar-2007 00:16  2.5K
Bloom-Faster-1.4.tar.gz     17-Mar-2007 08:48  602K
Bloom-Faster-1.6.2.meta     12-Jun-2010 23:05  312
Bloom-Faster-1.6.2.readme   22-Jun-2009 02:19  2.5K
Bloom-Faster-1.6.2.tar.gz   12-Jun-2010 23:16  21K
Bloom-Faster-1.6.meta       23-Jun-2009 04:41  307
Bloom-Faster-1.6.readme     22-Jun-2009 02:31  2.5K
Bloom-Faster-1.6.tar.gz     23-Jun-2009 04:42  22K
Bloom-Faster-1.7.meta       13-Jun-2010 00:06  310
Bloom-Faster-1.7.readme     22-Jun-2009 02:19  2.5K
Bloom-Faster-1.7.tar.gz     13-Jun-2010 00:17  21K
CHECKSUMS                   21-Nov-2021 23:55  4.5K
README                      23-Feb-2007 18:54  2.5K