gnunet-svn
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lsd0003] branch master updated: fix BF/CBF section


From: gnunet
Subject: [lsd0003] branch master updated: fix BF/CBF section
Date: Wed, 13 Jan 2021 23:20:58 +0100

This is an automated email from the git hooks/post-receive script.

grothoff pushed a commit to branch master
in repository lsd0003.

The following commit(s) were added to refs/heads/master by this push:
     new ff4b147  fix BF/CBF section
ff4b147 is described below

commit ff4b147870e751f9712cd26aad77a73c76acac11
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Wed Jan 13 23:20:12 2021 +0100

    fix BF/CBF section
---
 draft-summermatter-set-union.xml | 96 ++++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 38 deletions(-)

diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index f39a4dc..74fc17d 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -177,34 +177,41 @@
         <section anchor="background" numbered="true" toc="default">
             <name>Background</name>
             <section anchor="bf" numbered="true" toc="default">
-                <name>Bloom Filter</name>
+                <name>Bloom Filters</name>
                 <t>
-                    A Bloom Filter (BF) is a space-efficient datastructure to 
test if am element is part of a set of elements.
-                    Since a Bloom filter is a probabilistic datastructure its 
possible to have false positives but false negatives
-                    are not possible.
+                    A Bloom filter (BF) is a space-efficient datastructure to 
test if am element is part of a set of elements.
+                    Since a BF is a probabilistic datastructure, it is 
possible to have false-positives: when asked
+                    if an element is in the set, the answer from a BF is 
either "no" or "maybe".
                 </t>
                 <t>
-                    A BF consists of multiple buckets, every bucket can be set 
to 0 or 1. In the beginning all buckets are set
-                    to 0. To add an element to the BF the corresponding 
buckets are set to 1. To map an element on the array of buckets
-                    a mapping function M is required. The mapping function is 
non-injective an takes an element as input and outputs a
-                    deterministic bit stream of the length of the BF. The 
mapping function is described by the following mathematical equation:
+                    A BF consists of L buckets. Every bucket is a binary value 
that can be either 0 or 1. All buckets are initialized
+                    to 0.  A mapping function M is used to map each element of 
the set to a subset of k buckets.  M is non-injective
+                    and can thus map the same element multiple times to the 
same bucket.
+                    The type of the mapping function can thus be described by 
the following mathematical notation:
                 </t>
                 <figure anchor="bf_mapping_function_math">
                     <artwork name="" type="" align="left" alt=""><![CDATA[
-            --------------------------------
-            # M: E->B^k (B=Z/l)
-            --------------------------------
-            # l = Number of bits per element
-            # B = 0,1,2,3,4,...l
-            # k = Number of buckets
-            # E = Element from the set
-            # Z = Natural Numbers Mod l
-            --------------------------------
-            Example: l=256, k=3
-            M(E) = {4,6,255}
+            ------------------------------------
+            # M: E->B^k
+            ------------------------------------
+            # L = Number of buckets
+            # B = 0,1,2,3,4,...L-1 (the buckets)
+            # k = Number of buckets per element
+            # E = Set of elements
+            ------------------------------------
+            Example: L=256, k=3
+            M('element-data') = {4,6,255}
 
                      ]]></artwork>
                 </figure>
+                <t>
+                    A typical mapping function is constructed by hashing the 
element, for example
+                    using the well-known HKDF construction (FIXME: cite HKDF 
RFC!).
+                </t>
+                <t>
+                    To add an element to the BF, the corresponding buckets 
under the map M are set to 1.
+                    To check if an element may be in the set, one tests if all 
buckets under the map M are set to 1.
+                </t>
                 <t>
                     Further in this document a bitstream outputted by the 
mapping function is represented by
                     a set of numeric values for example (0101) = (2,4).
@@ -236,8 +243,13 @@
                      ]]></artwork>
                 </figure>
                 <t>
-                    Its not possible to remove an element from the BF because 
buckets can only be set to 1 or 0 so its not possible to
-                    differentiate between buckets containing one or multiple 
elements. To remove elements from the BF a <xref target="cbf" format="title" />
+                  The parameters L and k depend on the set size and must be
+                  chosen carefully to ensure that the BF does not return too
+                  many false-positives.
+                </t>
+                <t>
+                    It is not possible to remove an element from the BF 
because buckets can only be set to 1 or 0. Hence it is impossible to
+                    differentiate between buckets containing one or more 
elements. To remove elements from the BF a <xref target="cbf" format="title" />
                     is required.
                 </t>
             </section>
@@ -245,13 +257,13 @@
             <section anchor="cbf" numbered="true" toc="default">
                 <name>Counting Bloom Filter</name>
                 <t>
-                    A Counting Bloom Filter (CBF) is an extension of the <xref 
target="bf" format="title" />. In the CBF the binary filed of the bucket is 
replaced by
-                    an unsigned counter. This allows the removal of an 
elements from the CBF.
+                  A Counting Bloom Filter (CBF) is an extension of the <xref 
target="bf" format="title" />. In the CBF, buckets are
+                  unsigned numbers instead of binary values.  This allows the 
removal of an elements from the CBF.
                 </t>
                 <t>
-                    Adding an element to the CBF is similar to the adding 
operation of the BF but instead of setting the bucket on hit to 1 the bucket
-                    is increased by 1. For example if two colliding elements 
M(element1) = (1,3) and
-                    M(element2) = (0,3) are added to the CBF bucket 0 and 1 
are set to 1 and bucket 3 (the colliding bucket) is set
+                  Adding an element to the CBF is similar to the adding 
operation of the BF. However, instead of setting the bucket on hit to 1 the
+                  numeric value stored in the bucket is increased by 1. For 
example if two colliding elements M(element1) = (1,3) and
+                    M(element2) = (0,3) are added to the CBF, bucket 0 and 1 
are set to 1 and bucket 3 (the colliding bucket) is set
                     to 2:
                 </t>
                 <figure anchor="figure_cbf_insert_0">
@@ -263,11 +275,10 @@
                      ]]></artwork>
                 </figure>
                 <t>
-                    The order of a bucket is defined by the counter, if a 
bucket contains two elements the counter is set to 2 so the order of
-                    the bucket is two.
+                    The counter stored in the bucket is also called the order 
of the bucket.
                 </t>
                 <t>
-                    To remove an element form the CBF the counter is decreased 
by 1.
+                    To remove an element form the CBF the counters of all 
buckets the element is mapped to are decreased by 1.
                 </t>
                 <t>
                     Removing M(element2) = (1,3) from the CBF above:
@@ -280,19 +291,28 @@
             +-------------+-------------+-------------+-------------+
                      ]]></artwork>
                 </figure>
+                <t>
+                  In practice, the number of bits available for the counters 
is usually finite. For example, given a 4-bit
+                  counter, a CBF bucket would overflow once 16 elements are 
mapped to the same bucket. To efficiently
+                  handle this case, the maximum value (15 in our example) is 
considered to represent "infinity". Once the
+                  order of a bucket reaches "infinity", it is no longer 
incremented or decremented.
+                </t>
+                <t>
+                  The parameters L and k and the number of bits allocated to 
the counters should depend on the set size.
+                  An IBF will degenerate when subjected to insert and remove 
iterations of different elements, and eventually all
+                  buckets will reach "infinity".  The speed of the degradation 
will depend on the choice of L and k in
+                  relation to the number of elements stored in the IBF.
+                </t>
             </section>
         </section>
 
         <section anchor="ibv" numbered="true" toc="default">
         <name>Invertible Bloom Filter</name>
             <t>
-                A Invertible Bloom Filter (IBF) is a further extension of the 
<xref target="cbf" format="title" />. The IBF extends the <xref target="cbf" 
format="title" /> with two more operations,
-                beside insert and remove the IBF supports a decode and set 
difference operation. This two extra operations allows the
-                IBF to calculate small differences in big sets very 
efficiently.
-
-                IBF are useful when there is the need to check a set of 
elements for the presence or absence of some
-                elements in a big set efficiently.
-
+                An Invertible Bloom Filter (IBF) is a further extension of the 
<xref target="cbf" format="title" />.
+                An IBF extends the <xref target="cbf" format="title" /> with 
two more operations:
+                decode and set difference. This two extra operations are 
useful to efficiently extract
+                small differences between large sets.
             </t>
             <section anchor="ibf_format" numbered="true" toc="default">
                 <name>Format</name>
@@ -556,7 +576,7 @@
         <section anchor="modeofoperation" numbered="true" toc="default">
             <name>Mode of operation</name>
             <t>
-                The set union protocol uses the above discussed topics 
Invertible Bloom Filter and Strata Estimators.
+                The set union protocol uses the above discussed topics IBF and 
Strata Estimators.
                 Depending on the state of the two sets there are different 
strategies or operation modes how to efficiently
                 determinate missing elements in two sets of two peers.
             </t>
@@ -848,7 +868,7 @@
                 <section anchor="messages_ibf_description" numbered="true" 
toc="default">
                     <name>Description</name>
                     <t>
-                        The Invertible Bloom Filter message contains a slices 
of the IBF.
+                        The IBF message contains a slice of the IBF.
                     </t>
                     <t>
                         The <em>IBF</em> message is sent at the start of the 
protocol from the initiating peer in the transaction

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]