[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0004] 03/03: Try to disentangle Bloom filters
From: |
gnunet |
Subject: |
[lsd0004] 03/03: Try to disentangle Bloom filters |
Date: |
Fri, 23 Dec 2022 12:36:24 +0100 |
This is an automated email from the git hooks/post-receive script.
martin-schanzenbach pushed a commit to branch master
in repository lsd0004.
commit 3f70faf531ebd101004f5b93e1c80ee1ec5bcdd1
Author: Martin Schanzenbach <schanzen@gnunet.org>
AuthorDate: Fri Dec 23 20:36:03 2022 +0900
Try to disentangle Bloom filters
---
draft-schanzen-r5n.xml | 194 +++++++++++++++++++++++++++++--------------------
1 file changed, 115 insertions(+), 79 deletions(-)
diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml
index 98c46f9..ab5478d 100644
--- a/draft-schanzen-r5n.xml
+++ b/draft-schanzen-r5n.xml
@@ -349,7 +349,7 @@ Connectivity | |Underlay| |Underlay|
</figure>
</section>
<section anchor="overlay" numbered="true" toc="default">
- <name>Overlay operations</name>
+ <name>Overlay Operations</name>
<t>
An implementation of this specification commonly exposes the two
overlay
operations "GET" and "PUT".
@@ -358,9 +358,9 @@ Connectivity | |Underlay| |Underlay|
picture of the protocol.
</t>
<section>
- <name>The GET procedure</name>
+ <name>GET operation</name>
<t>
- A basic GET procedure may be exposed as:
+ A basic GET operation interface may be exposed as:
</t>
<t>
<tt>GET(Query-Key, Block-Type) -> Results as List</tt>
@@ -462,9 +462,9 @@ Connectivity | |Underlay| |Underlay|
</dl>
</section>
<section>
- <name>The PUT procedure</name>
+ <name>PUT operation</name>
<t>
- A PUT procedure may be exposed as:
+ A PUT operation interface may be exposed as:
</t>
<t>
<tt>PUT(Key, Block-Type, Block-Expiration, Block-Data)</tt>
@@ -773,47 +773,6 @@ bchar = *(ALPHA / DIGIT)
</t>
</section>
</section>
- <section anchor="bloom_filters" numbered="true" toc="default">
- <name>Bloom Filters</name>
- <t>
- R<sup>5</sup>N uses Bloom filters in several places. This section
- gives some general background on Bloom filters and defines functions
- on this data structure shared by the various use-cases in
R<sup>5</sup>N.
- </t>
- <t>
- A Bloom filter (BF) is a space-efficient probabilistic datastructure
- to test if an element is part of a set of elements.
- Elements are identified by an element ID.
- Since a BF is a probabilistic datastructure, it is possible to have
- false-positives: when asked if an element is in the set, the answer
from
- a BF is either "no" or "maybe".
- </t>
- <t>
- Bloom filters are defined as a string of <tt>L</tt> bits always
- initially empty, consisting only of zeroes.
- There are two functions which can be invoked on the Bloom filter:
- BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to
- be added to the Bloom filter or queried against the set.
- The mapping function <tt>M</tt> is defined as follows:
- </t>
- <t>
- The element <tt>e</tt> is hashed using SHA-512.
- The resulting byte string is interpreted as a string of 16
- 32-bit integers in network byte order.
- </t>
- <t>
- When adding an element to the Bloom filter <tt>bf</tt> using
- <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping
- <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within
- <tt>bf</tt> and set to 1.
- </t>
- <t>
- When testing if an element may be in the Bloom filter <tt>bf</tt> using
- <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within
- <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1.
- Otherwise, the element is not considered to be in the Bloom filter.
- </t>
- </section>
<section anchor="routing" numbered="true" toc="default">
<name>Routing</name>
<t>
@@ -920,9 +879,19 @@ bchar = *(ALPHA / DIGIT)
traversed peers when searching for the next hops in the routing
table.
</t>
<t>
- The peer Bloom filter is always a 128 byte field. The elements
- hashed into the Bloom filter are the 32 byte peer IDs. We note
- that the peer Bloom filter may exclude peers due to false-postive
+ The peer Bloom filter (see <xref target="bloom_filters"/>) consists
of
+ L=1024 buckets.
+ The mapping function <tt>M</tt> is defined as follows:
+ </t>
+ <t>
+ The element <tt>e</tt> is a 32 byte peer ID and is hashed using
+ SHA-512.
+ The resulting byte string is interpreted as a string of k=16
+ 32-bit integers in network byte order which are used to set the
bucket bits
+ accordingly.
+ </t>
+ <t>
+ We note that the peer Bloom filter may exclude peers due to
false-postive
matches. This is acceptable as routing should nevertheless
terminate (with high probability) in close vicinity of the key.
</t>
@@ -2395,10 +2364,33 @@ gnunet+tcp://12.3.4.5/ \
<dd>
<t>
The RESULT_FILTER for HELLO blocks is implemented using a
- Bloom filter.
- </t>
- <figure anchor="hello_rf" title="The HELLO Block Result Filter.">
- <artwork name="" type="" align="left" alt=""><![CDATA[
+ Bloom filter (see <xref target="bloom_filters"/>).
+ The size <tt>S = L/8</tt> of the Bloom filter in bytes depends on
+ the number of elements <tt>F</tt> known to be filtered at the
+ initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is just 8
(bytes).
+ Otherwise, <tt>S</tt> is set to the minimum of
+ 2<sup>15</sup> and the lowest power
+ of 2 that is strictly larger than <tt>K*F/4</tt>
+ (in bytes).
+ </t>
+ <t>
+ The <tt>k</tt>-value for the HELLO_BF
+ Bloom filter is always 16.
+ <tt>k</tt> is never transmitted.
+ The elements used in the Bloom filter
+ consist of an XOR between the <tt>H_ADDRS</tt> field (as computed
using
+ SHA-512 over the <tt>ADDRESSES</tt>) and the SHA-512
+ hash of the <tt>MUTATOR</tt> field from a given HELLO block.
+ The mapping function M(<tt>H_ADDRS XOR MUTATOR</tt>) is an identity
+ function and returns the 512-bit XOR result unmodified.
+ This resulting byte string is interpreted as k=16 32-bit
+ integers in network byte order and used to set or check the bucket
bits
+ accordingly.
+ This function returns an empty Bloom filter of length <tt>S=
L/8</tt> as
+ part of RF:
+ </t>
+ <figure anchor="hello_rf" title="The HELLO Block Result Filter.">
+ <artwork name="" type="" align="left" alt=""><![CDATA[
0 8 16 24 32 40 48 56
+-----+-----+-----+-----+-----+-----+-----+-----+
| MUTATOR | HELLO_BF /
@@ -2407,7 +2399,6 @@ gnunet+tcp://12.3.4.5/ \
+-----+-----+-----+-----+-----+-----+-----+-----+
]]></artwork>
</figure>
-
<t>where:</t>
<dl>
<dt>MUTATOR</dt>
@@ -2416,24 +2407,16 @@ gnunet+tcp://12.3.4.5/ \
</dd>
<dt>HELLO_BF</dt>
<dd>
- The <tt>K</tt>-value for the HELLO_BF Bloom filter
- is always 16. The size <tt>S</tt> of the Bloom filter in
bytes depends on
- the number of elements <tt>F</tt> known to be filtered at
the
- initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is
just 8 (bytes).
- Otherwise, <tt>S</tt> is set to the minimum of
- 2<sup>15</sup> and the lowest power
- of 2 that is strictly larger than <tt>K*F/4</tt>
- (in bytes). The wire format of HELLO_BF is the resulting
byte
- array. In particular, <tt>K</tt> is never transmitted.
+ The Bloom filter byte array.
</dd>
</dl>
-<t>
- R<sup>5</sup>N HELLO blocks use a <tt>MUTATOR</tt> value
- to additionally "randomize" the computation of the bloom filter while
remaining
- deterministic across peers. The 32-bit <tt>MUTATOR</tt>
- value is set by the peer initiating the GET request, and changed
- every time the GET request is repeated by the initiator. Peers
- forwarding GET requests <bcp14>MUST</bcp14> not change the
+ <t>
+ The <tt>MUTATOR</tt> value is used
+ to additionally "randomize" the computation of the Bloom filter
while
+ remaining deterministic across peers.
+ It is only ever set by the peer initiating the GET
+ request, and changed every time the GET request is repeated.
+ Peers forwarding GET requests <bcp14>MUST</bcp14> not change the
mutator value included in the <tt>RESULT_FILTER</tt> as they might
not
be able to recalculate the result filter with a different
<tt>MUTATOR</tt>
value.
@@ -2457,12 +2440,9 @@ gnunet+tcp://12.3.4.5/ \
</dd>
<dt>FilterResult(Block, Key, RF, XQuery) ->
(FilterEvaluationResult, RF')</dt>
<dd>
- To filter results of HELLO blocks using the Bloom filter, the
- <tt>H_ADDRS</tt> field (as computed using SHA-512 over
- the <tt>ADDRESSES</tt>) and XORed with the SHA-512
- hash of the <tt>MUTATOR</tt> (in network byte order).
- The resulting value is then used when hashing into the
- Bloom filter as described in <xref target="bloom_filters" />.
+ The <tt>H_ADDRS</tt> field is XORed with the SHA-512
+ hash of the <tt>MUTATOR</tt> field from the HELLO block and the
resulting
+ value is checked against the Bloom filter in RF.
Consequently, HELLOs with completely identical sets of
addresses will be filtered and FILTER_DUPLICATE is returned.
Any small variation in the set of addresses will cause the block
@@ -2833,8 +2813,64 @@ Type | Name | References | Description
</front>
</reference>
</references>
- <!-- Change Log
- v00 2017-07-23 MS Initial version
- -->
+ <section anchor="bloom_filters" numbered="true" toc="default">
+ <name>Bloom filters in R<sup>5</sup>N</name>
+ <t>
+ R<sup>5</sup>N uses Bloom filters in several places. This section
+ gives some general background on Bloom filters and defines functions
+ on this data structure shared by the various use-cases in
R<sup>5</sup>N.
+ </t>
+ <t>
+ A Bloom filter (BF) is a space-efficient probabilistic datastructure
+ to test if an element is part of a set of elements.
+ Elements are identified by an element ID.
+ Since a BF is a probabilistic datastructure, it is possible to have
+ false-positives: when asked if an element is in the set, the answer
from
+ a BF is either "no" or "maybe".
+ </t>
+ <t>
+ Bloom filters are defined as a string of <tt>L</tt> bits called
"buckets".
+ The buckets are initially always empty, meaning that the bits are set
to
+ zero.
+ There are two functions which can be invoked on the Bloom filter "bf":
+ BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element that is to
+ be added to the Bloom filter or queried against the set.
+ </t>
+ <t>
+ A mapping function M is used to map each ID of each element from the
set to a
+ subset of k buckets.
+ In the original proposal by Bloom, M is non-injective and can thus map
the same
+ element multiple times to the same bucket.
+ The type of the mapping function can thus be described by the following
+ mathematical notation:
+ </t>
+ <figure anchor="figure_bf_func" title="Bloom filter mapping function.">
+ <artwork><![CDATA[
+ ------------------------------------
+ # M: E->B^k
+ ------------------------------------
+ # L = Number of buckets
+ # B = 0,1,2,3,4,...L-1 (the buckets)
+ # k = Number of buckets per element
+ # E = Set of elements
+ ------------------------------------
+ Example: L=256, k=3
+ M('element-data') = {4,6,255}
+]]>
+ </artwork>
+ </figure>
+ <t>
+ When adding an element to the Bloom filter <tt>bf</tt> using
+ <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping
+ <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within
+ <tt>bf</tt> and set to 1.
+ </t>
+ <t>
+ When testing if an element may be in the Bloom filter <tt>bf</tt> using
+ <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within
+ <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1.
+ Otherwise, the element is not considered to be in the Bloom filter.
+ </t>
+ </section>
</back>
</rfc>
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.