gnunet-svn
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lsd0004] branch master updated (db89ead -> 3f70faf)


From: gnunet
Subject: [lsd0004] branch master updated (db89ead -> 3f70faf)
Date: Fri, 23 Dec 2022 12:36:21 +0100

This is an automated email from the git hooks/post-receive script.

martin-schanzenbach pushed a change to branch master
in repository lsd0004.

    from db89ead  Typos
     new 86896f2  Language
     new 564e5fa  More language
     new 3f70faf  Try to disentangle Bloom filters

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 draft-schanzen-r5n.xml | 220 ++++++++++++++++++++++++++++---------------------
 1 file changed, 128 insertions(+), 92 deletions(-)

diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml
index 9c2dbf9..ab5478d 100644
--- a/draft-schanzen-r5n.xml
+++ b/draft-schanzen-r5n.xml
@@ -276,8 +276,6 @@
       <t>
         The specific semantics of the above operations as provided by 
R<sup>5</sup>N
         for applications are defined in <xref target="overlay"/>.
-        The handling of blocks and their validation and storage is defined in
-        <xref target="blockstorage"/>.
       </t>
       <t>
         In a trivial scenario where there is only one peer (the local host),
@@ -291,10 +289,11 @@
         <xref target="underlay"/>.
         The interface provided by this underlay is used across the 
specification of the
         R<sup>5</sup>N protocol.
-        It also serves as a set of requirements of possible transport 
mechanism that
-        can be used to implement R<sup>5</sup>N on.
-        That being said, common transport protocols such as TCP/IP or UDP/IP 
are
-        supported.
+        It also serves as a set of requirements of possible transport 
mechanisms that
+        can be used to implement R<sup>5</sup>N with.
+        That being said, common transport protocols such as TCP/IP or UDP/IP 
and their
+        interfaces are suitable R<sup>5N</sup> underlays used by existing
+        implementations.
       </t>
       <!-- consider moving some of this back into sec considerations -->
       <t>
@@ -316,9 +315,9 @@
       </t>
       <t>
         Across this document, the functional components of an R<sup>5</sup>N
-        implementation are divided into block processing (<xref 
target="blockstorage"/>),
-        message processing (<xref target="p2p_messages"/>) and routing
-        (<xref target="routing"/>).
+        implementation are divided into routing (<xref target="routing"/>),
+        message processing (<xref target="p2p_messages"/>) and
+        block processing (<xref target="blockstorage"/>).
         <xref target="figure_r5n_arch"/> illustrates the architectural 
overview of
         R<sup>5</sup>N.
       </t>
@@ -350,17 +349,18 @@ Connectivity | |Underlay|  |Underlay|
       </figure>
     </section>
     <section anchor="overlay" numbered="true" toc="default">
-      <name>Application API</name>
+      <name>Overlay Operations</name>
       <t>
-        An implementation of this specification commonly exposes the two API
-        procedures "GET" and "PUT".
-        The following are non-normative examples of such APIs and their
-        behaviour are detailed in order to give implementers a fuller picture 
of the protocol.
+        An implementation of this specification commonly exposes the two 
overlay
+        operations "GET" and "PUT".
+        The following are non-normative examples of APIs for those operations.
+        Their behaviour is described prosaically in order to give implementers 
a fuller
+        picture of the protocol.
       </t>
       <section>
-        <name>The GET procedure</name>
+        <name>GET operation</name>
         <t>
-          A basic GET procedure may be exposed as:
+          A basic GET operation interface may be exposed as:
         </t>
         <t>
           <tt>GET(Query-Key, Block-Type) -> Results as List</tt>
@@ -462,9 +462,9 @@ Connectivity | |Underlay|  |Underlay|
         </dl>
       </section>
       <section>
-        <name>The PUT procedure</name>
+        <name>PUT operation</name>
         <t>
-          A PUT procedure may be exposed as:
+          A PUT operation interface may be exposed as:
         </t>
         <t>
           <tt>PUT(Key, Block-Type, Block-Expiration, Block-Data)</tt>
@@ -773,47 +773,6 @@ bchar = *(ALPHA / DIGIT)
         </t>
       </section>
     </section>
-    <section anchor="bloom_filters" numbered="true" toc="default">
-      <name>Bloom Filters</name>
-      <t>
-       R<sup>5</sup>N uses Bloom filters in several places.  This section
-       gives some general background on Bloom filters and defines functions
-       on this data structure shared by the various use-cases in 
R<sup>5</sup>N.
-      </t>
-      <t>
-        A Bloom filter (BF) is a space-efficient probabilistic datastructure
-        to test if an element is part of a set of elements.
-        Elements are identified by an element ID.
-        Since a BF is a probabilistic datastructure, it is possible to have
-        false-positives: when asked if an element is in the set, the answer 
from
-        a BF is either "no" or "maybe".
-      </t>
-      <t>
-        Bloom filters are defined as a string of <tt>L</tt> bits always
-        initially empty, consisting only of zeroes.
-        There are two functions which can be invoked on the Bloom filter:
-        BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to
-        be added to the Bloom filter or queried against the set.
-        The mapping function <tt>M</tt> is defined as follows:
-      </t>
-      <t>
-        The element <tt>e</tt> is hashed using SHA-512.
-        The resulting byte string is interpreted as a string of 16
-        32-bit integers in network byte order.
-      </t>
-      <t>
-        When adding an element to the Bloom filter <tt>bf</tt> using
-        <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping
-        <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within
-        <tt>bf</tt> and set to 1.
-      </t>
-      <t>
-        When testing if an element may be in the Bloom filter <tt>bf</tt> using
-        <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within
-        <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1.
-        Otherwise, the element is not considered to be in the Bloom filter.
-      </t>
-    </section>
     <section anchor="routing" numbered="true" toc="default">
       <name>Routing</name>
       <t>
@@ -920,9 +879,19 @@ bchar = *(ALPHA / DIGIT)
           traversed peers when searching for the next hops in the routing 
table.
         </t>
         <t>
-          The peer Bloom filter is always a 128 byte field. The elements
-         hashed into the Bloom filter are the 32 byte peer IDs.  We note
-         that the peer Bloom filter may exclude peers due to false-postive
+          The peer Bloom filter (see <xref target="bloom_filters"/>) consists 
of
+          L=1024 buckets.
+          The mapping function <tt>M</tt> is defined as follows:
+        </t>
+        <t>
+          The element <tt>e</tt> is a 32 byte peer ID and is hashed using
+          SHA-512.
+          The resulting byte string is interpreted as a string of k=16
+          32-bit integers in network byte order which are used to set the 
bucket bits
+          accordingly.
+        </t>
+        <t>
+         We note that the peer Bloom filter may exclude peers due to 
false-postive
          matches.  This is acceptable as routing should nevertheless
          terminate (with high probability) in close vicinity of the key.
         </t>
@@ -2395,10 +2364,33 @@ gnunet+tcp://12.3.4.5/ \
           <dd>
          <t>
            The RESULT_FILTER for HELLO blocks is implemented using a
-      Bloom filter.
-    </t>
-     <figure anchor="hello_rf" title="The HELLO Block Result Filter.">
-        <artwork name="" type="" align="left" alt=""><![CDATA[
+            Bloom filter (see <xref target="bloom_filters"/>).
+            The size <tt>S = L/8</tt> of the Bloom filter in bytes depends on
+           the number of elements <tt>F</tt> known to be filtered at the
+           initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is just 8 
(bytes).
+           Otherwise, <tt>S</tt> is set to the minimum of
+           2<sup>15</sup> and the lowest power
+           of 2 that is strictly larger than <tt>K*F/4</tt>
+           (in bytes).
+          </t>
+          <t>
+            The <tt>k</tt>-value for the HELLO_BF
+            Bloom filter is always 16.
+            <tt>k</tt> is never transmitted.
+            The elements used in the Bloom filter
+            consist of an XOR between the <tt>H_ADDRS</tt> field (as computed 
using
+            SHA-512 over the <tt>ADDRESSES</tt>) and the SHA-512
+            hash of the <tt>MUTATOR</tt> field from a given HELLO block.
+            The mapping function M(<tt>H_ADDRS XOR MUTATOR</tt>) is an identity
+            function and returns the 512-bit XOR result unmodified.
+            This resulting byte string is interpreted as k=16 32-bit
+            integers in network byte order and used to set or check the bucket 
bits
+            accordingly.
+            This function returns an empty Bloom filter of length <tt>S= 
L/8</tt> as
+            part of RF:
+          </t>
+          <figure anchor="hello_rf" title="The HELLO Block Result Filter.">
+            <artwork name="" type="" align="left" alt=""><![CDATA[
 0     8     16    24    32    40    48    56
 +-----+-----+-----+-----+-----+-----+-----+-----+
 |        MUTATOR        |  HELLO_BF             /
@@ -2407,7 +2399,6 @@ gnunet+tcp://12.3.4.5/ \
 +-----+-----+-----+-----+-----+-----+-----+-----+
 ]]></artwork>
           </figure>
-
           <t>where:</t>
           <dl>
             <dt>MUTATOR</dt>
@@ -2416,24 +2407,16 @@ gnunet+tcp://12.3.4.5/ \
             </dd>
             <dt>HELLO_BF</dt>
             <dd>
-              The <tt>K</tt>-value for the HELLO_BF Bloom filter
-                   is always 16.  The size <tt>S</tt> of the Bloom filter in 
bytes depends on
-                   the number of elements <tt>F</tt> known to be filtered at 
the
-                   initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is 
just 8 (bytes).
-                   Otherwise, <tt>S</tt> is set to the minimum of
-                   2<sup>15</sup> and the lowest power
-                   of 2 that is strictly larger than <tt>K*F/4</tt>
-                   (in bytes).  The wire format of HELLO_BF is the resulting 
byte
-              array. In particular, <tt>K</tt> is never transmitted.
+              The Bloom filter byte array.
             </dd>
           </dl>
-<t>
-            R<sup>5</sup>N HELLO blocks use a <tt>MUTATOR</tt> value
-       to additionally "randomize" the computation of the bloom filter while 
remaining
-           deterministic across peers.  The 32-bit <tt>MUTATOR</tt>
-           value is set by the peer initiating the GET request, and changed
-           every time the GET request is repeated by the initiator. Peers
-           forwarding GET requests <bcp14>MUST</bcp14> not change the
+          <t>
+           The <tt>MUTATOR</tt> value is used
+            to additionally "randomize" the computation of the Bloom filter 
while
+            remaining deterministic across peers.
+            It is only ever set by the peer initiating the GET
+            request, and changed every time the GET request is repeated.
+            Peers forwarding GET requests <bcp14>MUST</bcp14> not change the
            mutator value included in the <tt>RESULT_FILTER</tt> as they might 
not
            be able to recalculate the result filter with a different 
<tt>MUTATOR</tt>
            value.
@@ -2457,12 +2440,9 @@ gnunet+tcp://12.3.4.5/ \
           </dd>
             <dt>FilterResult(Block, Key, RF, XQuery) -&gt; 
(FilterEvaluationResult, RF')</dt>
             <dd>
-              To filter results of HELLO blocks using the Bloom filter, the
-             <tt>H_ADDRS</tt> field (as computed using SHA-512 over
-             the <tt>ADDRESSES</tt>) and XORed with the SHA-512
-             hash of the <tt>MUTATOR</tt> (in network byte order).
-             The resulting value is then used when hashing into the
-             Bloom filter as described in <xref target="bloom_filters" />.
+             The <tt>H_ADDRS</tt> field is XORed with the SHA-512
+             hash of the <tt>MUTATOR</tt> field from the HELLO block and the 
resulting
+             value is checked against the Bloom filter in RF.
              Consequently, HELLOs with completely identical sets of
              addresses will be filtered and FILTER_DUPLICATE is returned.
              Any small variation in the set of addresses will cause the block
@@ -2833,8 +2813,64 @@ Type    | Name            | References | Description
         </front>
       </reference>
     </references>
-    <!-- Change Log
-      v00 2017-07-23  MS   Initial version
-    -->
+    <section anchor="bloom_filters" numbered="true" toc="default">
+      <name>Bloom filters in R<sup>5</sup>N</name>
+      <t>
+       R<sup>5</sup>N uses Bloom filters in several places.  This section
+       gives some general background on Bloom filters and defines functions
+       on this data structure shared by the various use-cases in 
R<sup>5</sup>N.
+      </t>
+      <t>
+        A Bloom filter (BF) is a space-efficient probabilistic datastructure
+        to test if an element is part of a set of elements.
+        Elements are identified by an element ID.
+        Since a BF is a probabilistic datastructure, it is possible to have
+        false-positives: when asked if an element is in the set, the answer 
from
+        a BF is either "no" or "maybe".
+      </t>
+      <t>
+        Bloom filters are defined as a string of <tt>L</tt> bits called 
"buckets".
+        The buckets are initially always empty, meaning that the bits are set 
to
+        zero.
+        There are two functions which can be invoked on the Bloom filter "bf":
+        BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element that is to
+        be added to the Bloom filter or queried against the set.
+      </t>
+      <t>
+        A mapping function M is used to map each ID of each element from the 
set to a
+        subset of k buckets.
+        In the original proposal by Bloom, M is non-injective and can thus map 
the same
+        element multiple times to the same bucket.
+        The type of the mapping function can thus be described by the following
+        mathematical notation:
+      </t>
+      <figure anchor="figure_bf_func" title="Bloom filter mapping function.">
+        <artwork><![CDATA[
+        ------------------------------------
+        # M: E->B^k
+        ------------------------------------
+        # L = Number of buckets
+        # B = 0,1,2,3,4,...L-1 (the buckets)
+        # k = Number of buckets per element
+        # E = Set of elements
+        ------------------------------------
+        Example: L=256, k=3
+        M('element-data') = {4,6,255}
+]]>
+        </artwork>
+      </figure>
+      <t>
+        When adding an element to the Bloom filter <tt>bf</tt> using
+        <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping
+        <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within
+        <tt>bf</tt> and set to 1.
+      </t>
+      <t>
+        When testing if an element may be in the Bloom filter <tt>bf</tt> using
+        <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within
+        <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1.
+        Otherwise, the element is not considered to be in the Bloom filter.
+      </t>
+    </section>
   </back>
 </rfc>

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]