bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#57129: 29.0.50; Improve behavior of conditionals in Eshell


From: Bruno Haible
Subject: Re: bug#57129: 29.0.50; Improve behavior of conditionals in Eshell
Date: Sun, 21 Aug 2022 19:17:08 +0200

Paul Eggert asked:
> > Could you investigate further why mingw 64-bit fails?

Some words on the "why". You seem to have expectations regarding the
distribution quality of the resulting file names, but these expectations
are not warranted.

Donald E. Knuth wrote in TAOCP vol 2 "Seminumerical algorithms" that
an arbitrary sequence of number manipulating operations usually has a
bad quality as a pseudo-random number generator, and that in order
to get good quality pseudo-random numbers, one needs to use *specific*
pseudo-random number generators and then *prove* mathematical properties
of it. (The same is true, btw, for crypto algorithms.)
Then he started discussing linear congruential generators [1] as the
simplest example for which something could be proved.

The current source code of tempname.c generates the pseudo-random numbers
— assuming no HAS_CLOCK_ENTROPY and no ASLR — using a mix of three
operations:
  (A) A linear congruential generator [2] with m = 2^64,
      a = 2862933555777941757, c = 3037000493.
  (B) A floor operation: v ← floor(v / 62^6)
  (C) A xor operation: v ^= prev_v

There are three different questions:
(Q1) What is the expected quality inside a single gen_tempname call?
(Q2) What is the expected quality of multiple gen_tempname calls in a
     single process?
(Q3) What is the expected quality when considering different processes
     that call gen_tempname?

Answers:
(Q1) For just 6 'X'es, there is essentially a single (A) operation.
     Therefore the quality will be good.
     If someone uses more than 10 'X'es, for example, 50 'X'es, there
     will be 5 (A) and 5 (B), interleaved: (A), (B), (A), (B), ...
     This is *not* a linear congruential generator, therefore the
     expected quality is BAD.
     In order to fix this case, what I would do is to get back to
     a linear congruential generator: (A), (A), ..., (A), (B).
     In other words, feed into (A) exactly the output from the
     previous (A). This means, do the statements
          XXXXXX[i] = letters[v % 62];
          v /= 62;
     not on v itself, but on a copy of v.
     But wait, there is also the desire to have predictability!
     This means to not consume all the possible bits the
     random_bits() call, but only part of it.
     What I would do here, is to reduce BASE_62_DIGITS from 10 to 6.
     So that in each round of the loop, 6 base-62 digits are consumed
     and more than 4 base-62 digits are left in v, for predictability.
     In the usual calls with 6 'X'es the loop will still end after a
     single round.

(Q2) First of all, the multiple gen_tempname calls can occur in
     different threads. Since no locking is involved, it is undefined
     behaviour to access the 'prev_v' variable from different threads.
     On machines with an IA-64 CPU, the 'prev_v' variable's value may
     not be propagated from one thread to the other. [3][4][5]
     The fix is simple, though: Mark 'prev_v' as 'volatile'.

     Then, what the code does, is a mix of (A), (B), (C). Again, this
     is *not* a linear congruential generator, therefore the expected
     quality is BAD.

     To get good quality, I would suggest to use a linear congruential
     generator across *all* gen_tempname calls of the entire thread.
     This means:
       - Move out the (B) invocations out, like explained above in (Q1).
       - Remove the (C) code that you added last week.
       - Store the v value in a per-thread variable. Using '__thread'
         on glibc systems and a TLS variable (#include "glthread/tls.h")
         on the other platforms.

(Q3) Here, to force differences between different processes, I would
     suggest to use a fine-grained clock value. In terms of platforms,
       #if defined CLOCK_MONOTONIC && HAVE_CLOCK_GETTIME
     is way too restrictive.
     How about
       - using CLOCK_REALTIME when CLOCK_MONOTONIC is not available,
       - using gettimeofday() as fallback, especially on native Windows.

If one does (Q3), then the suggestions for (Q2) (other than the 'volatile')
may not be needed.

Bruno

[1] https://en.wikipedia.org/wiki/Linear_congruential_generator
[2] https://en.wikipedia.org/wiki/Linear_congruential_generator#c_%E2%89%A0_0
[3] https://es.cs.uni-kl.de/publications/datarsg/Geor16.pdf
[4] https://db.in.tum.de/teaching/ws1718/dataprocessing/chapter3.pdf?lang=de
    page 18
[5] https://os.inf.tu-dresden.de/Studium/DOS/SS2014/04-Memory-Consistency.pdf








reply via email to

[Prev in Thread] Current Thread [Next in Thread]