bug-gmp
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: CLN, GMP inputs


From: Hans Aberg
Subject: Re: CLN, GMP inputs
Date: Thu, 3 May 2001 11:22:24 +0200

At 10:54 +1000 2001/05/03, Kevin Ryde wrote:
>I measured one unoptimized dynamic-linked libc malloc on an athlon at
>no less than 700 cycles for malloc+free, or 1200 for
>malloc+realloc+free.  The next gmp will have some code to measure that
>sort of thing better (in the tune/speed.c program).
>
>Of course if the run time for a program is dominated by allocate and
>free then there's something badly wrong in the algorithm or its
>implementation.

The combination of "unboxed" and "boxed" number representations is used in
functional languages, in order make them more competitive with languages
like C/C++. In the latter, an math operation on a words sized number just
takes one or a few cycles.

So the reason to use it is to make an "integer" structure to be a
reasonable substitute for an "int".

There are some ways to make fast allocations, for example if the objects
all have the same size, one can use two singly linked lists inside an
array, one for allocated space and one for free space. It is then fast to
move allocations between the two lists.

With a conservative GC, like a two space copying GC, an allocation is fast:
Just change the stack pointer. When the space is full, all stuff is moved
on to the free half.

But for GMP, it would seem overkill to add something like that. GC
questions are so technical, that it is only for dedicated experts; others
should use GC libraries.

One variation would be to add a ref count and the two singly list mentioned
above for the ref count allocations. Then the time overhead in the C++
variation will not be so big. But then one must figure out what to do when
the array space runs out.

>> The only problem with this approach is that the __mpf_struct has a field
>> using the same name _mp_d as the __mpz_struct, which will screw up the
>> macros.
>
>Ben Hinkle posted this list a while back similarly wanting a single
>malloced block for an mpf.  One idea on that front is to change to
>something like
>
>        typedef struct
>        {
>          int _mp_prec;
>          int _mp_size;
>          mp_exp_t _mp_exp;
>          mp_limb_t _mp_d[];
>        } __mpf_struct;
>
>Which is allocated with enough space for the _mp_d data there in the
>same block.  (An empty [] is a gcc extension I think, but _mp_d[1]
>works just as well.)

I think that an empty [] might be a part of the new C standard (but I do
not know).

>The advantage of that is most sources would work unchanged, _mp_d
>having just changed from a pointer to an array.  Init and clear become
>very different though, and the semantics for applications too.  This
>is not a bad idea for mpf because once a precision is set it doesn't
>need to grow or shrink, but that's not true of mpz and mpq in general.

Do I understand this right: Is the idea to then malloc the whole
__mpf_struct, so that in the allocation _mp_d becomes whatever comes after
the other fields?

One thing that comes to my mind, namely that in C/C++, there is no
guarantee that the fields are allocated in the order they are written, even
though most compilers would do that. But strictly speaking, there is no way
to ensure _mp_d is put last.

The other thing that comes to my mind is field alignment: If int's,
mp_exp_t, and mp_limb_t have different sizes, the word alignment for the
limbs might go off, slowing it down.

Therefore, I arrived at a version where one is only using mp_limb_t
allocation also for the other fields.

>> If I should sum the above up seen from the perspective of GMP development,
>> then what one wants is that all the stuff relating to the dynamic behavior
>> of the __mpz_struct and __mpf_struct ending up in the memory allocation, so
>> that one can avoid having to do another such memory allocation.
>
>I'd recommend making other arrangements to setup the mpz_t parts, if
>that seems to be a problem.  I'm afraid the need for upward binary
>compatibility will prevent any radical changes.

I'm not sure what you mean with the need for "upward binary compatibility":
Is that for use in DLL's? Otherwise, only the inteface would matter.

>Since mpz_t is only small, one possibility would be an array of them.
>A good malloc probably already does that for small blocks.  Otherwise,
>or as well as that, I'd still suggest a free list of pre-initialized
>values, to be handed out as needed.  I'm pretty sure that could work
>well either with a garbage collector or reference counting.

I'm not sure what you think of here: One problem is the initialization
problem, that is:
  integer n; // Initialization knocks on cycles.
  ...
  n = 52365423754367; // Another allocation.
is more time consuming than
  integer n = 52365423754367; // Just one allocation.

But this one find ways around.

I think though that the other two problems are more significant, namely
that say int's compute much faster than similarly sized integer's (mpz_t),
and that in C++, returns such as in
  integer f(A) {
    integer x;
    ...
    return x; // Will most likely invoke the copy constructor.
  }
will be slow.

If one introduces an unboxed representation, then the initialization
problem will be solved. But I am not sure it is worth doing something about
the initialization problem alone.

And some form of GC (ref counts or whatever) will take care of the copy
constructor problem and memory allocation problem.

The question is how GMP best can support that; if one achieves the effects
within GMP or by building something on top, I think matters little.

  Hans Aberg





reply via email to

[Prev in Thread] Current Thread [Next in Thread]