octave-maintainers

Re: new file, oct-mem.h


From: Jaroslav Hajek
Subject: Re: new file, oct-mem.h
Date: Thu, 5 Nov 2009 21:19:40 +0100

On Tue, Nov 3, 2009 at 9:33 PM, John W. Eaton <address@hidden> wrote:
> In the new file liboctave/oct-mem.h:
>
>  // Fill by value, with a check for zero. This boils down to memset if value is
>  // a POD zero.
>  template <class T>
>  inline void octave_fill (octave_idx_type n, const T& value, T *dest)
>  { std::fill_n (dest, n, value); }
>
>  template <class T>
>  inline bool octave_fill_iszero (const T& value)
>  { return value == T(); }
>
>  template <class T>
>  inline bool octave_fill_iszero (const std::complex<T>& value)
>  { return value.real () == T() && value.imag () == T(); }
>
>  template <class T>
>  inline bool octave_fill_iszero (const octave_int<T>& value)
>  { return value.value () == T(); }
>
>  #define DEFINE_POD_FILL(T) \
>  inline void octave_fill (octave_idx_type n, const T& value, T *dest) \
>  { \
>    if (octave_fill_iszero (value)) \
>      std::memset (dest, 0, n * sizeof (T)); \
>    else \
>      std::fill_n (dest, n, value); \
>  }
>
> These rely on zero-valued floating point numbers having all bits zero,
> which is not guaranteed by C/C++.  But it is guaranteed by the IEEE
> 754 format.  I don't think it is a bad thing to require IEEE 754 (many
> things in Octave won't work properly without IEEE floating point
> math), but maybe we should state that assumption clearly with a
> configure test?  Oh, OK, this requirement is more or less enforced now
> in octave_ieee_init.  So maybe this is OK as it is, though I guess I
> would prefer to have a comment stating the assumption here, and perhaps
> also an easy way to disable this optimization if someone wanted to
> experiment with Octave on a system with a different floating point
> format.
>
>  // Uninitialized allocation. Will not initialize memory for complex and octave_int.
>  // Memory allocated by octave_new should be freed by octave_delete.
>  template <class T>
>  inline T *octave_new (octave_idx_type n)
>  { return new T[n]; }
>  template <class T>
>  inline void octave_delete (T *ptr)
>  { delete [] ptr; }
>
>  #define DEFINE_POD_NEW_DELETE(T) \
>  template <> \
>  inline T *octave_new<T > (octave_idx_type n) \
>  { return reinterpret_cast<T *> (new char[n * sizeof (T)]); } \
>  template <> \
>  inline void octave_delete<T > (T *ptr) \
>  { delete [] reinterpret_cast<char *> (ptr); }
>
> Maybe a better name for this function would be "uninitialized_new" or
> "no_ctor_new" or something similar that states more clearly what the
> intent is?  Otherwise, I think it will be easy to confuse them as just
> being wrappers around new/delete.
>
> jwe
>

OK, I renamed the functions to more descriptive names: no_ctor_new,
no_ctor_delete, copy_or_memcpy, fill_or_memset. I also changed the
test for zero to use reinterpret_cast to a suitable unsigned integer
type, so that it does not actually rely on IEEE, even though that may
be unnecessary. This makes good sense here, because what one actually
wants to test is whether the fill-in value has an all-zero memory
pattern (which is typically true when arrays are resized).
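
To illustrate the idea, here is a minimal sketch of a fill that tests
the memory pattern instead of comparing against T(). It is not the code
in oct-mem.h: the names has_zero_bit_pattern and fill_or_memset_sketch
are made up for this example, and it inspects the value byte by byte
through unsigned char rather than casting to an unsigned integer type.
It assumes T is trivially copyable.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: true if every byte of the object representation
// of value is zero.  Reading through unsigned char is always permitted.
template <class T>
inline bool has_zero_bit_pattern (const T& value)
{
  static_assert (std::is_trivially_copyable<T>::value,
                 "byte-wise test is only meaningful for trivially copyable types");
  const unsigned char *p = reinterpret_cast<const unsigned char *> (&value);
  for (std::size_t i = 0; i < sizeof (T); i++)
    if (p[i] != 0)
      return false;
  return true;
}

// Hypothetical stand-in for fill_or_memset: use memset only when the
// fill value really is all-zero bytes, otherwise fall back to fill_n.
template <class T>
inline void fill_or_memset_sketch (std::size_t n, const T& value, T *dest)
{
  if (has_zero_bit_pattern (value))
    std::memset (dest, 0, n * sizeof (T));
  else
    std::fill_n (dest, n, value);
}
```

One consequence on IEEE 754 systems: -0.0 compares equal to 0.0 but does
not have an all-zero pattern, so it goes through std::fill_n and keeps
its sign, whereas a test with == would send it to memset.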

These changes speed up some indexing, indexed assignment and permuting
for integer & complex types: memcpy is used instead of plain loops, and
memory is no longer needlessly zeroed after allocation. For single &
double real matrices, sometimes no speed-up is visible, sometimes about
30%. The C++ standard library supplied with GCC optimizes std::copy to
memmove for POD types, so it appears that memmove on a non-overlapping
copy is sometimes as fast as memcpy and sometimes slower. I find that
really interesting. In any case, indexing is somewhat more efficient
again...
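
For reference, the copy dispatch can be sketched roughly as follows.
This is an illustration only, not the contents of oct-mem.h: the name
copy_or_memcpy_sketch and the macro DEFINE_POD_COPY_SKETCH are invented
here, only two element types are listed, and the ranges are assumed to
be non-overlapping with n > 0.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <cstring>

// Generic version: element-wise copy, correct for any assignable type.
template <class T>
inline void copy_or_memcpy_sketch (std::size_t n, const T *src, T *dest)
{ std::copy (src, src + n, dest); }

// Non-template overloads for selected plain-data types.  Overload
// resolution prefers these over the template, so those types get a
// direct memcpy (the ranges must not overlap).
#define DEFINE_POD_COPY_SKETCH(T) \
inline void copy_or_memcpy_sketch (std::size_t n, const T *src, T *dest) \
{ std::memcpy (dest, src, n * sizeof (T)); }

DEFINE_POD_COPY_SKETCH (double)
DEFINE_POD_COPY_SKETCH (std::complex<double>)
```

A call such as copy_or_memcpy_sketch (n, src, dest) with double or
std::complex<double> pointers picks the memcpy overload; any other
element type falls back to std::copy.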

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


