Re: glibc segfault on "special" long double values is _ok_!?

bug-gnulib

From:	James Youngman
Subject:	Re: glibc segfault on "special" long double values is _ok_!?
Date:	Fri, 8 Jun 2007 09:53:24 +0100

On 6/8/07, Nix <address@hidden> wrote:

> It's somewhat unusual for applications to accept double-format data over
> the network or from files; but modulo byte-swapping, has anyone *ever*
> seen an application that checks to be sure that the data it's received
> is a valid IEEE754 floating-point number? I've never seen any such app,
> I've never heard of anyone taking precautions under the assumption that
> a double with a one-bit error (I think it's one bit, I've lost the start
> of this thread) may cause core dumps if printed, and I've never
> considered doing any such thing myself. It's generally assumed that
> printing doubles is safe, no matter their origin.


The use case I was thinking about as I wrote my earlier email is of
massively parallel HPC across large compute clusters.  Here is the
basic approach:

1. Buy some 500-port, very high bandwidth, medium latency, network
switches (Myrinet, Gig-Ethernet, whatever).
2. Plug a big pile of machines in.
3. Perform gigantic parallel calculations.
4. Exchange numeric data between nodes during the computation.

So, having spent the $xxM on the gigantic switches, to get better
aggregate bandwidth, do we prefer to format the data as ASCII before
we exchange it between nodes?   Not really.   Do we format the data as
ASCII before we store the end result of the computation?   You bet,
but that's a different issue.

Can the network infrastructure corrupt bits in the exchanged data?
Yes.  Not often, but it does happen.  Same for the RAM.  So what do we
do when we detect a problem?  Print debugging messages, as Nix already
said (we work in, afaik, unrelated organisations).   Obviously some of
the diagnostics only get issued when we already know there is a
problem.   When we're producing diagnostics, we prefer that the bad
data we're trying to complain about can be logged somehow.

Could we just print the raw bytes as hex or something?  Sure, but then
we'd need to interpret that anyway.  The days of manually poring over
core dumps that came out of the line printer should be behind us.
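
For reference, the raw-bytes fallback itself is only a few lines of C.
This is a minimal sketch, not anything taken from glibc or gnulib:
`hex_dump` is a hypothetical helper name, and the output is in memory
order, so it depends on the host's endianness and includes any padding
bytes in the object.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the n bytes at p as lowercase hex into out.
   out must have room for 2 * n + 1 characters (including the NUL).
   Bytes are emitted in memory order, so the text is endian-dependent. */
static void hex_dump(const void *p, size_t n, char *out)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", (unsigned)b[i]);
    out[2 * n] = '\0';  /* also terminates the string when n == 0 */
}
```

Called as `hex_dump(&value, sizeof value, buf)` on a long double, this
never interprets the bits as a number, so it cannot trip over a bad
encoding.  The cost is exactly the one described above: somebody still
has to decode the hex by hand.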

> I'd say this behaviour violates the principle of least astonishment, at
> least. Mind you, avoiding it does seem like it could be expensive: [...]


Maybe.  For the issue-diagnostic-message use case, performance is not
such an issue.  But I'm sure there are valid use cases where ultimate
performance is really vital.  Use-cases vary a lot.
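
To give a feel for what a check-before-print would involve in the
diagnostic case, here is a minimal sketch.  It rests on an assumption
not stated in this message: that the "special" values are x87 80-bit
extended encodings whose explicit integer bit disagrees with the
exponent field (unnormals, pseudo-NaNs, pseudo-infinities,
pseudo-denormals).  `x87_is_canonical` is a hypothetical name, and the
function works on the raw little-endian bytes rather than on a
long double so that it never hands a suspect value to the FPU or to
printf.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the 10 bytes at p, laid out as a
   little-endian x87 80-bit extended value, form a canonical encoding.
   Assumed layout: bytes 0..7 are the 64-bit significand (bit 63 is the
   explicit integer bit), bytes 8..9 are sign plus 15-bit exponent. */
static bool x87_is_canonical(const unsigned char p[10])
{
    /* Mask off the sign bit to get the biased exponent. */
    unsigned exponent = (p[8] | ((unsigned)p[9] << 8)) & 0x7fffu;
    /* The top bit of byte 7 is the explicit integer bit. */
    bool integer_bit = (p[7] & 0x80u) != 0;

    if (exponent == 0)
        return !integer_bit;  /* zero or denormal; rejects pseudo-denormals */

    /* Normal numbers, infinities and NaNs all have the integer bit set;
       this rejects unnormals, pseudo-NaNs and pseudo-infinities. */
    return integer_bit;
}
```

The test is two loads and a compare per value, which is negligible when
producing a diagnostic message but would add up if applied to every
number on a hot output path.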

James.
