bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] IBM z/OS + EBCDIC support


From: Daniel Richard G.
Subject: Re: [PATCH] IBM z/OS + EBCDIC support
Date: Tue, 22 Sep 2015 15:27:58 -0400

Hi Eric,

On Tue, 2015 Sep 22 09:23-0600, Eric Blake wrote:
>
> Thanks for the work. Can you please split the patch into a series of
> multiple pieces, one patch per issue, so that we can apply the obviously-
> correct ones while still discussing the other pieces, rather than
> holding the entire large patch hostage to review?

Wouldn't it be easier to apply everything to a feature branch, and
integrate it bit by bit? I can split up the patch, but my idea of a
sensible partitioning might not agree with yours...

> Also, while I see you have copyright assignment on file for Gawk, I
> don't see it for gnulib. You'll want to repeat the assignment process
> for gnulib before we can take more than the most trivial patches.

Okay, I've sent in the request form.

> coreutils uses to_uchar() to force the conversion of a byte to an
> unsigned character, useful for cases where sign extension of a byte
> is not desired.  Sounds like it does the same thing as what you are
> doing here.

Kind of, except that character literals may be signed, and thus
potentially have a negative value. to_uchar() could be applied to both
sides, but then that wouldn't work in a "switch" block.

What my macro does, then, is "convert to the value that a matching char
literal would have, if it's not there already."

> > +++ lib/math.in.h
> >
> > * The system defines these functions as macros, and the compiler did
> >   not like seeing them redefined.
>
> No underlying functions with linkage? POSIX generally requires that,
> so you may want to submit a bug, but it's certainly not the first time
> we've worked around that.

It wasn't that the functions had no linkage (though that may or may not
be the case), just that the compiler borked on the macro redefinition.

> > +++ lib/regex.h
> >
> > * Ensure that "__string" does not expand to "1" when it is used as a
> >   formal parameter name.
>
> Sounds like we shouldn't be naming our formal parameter __string,
> since that's a name reserved to the internal implementation namespace.

That would be a better fix, yes. But doesn't this file come from glibc?
Is it feasible to make such a change happen there?

(The Gawk maintainer was reluctant to make local changes to that
project's regex files.)

> > +++ m4/strstr.m4
> >
> > * The IBM runtime sucks; signal delivery is delayed until strstr()
> >   exits, so this test results in a hang that can only be SIGKILL'ed.
>
> Not a hang, just a reallllllly long execution time; and all because
> the libc implementation is O(n^2) instead of O(n).  But they really
> block signals during the call?  Ouch.

Yes, that was one of many forehead-slapping moments in this work :>

And the silly thing is, if you provide your own implementation of
strstr(), it won't have this problem, because it won't be in the
system runtime where signal delivery is verboten!

> > +++ tests/nan.h
> >
> > * z/OS, in addition to supporting IEEE floating-point, also supports
> >   an older "hexadecimal" format that does not support NaN. Bomb out
> >   if this is in use.
>
> C, and POSIX, allow for platforms without NaN (in part because of
> cases like the z/OS non-IEEE mode).  I'm not surprised if we have
> baked in assumptions that don't hold when IEEE is not around.

If you'd like to disable the NaN stuff cleanly when IEEE is not in use,
I'd be happy to help make that happen. But for my purposes, I want the
same floating-point gestalt as all other platforms of interest have, so
I punted on the hex floats.

> >     ASSERT (c_strcasecmp ("turkish", "TURKD¬SH") < 0);
> >     ASSERT (c_strcasecmp ("TURKD¬SH", "turkish") > 0);
> > 
> >   which, of course, fail.
>
> Basically, EBCDIC lacks the Turkish i, and since it is not a UTF-8
> locale, we should probably be skipping the test in that environment.

I see no harm in checking for unexpected-UTF-8 behavior; it's just the
fact that this is not ASCII that is throwing things off.

> > +++ tests/test-canonicalize-lgpl.c
> >
> > * Addressed a strange z/OS corner case. This system has
> >   DOUBLE_SLASH_IS_DISTINCT_ROOT, yet the dev/ino numbers are the
> >   same.
>
> What? Does that mean 'ls -a /' and 'ls -a //' see different contents?
> If they do, then sharing dev/ino is a bug; if they are identical, then
> DOUBLE_SLASH_IS_DISTINCT_ROOT is defined incorrectly.

Well... those two "ls" commands give the same output, but the
double-slash is used within z/OS Unix System Services to refer to a
different sort of file space:

    
http://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxa500/mvsds.htm?lang=en

Some commands support that syntax, but not all. As for what the
configure test does...

    $ ls -di / //
        1 /       1 //
    $ wc /dev/null
          0       0       0    /dev/null
    $ wc //dev/null
    wc: file "//dev/null": EDC5047I An invalid file name was specified as a 
function parameter.

And yet...

    $ cd //dev
    $ ls -l null
    crwxrwxrwx   1 BPXROOT  SYS1       4,  0 Sep  3  2013 null

It's not clear exactly how this "alternate root" is implemented---
possibly by intercepting pathnames in open(). Might be worth
special-casing the DOUBLE_SLASH test for this platform...


--Daniel


-- 
Daniel Richard G. || address@hidden
My ASCII-art .sig got a bad case of Times New Roman.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]