Re: non portable sed scripts

From: Ralf Wildenhues
Subject: Re: non portable sed scripts
Date: Sun, 21 May 2006 21:18:43 +0200
User-agent: Mutt/1.5.11+cvs20060403

Hi Paul,

* Paul Eggert wrote on Sun, May 21, 2006 at 09:46:32AM CEST:
> Ralf Wildenhues <address@hidden> writes:
> > So then the total limit of the script size I found on Solaris (described
> > in that other mail in this thread that was pending for some hours)
> > really is a new issue.
> If it's just Solaris, we should be able to work around it by using
> AC_PROG_SED, as it should check for that bug (it currently doesn't,
> but it should).

I think I have this figured out now (took me way too long :-( ),
but I need a while to write it all down, and first I need to go back to
Libtool to fix a 5-year-old bug (the one that led to LT_AC_PROG_SED
in the first place) in a different (right) way.

Short story: Libtool has always (wanted to) prefer /usr/xpg4/bin/sed
over /bin/sed on Solaris, stating that the latter doesn't cope as well
with long lines.  Well, it copes worse than the xpg4 one with
_incomplete_ lines (without final newlines), which libtool likes to
create at times.  But the test for sed has been "fixed" along the way not
to use incomplete lines, so it wouldn't exclude /bin/sed anyway ...  and
then anyway libtool just needs to put its $NL2SP | $SED | $SP2NL
workaround in place everywhere so that this doesn't matter any more.
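
To make the incomplete-line issue concrete, here is a minimal sketch.  It
is not libtool's workaround, only an illustration of feeding sed input
that is guaranteed to end in a newline.  The helper name sed_complete is
hypothetical, and falling back to plain `sed' when $SED is unset is an
assumption.

```shell
# Hypothetical helper: run sed on a string, making sure the input
# ends with a newline, so that a sed which mishandles an incomplete
# last line never sees one.
sed_complete () {
  text=$1; shift
  # printf '%s\n' always terminates the last line with a newline.
  printf '%s\n' "$text" | ${SED-sed} "$@"
}

# Example call: strip a trailing word from a single-line string.
sed_complete 'foo bar' -e 's/ bar$//'
```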

Then, /usr/xpg4/bin/sed doesn't really expose a small script length
limit; rather, it segfaults on the CONFIG_HEADERS script created by the
"Torturing config.status" test, but it works with simpler scripts of
the same size.  I have not analyzed in detail the conditions under which
this segfault triggers.

I tested /usr/ucb/sed again.  It turns out the 6810-byte limit I found
for it isn't fixed.  With a script that your proposed test generates, the
limit lies at about 6635 characters.  If you use one fewer substitution,
6644 characters are ok.  White space before a command does not count,
neither
does a `;' separating commands.  Labels (`:' commands) and their
arguments do not count, neither do jumps `b' or conditional jumps `t'.
An escape character (backslash) in a regex does not count.  The limit
cannot be circumvented by splitting the script into several files
(although the length of the representation of 2 scripts may not exactly
be the sum of the lengths of the individual representations; I did not
check that).  For too long scripts, the error message is:
  sed: Too much command text: [...]
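
For anyone who wants to repeat this kind of measurement, a rough sketch of
a probe follows.  It is not the test from this thread; the function names
gen_script and probe_sed, the @VAR_n@ patterns, and the command counts
tried are all made up for illustration.  It only reports whether a given
sed accepts a script of a given size, and a sed without such a limit will
simply accept every size.

```shell
# Emit $1 substitution commands, one per line (hypothetical patterns).
gen_script () {
  i=1
  while [ "$i" -le "$1" ]; do
    printf 's/@VAR_%s@/value_%s/\n' "$i" "$i"
    i=`expr "$i" + 1`
  done
}

# Return 0 if the sed binary $1 accepts a script of $2 commands.
probe_sed () {
  tmp=${TMPDIR-/tmp}/sedprobe.$$
  gen_script "$2" > "$tmp"
  echo x | "$1" -f "$tmp" >/dev/null 2>&1
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Try a few sizes with the sed under test ($SED, or plain sed).
sed_bin=${SED-sed}
for n in 100 200 400 800; do
  bytes=`gen_script "$n" | wc -c`
  if probe_sed "$sed_bin" "$n"; then
    echo "$n commands ($bytes bytes): ok"
  else
    echo "$n commands ($bytes bytes): rejected"
  fi
done
```

Since white space, `;' separators and labels apparently do not count
toward the limit, varying those in gen_script while keeping the number of
`s' commands constant would be one way to check the observations above.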

My conclusion from these observations is that there is a fixed buffer
size for some internal representation of the command text, which has a
constant overhead per command (possibly with a per-command constant),
plus the (internal representation of the) arguments.  I have not
attempted to measure the overhead per `s' command or any other constants
here exactly.

I will post another message with an actual patch, and more technical
comments to it; this one is messy and long enough already.

