bug-make

Re: make cannot handle prerequisites that contain a colon


From: Paul D. Smith
Subject: Re: make cannot handle prerequisites that contain a colon
Date: Tue, 19 Oct 2004 15:31:31 -0400

%% Markus Kuhn <address@hidden> writes:

  mk> Are colon and space really the only bytes that cannot be handled
  mk> in a filename by make? How would you handle a pathological filename
  mk> that contains bytes such as 0xa0?

Well, filenames that contain newlines can't be represented either.

Make uses standard char* strings to hold all its data, not unsigned
char and not wchar_t.  But it doesn't do much with these strings
except chop them up on whitespace, compare them, and send them to the
programs it invokes.

So, offhand I can't see any reason why higher-bit set characters would
be a problem.
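
To make that concrete, here's a minimal sketch (not make's actual code,
just an illustration assuming word-splitting only ever looks for blanks
and tabs) showing that a high-bit-set byte like 0xa0 simply rides along
inside a word:

    #include <stdio.h>

    /* Hypothetical illustration only.  Split a line on blanks and tabs;
       every other byte, including high-bit-set bytes such as 0xa0, is
       passed through untouched.  */
    static void
    print_words (const char *line)
    {
      const char *p = line;
      const char *end;

      while (*p)
        {
          while (*p == ' ' || *p == '\t')   /* skip leading whitespace */
            ++p;
          if (*p == '\0')
            break;
          end = p;                          /* find the end of the word */
          while (*end != '\0' && *end != ' ' && *end != '\t')
            ++end;
          printf ("word: '%.*s'\n", (int) (end - p), p);
          p = end;
        }
    }

    int
    main (void)
    {
      /* The 0xa0 byte stays inside the second word.  */
      print_words ("foo.o bar\xa0" "baz.o");
      return 0;
    }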

  mk> So it would be good to have a brief section in the manual that
  mk> explains exactly which characters are allowed in file names to be
  mk> processed by GNU make.

True, but that would require someone to actually test which ones work :-).

In other words, there's nothing in GNU make that actually looks at
target names to make sure they contain only a valid set of characters.
If there _ARE_ restrictions then they exist as a side-effect of
processing by some other area of GNU make.

  mk> I guess it would have to move from handling just strings to
  mk> handling arrays of fully 8-bit transparent strings, more like what
  mk> bash, tcl, or perl do.

Well, it's pretty complicated when you start looking at the details of
how such a thing could work, given the free-form syntax of makefiles.

  >> Actually I had one idea that could be implemented without redoing all of
  >> make's internals, but it would block off at least one and probably two
  >> or more different 8-bit values from appearing in makefiles.  In an i18n
  >> world I don't know if this is acceptable.

  mk> The i18n world is now fairly quickly moving towards using UTF-8,
  mk> and UTF-8 strings have the useful property that the bytes 0xfe and
  mk> 0xff are never used by the encoding. Other than that, using bytes
  mk> in the 0x01-0x1f range may also be acceptable, because none of the
  mk> ASCII-compatible character encodings used worldwide uses any of
  mk> these to represent a graphical character. (Well, there is VSCII-1
  mk> in Vietnam, but hardly anyone really uses that under Unix, as it
  mk> causes endless problems and has de facto already been superseded
  mk> by UTF-8.)

That's good to know; one of the main problems I've had in trying to
find a solution to this issue is that I don't have any personal
understanding of how the various extended character sets actually work
or what would and wouldn't be possible with them.
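
For what it's worth, the 0xfe/0xff property is easy to check
independently of make; the little program below (a throwaway sketch,
nothing to do with make's sources) just encodes every code point with
the standard UTF-8 bit layout and verifies that neither byte ever
appears:

    #include <assert.h>
    #include <stdio.h>

    /* Encode a code point (0 .. 0x10FFFF) as UTF-8; return byte count.  */
    static int
    utf8_encode (unsigned long c, unsigned char buf[4])
    {
      if (c < 0x80)    { buf[0] = c; return 1; }
      if (c < 0x800)   { buf[0] = 0xc0 | (c >> 6);  buf[1] = 0x80 | (c & 0x3f);
                         return 2; }
      if (c < 0x10000) { buf[0] = 0xe0 | (c >> 12); buf[1] = 0x80 | ((c >> 6) & 0x3f);
                         buf[2] = 0x80 | (c & 0x3f); return 3; }
      buf[0] = 0xf0 | (c >> 18);         buf[1] = 0x80 | ((c >> 12) & 0x3f);
      buf[2] = 0x80 | ((c >> 6) & 0x3f); buf[3] = 0x80 | (c & 0x3f);
      return 4;
    }

    int
    main (void)
    {
      unsigned char buf[4];
      unsigned long c;
      int i, n;

      for (c = 0; c <= 0x10FFFF; c++)
        {
          n = utf8_encode (c, buf);
          for (i = 0; i < n; i++)
            assert (buf[i] != 0xfe && buf[i] != 0xff);
        }
      puts ("no byte of any UTF-8 sequence is 0xfe or 0xff");
      return 0;
    }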


I'm not really sure what would be involved with providing support for
full UTF-8 character sets in make; I'd have to think about it more.


The idea I had involves changing escaped special characters like spaces
into "impossible" byte values in make's internal string representation.
That way all of make's current manipulation would continue to work
as-is: when searching for whitespace to break up words, for example, it
would not see the "impossible" byte values as whitespace, so it wouldn't
break on those characters.

Then, at the very last minute before make invokes a command line,
etc., it would un-translate the string and replace the impossible bytes
with the special characters again.

Of course, there are many details to work out, such as the user
interface for escaping special characters, exactly when the translation
back is done, etc.
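
Just to show the round trip I mean, here's a rough sketch; the choice
of 0xff and the backslash-space escape syntax are only placeholders for
whatever would actually be decided on:

    #include <stdio.h>

    #define HIDDEN_SPACE 0xff  /* an "impossible" byte standing in for
                                  an escaped space */

    /* While reading a makefile-style word, turn "\ " into the impossible
       byte so that later whitespace-splitting never sees a separator.  */
    static void
    encode_escapes (char *s)
    {
      char *src = s, *dst = s;
      while (*src)
        {
          if (src[0] == '\\' && src[1] == ' ')
            {
              *dst++ = (char) HIDDEN_SPACE;
              src += 2;
            }
          else
            *dst++ = *src++;
        }
      *dst = '\0';
    }

    /* Just before handing the string to a command line, stat(), etc.,
       turn the impossible byte back into a real space.  */
    static void
    decode_escapes (char *s)
    {
      for (; *s; s++)
        if ((unsigned char) *s == HIDDEN_SPACE)
          *s = ' ';
    }

    int
    main (void)
    {
      char target[] = "my\\ file.o";  /* as written in a makefile: my\ file.o */
      encode_escapes (target);        /* internal form: "my\xff" "file.o"     */
      /* ... all of make's usual word-splitting happens here, unharmed ...    */
      decode_escapes (target);        /* just-in-time form: "my file.o"       */
      printf ("%s\n", target);
      return 0;
    }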

-- 
-------------------------------------------------------------------------------
 Paul D. Smith <address@hidden>          Find some GNU make tips at:
 http://www.gnu.org                      http://make.paulandlesley.org
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist



