octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #59706] Avoid "canonicalize_file_name" on Wind


From: Rik
Subject: [Octave-bug-tracker] [bug #59706] Avoid "canonicalize_file_name" on Windows
Date: Mon, 21 Dec 2020 12:22:17 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36

Follow-up Comment #15, bug #59706 (project octave):

In terms of performance, and possibly clarity, would it be better to use
push_back() rather than += operator for strings in this context:


          if ((full_file_name.length () == 2 && full_file_name[1] == ':')
              || pat.is_match (full_file_name))
            full_file_name += '\\';


It is true that the '+' operator has an overload for the prototype operator +
(string, char) so maybe there is no performance difference.

Also, performing a regular expression search is always going to be slow, and
most of the time it will be unnecessary.  Maybe add a quick test to check
whether the first two characters are `\` which is a requirement for a UNC name
before looking for the full pattern.  For example


if (full_file_name.length () > 4 && full_file_name[0] == '``' &&
full_file_name[1] == '``' && pat.is_match (full_file_name))
  full_file_name += '\\';


There is also a FIXME note in the code


          // FIXME: Does this pattern match all possible UNC roots?


I don't think it does capture everything.  For one thing, it appears that the
text between backslashes is not optional.  There must be at least one
character which means that '*' is not appropriate and '+' should be used.

Also, the servername is allowed to be specified as an IP address which means a
form like "10.1.1.1" is allowed.  The '\w' character lass does not include the
period '.' character so that should be added.  I am using documentation for
the UNC format from
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/62e862f4-2a51-452e-8eeb-dc4ff5ee33cc.

The definition of the UNC format is


 UNC                = "\\" host-name "\" share-name  [ "\" object-name ]
 host-name          = "[" IPv6address "]" / IPv4address / reg-name  
    ; IPv6address, IPv4address, and reg-name as specified in [RFC3986] 
 share-name         = 1*80pchar
 pchar              = %x20-21 / %x23-29 / %x2D-2E / %x30-39 / %x40-5A /
%x5E-7B / %x7D-FF  
 object-name        = *path-name [ "\" file-name ]
 path-name          = 1*255pchar
 file-name          = 1*255fchar [ ":" stream-name [ ":" stream-type ] ]
 fchar              = %x20-21 / %x23-29 / %x2B-2E / %x30-39 / %x3B / %x3D /
%x40-5B / %x5D-7B / %x7D-FF 
 stream-name        = *schar
 schar              = %x01-2E / %x30-39 / %x3B-5B /%x5D-FF
 stream-type        = 1*schar


So, a modest improvement to the regexp would be


regexp pat (R"(^\\\\[\w.-]+\\[\w\$-]+$)")


This would catch IP4 addresses, but still not IP6 addresses.  It would also
catch common share names such as "admin$".

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59706>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]