[Octave-bug-tracker] [bug #57596] Should the "len" argument of "fgetl" a

octave-bug-tracker

From:	Andrew Janke
Subject:	[Octave-bug-tracker] [bug #57596] Should the "len" argument of "fgetl" and "fgets" mean bytes or characters?
Date:	Wed, 10 Jun 2020 09:58:44 -0400 (EDT)
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:73.0) Gecko/20100101 Firefox/73.0

Follow-up Comment #8, bug #57596 (project octave):

Pretty sure we need to sort out what's going to happen with UTF-8 and char
semantics before doing this one. Other than that, it's not that bad.

The character-wise behavior for composite characters and the like is pretty
well established by other languages and the Unicode standard:
- If you're doing character-wise UTF-8, then "one character" is one Unicode
code point, however many bytes that's encoded as.
- If you want to be Matlab-compatible and are doing UCS-2, then "one
character" is always one two-byte UCS-2 code unit/code point.
- If you're doing UTF-16, "one character" should probably be one two-byte
UTF-16 code unit, not one Unicode code point.
- A Unicode combining character is still technically just one character and
one Unicode code point; you don't have to treat them specially at the I/O
level. It's up to the application code to determine the semantics of sequences
of characters that involve combining characters.
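
To make the difference between these definitions concrete, here is a small
Python sketch (not Octave code, and not a proposal for how Octave should
implement it). The helper name count_units and the sample strings are made up
for illustration. It counts one string three ways, then shows how a length
limit of 2 behaves on a byte stream versus a decoded text stream.

```python
import io


def count_units(text):
    """Return the length of text under three definitions of 'character'."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "code_points": len(text),  # Python str length counts code points
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


# "e" + combining acute accent (U+0301) + an emoji outside the BMP (U+1F600)
sample = "e\u0301\U0001F600"
counts = count_units(sample)
print(counts)

# A length limit applied to bytes can split a multi-byte sequence...
data = "h\u00e9llo\n".encode("utf-8")
as_bytes = io.BytesIO(data).readline(2)

# ...while the same limit applied to decoded text counts code points.
text_stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
as_chars = text_stream.readline(2)

print(as_bytes, repr(as_chars))
```

The combining accent counts as its own code point here, which matches the
last bullet: nothing special happens to it at the I/O level.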

And if you encounter an invalid byte sequence, I think you should, and pretty
much have to, either throw an error or convert it to the Unicode "replacement
character" (U+FFFD). This behavior should probably be caller-configurable on a
per-filehandle basis, with throwing an error as the default.
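
As a rough illustration of that per-filehandle policy, here is a Python sketch.
The function name open_text and the on_invalid parameter are hypothetical, not
an existing Octave or Python API; only io.TextIOWrapper and its errors
argument are real. The default is to raise an error, and the caller can opt in
to replacement when opening the handle.

```python
import io


def open_text(byte_stream, on_invalid="error"):
    """Wrap a byte stream as UTF-8 text with a per-handle invalid-input policy.

    on_invalid is a hypothetical option name: "error" raises on invalid
    bytes, "replace" substitutes U+FFFD for them.
    """
    policies = {"error": "strict", "replace": "replace"}
    if on_invalid not in policies:
        raise ValueError("on_invalid must be 'error' or 'replace'")
    return io.TextIOWrapper(
        byte_stream, encoding="utf-8", errors=policies[on_invalid], newline=""
    )


bad = b"abc\xffdef\n"  # 0xFF can never appear in valid UTF-8

# Opt-in replacement: the invalid byte becomes U+FFFD.
lenient = open_text(io.BytesIO(bad), on_invalid="replace")
replaced = lenient.readline()

# Default policy: reading the same bytes raises an error.
strict = open_text(io.BytesIO(bad))
try:
    strict.readline()
    raised = False
except UnicodeDecodeError:
    raised = True

print(repr(replaced), raised)
```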

UCS-2 has no invalid byte sequences.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57596>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
