
[Octave-bug-tracker] [bug #61839] fputs() + fdisp() do not use the fopen


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #61839] fputs() + fdisp() do not use the fopen() character encoding
Date: Sun, 27 Feb 2022 11:25:20 -0500 (EST)

Update of bug #61839 (project octave):

                  Status:                    None => Confirmed              
        Operating System:       Microsoft Windows => Any                    

    _______________________________________________________

Follow-up Comment #2:

This was pretty straightforward to implement for `fputs`. I pushed a change to the
default branch here:
https://hg.savannah.gnu.org/hgweb/octave/rev/76398dfe2d55


However, it will be surprisingly complex to get this right for `fdisp`. IIUC,
that function comes down to a call of the `print_raw` function of *all*
octave_base_value types (in Octave core or any user code deriving directly or
indirectly from octave_base_value). That's a lot.
Even if we limited the change to character matrices as a first step, I
don't currently see how we could implement this without a lot of refactoring.
Ultimately, character matrices are printed here (for `fdisp`):
https://hg.savannah.gnu.org/hgweb/octave/file/ba07f81c8480/libinterp/corefcn/pr-output.cc#l2623

void
octave_print_internal (std::ostream& os, const charMatrix& chm,
                       bool pr_as_read_syntax,
                       int /* FIXME: extra_indent */,
                       bool pr_as_string)
{
  if (pr_as_string)
    {
      octave_idx_type nstr = chm.rows ();

      if (pr_as_read_syntax && nstr > 1)
        os << "[ ";

      if (nstr != 0)
        {
          for (octave_idx_type i = 0; i < nstr; i++)
            {
              octave_quit ();

              std::string row = chm.row_as_string (i);

              if (pr_as_read_syntax)
                {
                  os << '"' << octave::undo_string_escapes (row) << '"';

                  if (i < nstr - 1)
                    os << "; ";
                }
              else
                {
                  os << row;

                  if (i < nstr - 1)
                    os << "\n";
                }
            }
        }

      if (pr_as_read_syntax && nstr > 1)
        os << " ]";
    }
  else
    {
      os << "sorry, printing char matrices not implemented yet\n";
    }
}


But by that point, we have already gone through a lot of indirections, so it
will be difficult to get hold of the encoding that was set when "fopen"ing
the file.

Even if we knew that encoding here, it would probably not be enough to convert
only `row` to the output encoding. That might be enough for "well-behaved"
encodings which represent ASCII characters with the same byte values as ASCII
does. But for "exotic" encodings, every part of the output (including
delimiters and newlines) would need to be converted.
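To illustrate the point about converting only `row`, here is a minimal sketch.
It is not Octave code: `ascii_to_utf16le`, `rows_only_converted` and
`whole_output_converted` are hypothetical helpers, the input is assumed to be
pure ASCII, and UTF-16LE (without a byte order mark) merely stands in for an
"exotic" encoding.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encode an ASCII-only string as UTF-16LE bytes
// (two bytes per character, no byte order mark).
std::string
ascii_to_utf16le (const std::string& s)
{
  std::string out;
  for (unsigned char c : s)
    {
      out.push_back (static_cast<char> (c));  // low byte
      out.push_back ('\0');                   // high byte
    }
  return out;
}

// Converts only the row contents.  The newline between rows is still
// written as a single byte, like in the printing loop quoted above.
std::string
rows_only_converted (const std::vector<std::string>& rows)
{
  std::string out;
  for (std::size_t i = 0; i < rows.size (); i++)
    {
      out += ascii_to_utf16le (rows[i]);
      if (i + 1 < rows.size ())
        out += "\n";
    }
  return out;
}

// Builds the complete text first and converts all of it afterwards.
std::string
whole_output_converted (const std::vector<std::string>& rows)
{
  std::string text;
  for (std::size_t i = 0; i < rows.size (); i++)
    {
      text += rows[i];
      if (i + 1 < rows.size ())
        text += "\n";
    }
  return ascii_to_utf16le (text);
}
```

For the rows "ab" and "cd", the first variant emits a lone one-byte newline
between two-byte code units, which is not a valid UTF-16LE stream. The second
variant encodes the newline as well.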

As already mentioned, that would need to be done for *all* octave_base_value
classes.

At that point, it might be better to replace the `std::ostream` with something
that does the conversion "on-the-fly". I experimented with
`std::wbuffer_convert` and `std::codecvt_byname` (see attached diff). But the
latter seems to fail for virtually every encoding I tested:

terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid
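For reference, the constructor of `std::codecvt_byname` takes a locale name
(such as "C" or "en_US.UTF-8"), and it throws `std::runtime_error` when the C++
runtime cannot create a locale of that name. Which names are accepted depends
on the platform and the installed locales. The following is a minimal sketch of
a probe for that. The function name `codecvt_facet_available` is hypothetical,
and whether this explains the failure above is not confirmed.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>

// Hypothetical probe: returns true if a codecvt_byname facet can be
// created for NAME on this platform, false if construction throws.
bool
codecvt_facet_available (const char *name)
{
  try
    {
      using facet_t = std::codecvt_byname<wchar_t, char, std::mbstate_t>;

      // The locale object takes ownership of the facet and deletes it.
      std::locale loc (std::locale::classic (), new facet_t (name));

      return true;
    }
  catch (const std::runtime_error&)
    {
      return false;
    }
}
```

Calling the probe with "C" is expected to succeed everywhere, while a name
that does not correspond to an installed locale is expected to be rejected on
typical glibc-based systems.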


Maybe we'd need to create our own encoding facet on top of "uniconv"?



(file #52940)
    _______________________________________________________

Additional Item Attachment:

File name: bug61839-fdisp-encoding.patch  Size:1 KB
   
<https://file.savannah.gnu.org/file/bug61839-fdisp-encoding.patch?file_id=52940>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?61839>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



