[Octave-bug-tracker] [bug #61839] fputs() + fdisp() do not use the fopen
From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #61839] fputs() + fdisp() do not use the fopen() character encoding
Date: Sun, 27 Feb 2022 11:25:20 -0500 (EST)
Update of bug #61839 (project octave):
Status: None => Confirmed
Operating System: Microsoft Windows => Any
_______________________________________________________
Follow-up Comment #2:
This was pretty straightforward to implement for `fputs`. I pushed a change to
the default branch here:
https://hg.savannah.gnu.org/hgweb/octave/rev/76398dfe2d55
However, it will be surprisingly complex to get this right for `fdisp`. IIUC,
that function comes down to a call of the `print_raw` function of *all*
octave_base_value types (in Octave core or any user code deriving directly or
indirectly from octave_base_value). That's a lot.
Even if we limited the change to character matrices in a first step, I
don't see atm how we could implement this without a lot of refactoring.
Ultimately, character matrices are printed here (for `fdisp`):
https://hg.savannah.gnu.org/hgweb/octave/file/ba07f81c8480/libinterp/corefcn/pr-output.cc#l2623
void
octave_print_internal (std::ostream& os, const charMatrix& chm,
                       bool pr_as_read_syntax,
                       int /* FIXME: extra_indent */,
                       bool pr_as_string)
{
  if (pr_as_string)
    {
      octave_idx_type nstr = chm.rows ();
      if (pr_as_read_syntax && nstr > 1)
        os << "[ ";
      if (nstr != 0)
        {
          for (octave_idx_type i = 0; i < nstr; i++)
            {
              octave_quit ();
              std::string row = chm.row_as_string (i);
              if (pr_as_read_syntax)
                {
                  os << '"' << octave::undo_string_escapes (row) << '"';
                  if (i < nstr - 1)
                    os << "; ";
                }
              else
                {
                  os << row;
                  if (i < nstr - 1)
                    os << "\n";
                }
            }
        }
      if (pr_as_read_syntax && nstr > 1)
        os << " ]";
    }
  else
    {
      os << "sorry, printing char matrices not implemented yet\n";
    }
}
But by that point, we have already gone through a lot of indirections, so it
will be difficult to get hold of the encoding that was set when "fopen"ing the
file.
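One conceivable way to carry the encoding through those indirections without changing every function signature would be to attach it to the stream itself, using the standard `std::ios_base::xalloc`/`pword` mechanism. The following is only a sketch under that assumption. The helper names (`encoding_slot`, `set_stream_encoding`, `get_stream_encoding`) and the "utf-8" fallback are hypothetical and not part of Octave:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: reserve one per-stream storage slot for the encoding.
inline int encoding_slot ()
{
  static const int idx = std::ios_base::xalloc ();
  return idx;
}

// Hypothetical helper: would be called where the file id is mapped to a
// stream.  The caller must keep *enc alive for as long as the stream is used.
inline void set_stream_encoding (std::ios_base& s, const std::string *enc)
{
  s.pword (encoding_slot ()) = const_cast<std::string *> (enc);
}

// Hypothetical helper: could be called deep inside a print function that
// only receives a std::ostream&.  Falls back to "utf-8" (an assumption)
// if nothing was attached to the stream.
inline std::string get_stream_encoding (std::ios_base& s)
{
  void *p = s.pword (encoding_slot ());
  return p ? *static_cast<const std::string *> (p) : std::string ("utf-8");
}
```

This would only make the encoding name available at the point of printing. It would not by itself convert anything.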
Even if we knew that encoding here, it would probably not be enough to convert
only `row` to the output encoding. That might be enough for "well-behaved"
encodings that encode ASCII characters with the same byte values. But for
"exotic" encodings, every part of the output (including the quotes, separators
and brackets) would need to be converted.
As already mentioned, that would need to be done for *all* octave_base_value
classes.
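To illustrate the problem with an "exotic" encoding, here is a small sketch that uses UTF-16LE as the target. The functions `ascii_to_utf16le`, `quote_row_only` and `quote_all` are hypothetical stand-ins written for this example (the converter only handles pure ASCII input). They are not Octave code:

```cpp
#include <string>

// Hypothetical toy converter: encode a pure-ASCII string as UTF-16LE bytes.
// Only valid for input bytes < 0x80.
std::string ascii_to_utf16le (const std::string& s)
{
  std::string out;
  for (unsigned char c : s)
    {
      out.push_back (static_cast<char> (c));
      out.push_back ('\0');
    }
  return out;
}

// Mimics the pr_as_read_syntax branch if only `row` were converted:
// the surrounding quotes stay single bytes, so the output is a mix of
// two encodings.
std::string quote_row_only (const std::string& row)
{
  return '"' + ascii_to_utf16le (row) + '"';
}

// Everything that is written goes through the converter.
std::string quote_all (const std::string& row)
{
  return ascii_to_utf16le ('"' + row + '"');
}
```

For the row "ab", `quote_row_only` yields 6 bytes (two 1-byte quotes around 4 bytes of UTF-16LE), which is not valid UTF-16LE, whereas `quote_all` yields the expected 8 bytes.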
At that point, it might be better to replace the `std::ostream` with something
that does the conversion "on-the-fly". I experimented with
`std::wbuffer_convert` and `std::codecvt_byname` (see attached diff). But the
latter seems to fail for virtually any encoding I tested with:
terminate called after throwing an instance of 'std::runtime_error'
what(): locale::facet::_S_create_c_locale name not valid
Maybe we'd need to create our own encoding facet on top of "uniconv"?
(file #52940)
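As an alternative to a codecvt facet, the "on-the-fly" conversion could perhaps also be done in a filtering stream buffer. The following is a rough sketch of that idea, not the attached patch. The class `converting_streambuf` and the toy converter `utf8_to_latin1` are hypothetical names made up for this example. In Octave, the converter would presumably wrap a "uniconv" call instead:

```cpp
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical filtering buffer: collects UTF-8 bytes and forwards them,
// converted, to the real sink whenever the stream is flushed.
class converting_streambuf : public std::streambuf
{
public:
  using converter = std::function<std::string (const std::string&)>;

  converting_streambuf (std::streambuf *sink, converter conv)
    : m_sink (sink), m_convert (std::move (conv)) { }

  ~converting_streambuf () { sync (); }

protected:
  int_type overflow (int_type ch) override
  {
    if (! traits_type::eq_int_type (ch, traits_type::eof ()))
      m_pending.push_back (traits_type::to_char_type (ch));
    return traits_type::not_eof (ch);
  }

  std::streamsize xsputn (const char *s, std::streamsize n) override
  {
    m_pending.append (s, static_cast<std::size_t> (n));
    return n;
  }

  int sync () override
  {
    // Hold back an incomplete trailing UTF-8 sequence until it is complete.
    std::size_t n = complete_prefix (m_pending);
    std::string out = m_convert (m_pending.substr (0, n));
    m_pending.erase (0, n);
    std::streamsize len = static_cast<std::streamsize> (out.size ());
    if (m_sink->sputn (out.data (), len) != len)
      return -1;
    return m_sink->pubsync ();
  }

private:
  // Length of the longest prefix that ends on a UTF-8 character boundary.
  static std::size_t complete_prefix (const std::string& s)
  {
    std::size_t i = s.size ();
    // Step back over at most 3 continuation bytes (10xxxxxx).
    for (int k = 0; k < 3 && i > 0
         && (static_cast<unsigned char> (s[i-1]) & 0xC0) == 0x80; k++)
      i--;
    if (i == 0)
      return s.size ();
    unsigned char lead = static_cast<unsigned char> (s[i-1]);
    std::size_t len = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
    return (i - 1 + len > s.size ()) ? i - 1 : s.size ();
  }

  std::streambuf *m_sink;
  converter m_convert;
  std::string m_pending;
};

// Toy converter for the demonstration only: UTF-8 to ISO-8859-1.
// Bytes of characters above U+00FF are replaced by '?'.
std::string utf8_to_latin1 (const std::string& in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size (); i++)
    {
      unsigned char c = static_cast<unsigned char> (in[i]);
      if (c < 0x80)
        out.push_back (static_cast<char> (c));
      else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size ())
        {
          unsigned char c2 = static_cast<unsigned char> (in[++i]);
          out.push_back (static_cast<char> (((c & 0x03) << 6) | (c2 & 0x3F)));
        }
      else
        out.push_back ('?');
    }
  return out;
}
```

With something like this, the print functions could keep writing UTF-8 to a plain `std::ostream` that is constructed on top of the converting buffer, and every byte (quotes and separators included) would pass through the same conversion. The sink must outlive the converting buffer, because the destructor flushes into it.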
_______________________________________________________
Additional Item Attachment:
File name: bug61839-fdisp-encoding.patch Size:1 KB
<https://file.savannah.gnu.org/file/bug61839-fdisp-encoding.patch?file_id=52940>
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?61839>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/