octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #60984] Load function on Windows is significan


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #60984] Load function on Windows is significantly slower than on Linux
Date: Sat, 31 Jul 2021 05:26:17 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36 Edg/92.0.902.55

Follow-up Comment #11, bug #60984 (project octave):

I attached Very Sleepy CS as a profiler to versions of Octave with and without
the changes from comment #9 while running `load('A.txt')`.
The execution takes notably longer when the profiler is attached. But I hope
the results of the two runs are still comparable.
Taking only "valid" contributors from the results (i.e., discarding time spend
in "wait" functions), I see the following top contributors without the
changes:

setlocale       27.86s  27.86s  4.33%   4.33%   msvcrt  [unknown]       0       
0x7fff1cf09bd7
lseeki64        3.60s   3.60s   0.56%   0.56%   msvcrt  [unknown]       0       
0x7fff1cefd35c
ReadConsoleOutputW      0.99s   0.99s   0.15%   0.15%   KERNELBASE      
[unknown]       0       0x7fff1c4dc188
_cxa_demangle   0.57s   1.21s   0.09%   0.19%   libstdc++-6     [unknown]       
0       0x6fc4b665
read    0.41s   0.41s   0.06%   0.06%   msvcrt  [unknown]       0       
0x7fff1cefe52b

And with the changes:

setlocale       6.94s   6.94s   1.93%   1.93%   msvcrt  [unknown]       0       
0x7fff1cf09bb8
lseeki64        3.76s   3.76s   1.04%   1.04%   msvcrt  [unknown]       0       
0x7fff1cefd35c
ReadConsoleOutputW      0.81s   0.81s   0.23%   0.23%   KERNELBASE      
[unknown]       0       0x7fff1c4dc188
_cxa_demangle   0.78s   1.94s   0.22%   0.54%   libstdc++-6     [unknown]       
0       0x6fc4b680
free    0.41s   0.41s   0.11%   0.11%   msvcrt  [unknown]       0       
0x7fff1cef9c9c
malloc  0.31s   0.31s   0.09%   0.09%   msvcrt  [unknown]       0       
0x7fff1cef9d40


It looks like `setlocale` is still the top contributor.

The call stacks presented by Very Sleepy CS that end in `setlocale` look the
same for both versions. The top of these stacks (35 before and 27 total after
the change) looks like this according to Very Sleepy CS:

setlocale       msvcrt  [unknown]       0       0x7fff1cf09c3d
ZSt14__convert_to_vIdEvPKcRT_RSt12_Ios_IostateRKPi      libstdc++-6     
[unknown]       0       0x6fd31964
ZNKSt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE6do_getES3_S3_RSt8ios_baseRSt12_Ios_IostateRd
       libstdc++-6     [unknown]       0       0x6fc95016
ZNSi10_M_extractIdEERSiRT_      libstdc++-6     [unknown]       0       
0x6fcb7bce
ZN6octave12rand_poissonIfEEvT_xPS1_     liboctave-8     [unknown]       0       
0x6d27456a
ZrsRSiR6Matrix  liboctave-8     [unknown]       0       0x6c9e9a60
ZN13octave_matrix10load_asciiERSi       liboctinterp-9  [unknown]       0       
0x6e941f8b
Z14read_text_dataRSiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERbR12octave_valuexb
   liboctinterp-9  [unknown]       0       0x6ed23898
ZN6octave16load_save_system9load_varsERSiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_16load_save_formatENS_9mach_info12float_formatEbbbRK13string_vectoriii
      liboctinterp-9  [unknown]       0       0x6ecf9a97
ZN6octave16load_save_system4loadERK17octave_value_listi liboctinterp-9  
[unknown]       0       0x6ecfcc84
Z5FloadRN6octave11interpreterERK17octave_value_listi    liboctinterp-9  
[unknown]       0       0x6ecfd7c5
ZN14octave_builtin7executeERN6octave14tree_evaluatorEiRK17octave_value_list     
liboctinterp-9  [unknown]       0       0x6e8c4049
ZN15octave_function4callERN6octave14tree_evaluatorEiRK17octave_value_list       
liboctinterp-9  [unknown]       0       0x6e900c4e

Before the change, one of those stacks was attributed the huge majority of the
timing ("top" 4 stacks: ~18 s, ~2.5 s, ~1.8 s, ~1.3 s).
After the change, the "top" 3 stacks contributed the majority (~1.4 s, ~1.3 s,
~1.0 s).
The remaining stacks contributed a few 10 to a few 100 ms each.
I couldn't identify the stack that corresponds to the newly added code. That
is strange.
Could it be that compiler optimizations are removing that code completely? But
then, why are we seeing different timings with and without the change?
Does `std::setlocale` not call `setlocale`?
IIUC, `setlocale` should have been called twice for each read number (i.e.,
1000*1000*2 = 2 million times). That's by far more than the number of stacks
that the output of Very Sleepy CS shows. I'm only guessing that Very Sleepy CS
grouped some of those very many stacks together. It might have (erroneously)
included the call with the differing stack into one of these groups.


As the next largest contributor, the top of the call stacks (total of 2 before
and after the change) that end in `lseeki64` looks like this:

lseeki64        msvcrt  [unknown]       0       0x7fff1cefd35c
ZNSt12__basic_fileIcE7seekoffExSt12_Ios_Seekdir libstdc++-6     [unknown]       
0       0x6fcdf242
ZNSt13basic_filebufIcSt11char_traitsIcEE7seekoffExSt12_Ios_SeekdirSt13_Ios_Openmode
     libstdc++-6     [unknown]       0       0x6fce85f5
ZNSi5tellgEv    libstdc++-6     [unknown]       0       0x6fcb94d5
ZN6octave12rand_poissonIfEEvT_xPS1_     liboctave-8     [unknown]       0       
0x6d274451
ZrsRSiR6Matrix  liboctave-8     [unknown]       0       0x6c9e9a60
ZN13octave_matrix10load_asciiERSi       liboctinterp-9  [unknown]       0       
0x6e941f8b
Z14read_text_dataRSiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERbR12octave_valuexb
   liboctinterp-9  [unknown]       0       0x6ed23898
ZN6octave16load_save_system9load_varsERSiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKNS_16load_save_formatENS_9mach_info12float_formatEbbbRK13string_vectoriii
      liboctinterp-9  [unknown]       0       0x6ecf9a97
ZN6octave16load_save_system4loadERK17octave_value_listi liboctinterp-9  
[unknown]       0       0x6ecfcc84
Z5FloadRN6octave11interpreterERK17octave_value_listi    liboctinterp-9  
[unknown]       0       0x6ecfd7c5
ZN14octave_builtin7executeERN6octave14tree_evaluatorEiRK17octave_value_list     
liboctinterp-9  [unknown]       0       0x6e8c4049
ZN15octave_function4callERN6octave14tree_evaluatorEiRK17octave_value_list       
liboctinterp-9  [unknown]       0       0x6e900c4e
ZN6octave14tree_evaluator15remove_autoloadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_
    liboctinterp-9  [unknown]       0       0x6e9f43e5

The time attributed to one of those stacks was notably longer than the other
(approx. 3.6 s compared to approx. 0.05 s for the other one in both
versions).

I'm not sure how `octave::rand_poisson` made it into these stacks. The only
command that should have been running while the profiler was attached was
`load`. I don't understand how that would call `octave::rand_poisson`. So, I'm
not sure how much we can trust these stacks as a whole.


When loading integer numbers in ASCII format, are we reading them as floating
point numbers? Afaict, instantiations of `__convert_to_v` are only called when
reading `float`, `double`, or `long double` type numbers.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60984>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]