Why does adding a useless copy-sequence and discarding the result make m

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Why does adding a useless copy-sequence and discarding the result make m

From:	Clément Pit-Claudel
Subject:	Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?
Date:	Sat, 25 Mar 2017 00:06:18 -0400
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

Hi all,

I can't make sense of the following three observations: adding a useless 
(copy-sequence …) and discarding the result makes my code roughly 20-times 
faster, sending strings of length 2^16+1 to a subprocess is several times 
slower than sending strings of length 2^16, and sending 2^6 strings of length 
2^10 is much faster than sending a single string of length 2^16.

Here's my function (read.py is a python script containing just "import sys; 
sys.stdin.read()"):

(defun bench (cargo-cult size)
  (let* ((message (make-string size ?a))
         (process-connection-type nil)
         (proc (start-process "reader" nil "python2" "read.py")))
    (dotimes (_ 100)
      (when cargo-cult
        (copy-sequence message))
      (process-send-string proc message))
    (kill-process proc)))

And here's the experimental data:

(benchmark-run 1 (bench t 65536))   ;; (0.048931082 0 0.0)
(benchmark-run 1 (bench nil 65536)) ;; (1.844878298 0 0.0)

(benchmark-run 1 (bench t 65537))   ;; (2.028768849 0 0.0)
(benchmark-run 1 (bench nil 65537)) ;; (1.175113209 0 0.0)

>From this we learn that:

(1) Setting cargo-cult to *t* and running a useless `(copy-sequence message)' 
makes the code roughly 40 times *faster* on strings of length 65536.
(2) Adding one to the string length yields horrible performance (3.2MB/s), but 
at least the version without the useless copies is faster.

What's going on here?  strace suggests that the second benchmark is spending 
98% of its time in pselect6 calls.

I confirmed that this isn't dependent on the version of Emacs in use (the same 
issue appears in Emacs 24.5).  It is not dependent on the program on the 
receiving end of the pipe, either (I tried with cat, read.py as above, and a 
minimal C program (attached)).  Replacing copy-sequence with concat preserves 
the surprising behavior, but using (ignore message) makes it disappear 
(everything is slow).  There are no transmission issues: the receiving side 
does read a string of "a"s of the right length.  Using a program other than 
Emacs to send the same strings doesn't show these performance issues.

Here is a bit more data, generated using the attached loop.sh and 
send-string-cli.el

;; (benchmark-send-string CARGO-CULT N-INNER-LOOP STRING-SIZE READ-PROGRAM) → 
BENCHMARK-RESULT
(benchmark-send-string t   200  1024  "python2" "read.py") → (0.067296848       
  1   0.004180474)
(benchmark-send-string nil 200  1024  "python2" "read.py") → (0.061934555       
  0   0.0)
(benchmark-send-string t   200  16384 "python2" "read.py") → (0.159620501       
  12  0.04479207899999999)
(benchmark-send-string nil 200  16384 "python2" "read.py") → (1.556031994       
  0   0.0)
(benchmark-send-string t   200  65536 "python2" "read.py") → (0.334425146       
  46  0.16486607800000005)
(benchmark-send-string nil 200  65536 "python2" "read.py") → 
(3.4018179170000002  0   0.0)
(benchmark-send-string t   200  65537 "python2" "read.py") → (12.418711961      
  46  0.239757055)
(benchmark-send-string nil 200  65537 "python2" "read.py") → (6.102755695       
  0   0.0)
(benchmark-send-string t   200  1024  "cat")               → (0.10330761        
  1   0.004161472999999999)
(benchmark-send-string nil 200  1024  "cat")               → 
(0.12250944600000001 1   0.0035327989999999997)
(benchmark-send-string t   200  16384 "cat")               → 
(1.3808745009999999  24  0.09748934199999998)
(benchmark-send-string nil 200  16384 "cat")               → (2.521698033       
  12  0.05619970500000001)
(benchmark-send-string t   200  65536 "cat")               → (4.120669224       
  99  0.4042515129999999)
(benchmark-send-string nil 200  65536 "cat")               → (9.412745340999999 
  50  0.218435563)
(benchmark-send-string t   200  65537 "cat")               → (12.176100455      
  99  0.502071993)
(benchmark-send-string nil 200  65537 "cat")               → (6.322310322000001 
  51  0.22764341199999993)
(benchmark-send-string t   200  1024  "read")              → (0.026363418       
  1   0.004118800999999998)
(benchmark-send-string nil 200  1024  "read")              → (0.061907866       
  0   0.0)
(benchmark-send-string t   200  16384 "read")              → (0.091360628       
  12  0.038132967000000004)
(benchmark-send-string nil 200  16384 "read")              → (0.12431489        
  0   0.0)
(benchmark-send-string t   200  65536 "read")              → 
(0.18765710700000002 46  0.14072835099999995)
(benchmark-send-string nil 200  65536 "read")              → (3.397287521       
  0   0.0)
(benchmark-send-string t   200  65537 "read")              → 
(12.389248452999999  46  0.257472908)
(benchmark-send-string nil 200  65537 "read")              → (3.258524461       
  0   0.0)
(benchmark-send-string t   1600 8192  "read")              → (0.258897164 49 
0.17045154200000004)
(benchmark-send-string nil 1600 8192  "read")              → (0.07095391 0 0.0)

This shows one more thing:

(3) Sending 1600 strings of 8192 characters is much faster than sending 200 
strings of length 65536.

I managed to reproduce these results on two different physical GNU/Linux 
machines.  Their configurations are:

    GNU Emacs 26.0.50 (build 9, x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 
2017-03-11 built on clem-w50-mint
    Repository revision: fe68818be218537d341365c014f1a6234c9705d6
    Windowing system distributor 'The X.Org Foundation', version 11.0.11804000
    System Description: Linux Mint 18.1 Serena

    In GNU Emacs 26.0.50.2 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 
2017-02-08 built on goodsight
    Repository revision: eff901b8a39f42ddedf4c1db833b9071cae5962f
    System Description: Ubuntu 16.04.2 LTS

Can anyone reproduce these issues, and help shed light on them? I'm trying to 
understand three things:

* Why does running additional, supposedly useless code speed things up so much? 
(1)
* Why does sending strings longer 2**16 characters slow things down so much? (2)
* What is the right way to get reliable performance from process-send-string? 
(3)

Thanks!
Clément.

loop.sh
Description: application/shellscript

read.c
Description: Text Data

read.py
Description: Text Data

send-string-cli.el
Description: Text Data

[Prev in Thread]

Current Thread

[Next in Thread]

Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Clément Pit-Claudel <=
- Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Clément Pit-Claudel, 2017/03/25
  - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Eli Zaretskii, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Andreas Politz, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Clément Pit-Claudel, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Andreas Politz, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Clément Pit-Claudel, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Eli Zaretskii, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Andreas Politz, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Eli Zaretskii, 2017/03/25
    - Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?, Andreas Politz, 2017/03/25

Prev by Date: Re: chmod update-game-score now causes fatal error
Next by Date: Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?
Previous by thread: Re: scratch/record 718fd8d: Make type-of return the type, even for old-style structs
Next by thread: Re: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?
Index(es):
- Date
- Thread