|
From: | Clément Pit-Claudel |
Subject: | Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster? |
Date: | Sat, 25 Mar 2017 00:06:18 -0400 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 |
Hi all, I can't make sense of the following three observations: adding a useless (copy-sequence …) and discarding the result makes my code roughly 20-times faster, sending strings of length 2^16+1 to a subprocess is several times slower than sending strings of length 2^16, and sending 2^6 strings of length 2^10 is much faster than sending a single string of length 2^16. Here's my function (read.py is a python script containing just "import sys; sys.stdin.read()"): (defun bench (cargo-cult size) (let* ((message (make-string size ?a)) (process-connection-type nil) (proc (start-process "reader" nil "python2" "read.py"))) (dotimes (_ 100) (when cargo-cult (copy-sequence message)) (process-send-string proc message)) (kill-process proc))) And here's the experimental data: (benchmark-run 1 (bench t 65536)) ;; (0.048931082 0 0.0) (benchmark-run 1 (bench nil 65536)) ;; (1.844878298 0 0.0) (benchmark-run 1 (bench t 65537)) ;; (2.028768849 0 0.0) (benchmark-run 1 (bench nil 65537)) ;; (1.175113209 0 0.0) >From this we learn that: (1) Setting cargo-cult to *t* and running a useless `(copy-sequence message)' makes the code roughly 40 times *faster* on strings of length 65536. (2) Adding one to the string length yields horrible performance (3.2MB/s), but at least the version without the useless copies is faster. What's going on here? strace suggests that the second benchmark is spending 98% of its time in pselect6 calls. I confirmed that this isn't dependent on the version of Emacs in use (the same issue appears in Emacs 24.5). It is not dependent on the program on the receiving end of the pipe, either (I tried with cat, read.py as above, and a minimal C program (attached)). Replacing copy-sequence with concat preserves the surprising behavior, but using (ignore message) makes it disappear (everything is slow). There are no transmission issues: the receiving side does read a string of "a"s of the right length. Using a program other than Emacs to send the same strings doesn't show these performance issues. Here is a bit more data, generated using the attached loop.sh and send-string-cli.el ;; (benchmark-send-string CARGO-CULT N-INNER-LOOP STRING-SIZE READ-PROGRAM) → BENCHMARK-RESULT (benchmark-send-string t 200 1024 "python2" "read.py") → (0.067296848 1 0.004180474) (benchmark-send-string nil 200 1024 "python2" "read.py") → (0.061934555 0 0.0) (benchmark-send-string t 200 16384 "python2" "read.py") → (0.159620501 12 0.04479207899999999) (benchmark-send-string nil 200 16384 "python2" "read.py") → (1.556031994 0 0.0) (benchmark-send-string t 200 65536 "python2" "read.py") → (0.334425146 46 0.16486607800000005) (benchmark-send-string nil 200 65536 "python2" "read.py") → (3.4018179170000002 0 0.0) (benchmark-send-string t 200 65537 "python2" "read.py") → (12.418711961 46 0.239757055) (benchmark-send-string nil 200 65537 "python2" "read.py") → (6.102755695 0 0.0) (benchmark-send-string t 200 1024 "cat") → (0.10330761 1 0.004161472999999999) (benchmark-send-string nil 200 1024 "cat") → (0.12250944600000001 1 0.0035327989999999997) (benchmark-send-string t 200 16384 "cat") → (1.3808745009999999 24 0.09748934199999998) (benchmark-send-string nil 200 16384 "cat") → (2.521698033 12 0.05619970500000001) (benchmark-send-string t 200 65536 "cat") → (4.120669224 99 0.4042515129999999) (benchmark-send-string nil 200 65536 "cat") → (9.412745340999999 50 0.218435563) (benchmark-send-string t 200 65537 "cat") → (12.176100455 99 0.502071993) (benchmark-send-string nil 200 65537 "cat") → (6.322310322000001 51 0.22764341199999993) (benchmark-send-string t 200 1024 "read") → (0.026363418 1 0.004118800999999998) (benchmark-send-string nil 200 1024 "read") → (0.061907866 0 0.0) (benchmark-send-string t 200 16384 "read") → (0.091360628 12 0.038132967000000004) (benchmark-send-string nil 200 16384 "read") → (0.12431489 0 0.0) (benchmark-send-string t 200 65536 "read") → (0.18765710700000002 46 0.14072835099999995) (benchmark-send-string nil 200 65536 "read") → (3.397287521 0 0.0) (benchmark-send-string t 200 65537 "read") → (12.389248452999999 46 0.257472908) (benchmark-send-string nil 200 65537 "read") → (3.258524461 0 0.0) (benchmark-send-string t 1600 8192 "read") → (0.258897164 49 0.17045154200000004) (benchmark-send-string nil 1600 8192 "read") → (0.07095391 0 0.0) This shows one more thing: (3) Sending 1600 strings of 8192 characters is much faster than sending 200 strings of length 65536. I managed to reproduce these results on two different physical GNU/Linux machines. Their configurations are: GNU Emacs 26.0.50 (build 9, x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 2017-03-11 built on clem-w50-mint Repository revision: fe68818be218537d341365c014f1a6234c9705d6 Windowing system distributor 'The X.Org Foundation', version 11.0.11804000 System Description: Linux Mint 18.1 Serena In GNU Emacs 26.0.50.2 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 2017-02-08 built on goodsight Repository revision: eff901b8a39f42ddedf4c1db833b9071cae5962f System Description: Ubuntu 16.04.2 LTS Can anyone reproduce these issues, and help shed light on them? I'm trying to understand three things: * Why does running additional, supposedly useless code speed things up so much? (1) * Why does sending strings longer 2**16 characters slow things down so much? (2) * What is the right way to get reliable performance from process-send-string? (3) Thanks! Clément.
loop.sh
Description: application/shellscript
read.c
Description: Text Data
read.py
Description: Text Data
send-string-cli.el
Description: Text Data
[Prev in Thread] | Current Thread | [Next in Thread] |