Re: Playing with guile (vs python). Generate file for GDP suitable for g
From: Arne Babenhauserheide
Subject: Re: Playing with guile (vs python). Generate file for GDP suitable for gnuplot.
Date: Tue, 31 Jan 2017 10:41:06 +0100
Hi Germán,
If I understand your script correctly, you want to grab all lines with
GDP, sort the values by year and country and output them. Is that right?
As a first warning: the csv module in Python mainly calls into a C-based
implementation (_csv, see csv.__file__), so it will be hard to beat this
in pure Scheme.
But now, let’s begin with the optimization. These are my times:
$ time guile-2.0 extract_gdp.scm
real 0m0.509s
$ time python3 extract_gdp.py
real 0m0.089s
The first step is using Guile 2.1.6 instead of 2.0. That reduces the
runtime by 40% to 0.3s. Source: ftp://alpha.gnu.org/gnu/guile/guile-2.1.6.tar.xz
$ time guile extract_gdp.scm
real 0m0.296s
$ time python3 extract_gdp.py
real 0m0.089s
So there’s a factor of 3.3 between Python and Guile on my machine.
Aside from using a more recent Guile, however, I do not see any obvious
optimizations (more exactly: all my attempts to speed up the code
only made it slower). There might still be optimizations I do not
see, because 80% of the remaining time is spent in string parsing.
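For reference, a profile like the one quoted below can be collected with the statprof module. This is only a minimal sketch: parse-many and its input string are made-up stand-ins for the real script, so the numbers it prints say nothing about extract_gdp.scm itself.

```scheme
(use-modules (statprof))

;; Hypothetical workload: call string->number N times on a made-up
;; sample string and sum the results.
(define (parse-many n)
  (let loop ((i 0) (sum 0))
    (if (= i n)
        sum
        (loop (+ i 1) (+ sum (string->number "1234567"))))))

;; Run the thunk under the statistical profiler; a flat profile is
;; printed to the current output port when the thunk returns.
(statprof (lambda () (parse-many 1000000)))
```

At the REPL, ,profile (parse-many 1000000) should give the same kind of table.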
One thing I don’t see how to make cheaper in pure Scheme is
string->number. It calls directly into libguile/numbers.c, which does
much more than Python's int() does (internally it calls
mem2complex). But using a pure-Scheme function which does less only
makes it slower:
(define (string->integer s)
  (define (b10fold x kept)
    (+ (* 10 kept)
       (- (char->integer x) 48)))
  (string-fold b10fold 0 s))
As I said: the above makes the code run slower, not faster. A native C
function for string->integer (which only handles integers) could provide
a speedup for that, but I don’t know whether you want to go that far. See
http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=libguile/numbers.c;hb=475772ea57c97d0fa0f9ed9303db137d9798ddd3#l6439
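If you want to check this on your own machine, here is a rough way to time both variants. It is only a sketch: time-thunk, the sample string and the iteration count are my own choices for illustration, not taken from your script, and the result will depend on the Guile version.

```scheme
;; Pure-Scheme variant from above; it assumes S contains only the
;; ASCII digits 0-9 (no sign, no decimal point).
(define (string->integer s)
  (define (b10fold x kept)
    (+ (* 10 kept)
       (- (char->integer x) 48)))
  (string-fold b10fold 0 s))

;; Hypothetical helper: call THUNK N times and return the elapsed
;; wall-clock time in seconds.
(define (time-thunk n thunk)
  (let ((start (get-internal-real-time)))
    (do ((i 0 (+ i 1)))
        ((= i n))
      (thunk))
    (/ (- (get-internal-real-time) start)
       (exact->inexact internal-time-units-per-second))))

(define sample "1234567")   ; made-up input
(define iterations 100000)  ; arbitrary count

(format #t "string->number:  ~as~%"
        (time-thunk iterations (lambda () (string->number sample))))
(format #t "string->integer: ~as~%"
        (time-thunk iterations (lambda () (string->integer sample))))
```

Both procedures return the same value for plain digit strings, so the only thing this compares is the time per call.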
However every time I thought I had a program optimized as far as
possible, talking with Andy Wingo made it much faster, so there might be
lots I’m missing.
Given that just converting a bytevector read from the file to integers
takes 0.8s, I do not think just using bytevectors will help:
(bytevector->u8-list bv) ; takes 0.8s for your file
Maybe there are more efficient ways to do this, though.
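One such way might be to skip the intermediate list and read the digits directly with bytevector-u8-ref. The following is an untested idea, not something I measured on your file: bytevector->integer is a hypothetical helper, and the sample data is made up.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: parse the ASCII digits of BV from index START
;; (inclusive) to END (exclusive) as a non-negative integer.
;; Returns #f on the first non-digit byte and 0 for an empty range.
(define (bytevector->integer bv start end)
  (let loop ((i start) (acc 0))
    (if (>= i end)
        acc
        (let ((d (- (bytevector-u8-ref bv i) 48)))
          (and (<= 0 d 9)
               (loop (+ i 1) (+ (* 10 acc) d)))))))

;; Made-up sample standing in for bytes read from the csv file.
(define sample (string->utf8 "1970,2017"))

(display (bytevector->integer sample 0 4)) (newline) ; prints 1970
(display (bytevector->integer sample 5 9)) (newline) ; prints 2017
```

Whether this beats string->number would have to be measured, given that my other attempts only made things slower.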
Best wishes,
Arne
Germán Diago writes:
> Hello everyone,
>
> I did a script that parses some file with the GDP since 1970 for many
> countries. I filter the file and discard uninteresting fields, later I
> write in a format suitable for gnuplot.
>
> I did this in python and guile.
>
> In python it takes around 1.1 seconds in my raspberry pi.
>
> In Guile it is taking around 11 seconds.
>
> I do not claim they are doing exactly the same: in python I use arrays and
> dictionaries, in guile I am using mainly lists, I would like to know if you
> could give me advice on how to optimize it. I am just training for now.
>
> The scripts in both python and guile are attached and the profile data for
> scheme is below. Just place in the same directory the .csv file and it
> should generate an output file with the data ready for gnuplot :)
>
> % cumulative self
> time seconds seconds name
> 26.24 3.45 3.43 %read-line
> 20.51 2.68 2.68 string->number
> 15.54 2.05 2.03 string-delete
> 7.39 7.75 0.97 map
> 5.13 3.96 0.67 transform-data
> 4.07 1.75 0.53 format:format-work
> 3.17 0.41 0.41 string=?
> 2.87 0.37 0.37 string-ref
> 1.81 2.50 0.24 tilde-dispatch
> 1.81 0.24 0.24 number->string
> 1.51 0.34 0.20 is-a-digit
> 1.06 0.28 0.14 anychar-dispatch
> 1.06 0.14 0.14 display
> 1.06 0.14 0.14 string-length
> 1.06 0.14 0.14 char>=?
> 1.06 0.14 0.14 char<=?
> 1.06 0.14 0.14 string-split
> 0.60 0.08 0.08 length
> 0.45 0.49 0.06 format:out-num-padded
> 0.45 0.06 0.06 remove-dots
> 0.30 0.04 0.04 %after-gc-thunk
> 0.30 0.04 0.04 list-tail
> 0.30 0.04 0.04 write-char
> 0.15 3.53 0.02 loop
> 0.15 3.47 0.02 read-line
> 0.15 0.02 0.02 substring
> 0.15 0.02 0.02 list-ref
> 0.15 0.02 0.02 reverse!
> 0.15 0.02 0.02 #<procedure 2360350 at extract_gdp.scm:58:10
> (e)>
> 0.15 0.02 0.02 integer?
> 0.15 0.02 0.02 char=?
> 0.00 13.07 0.00 load-compiled/vm
> 0.00 13.07 0.00 #<procedure 18c6180 at ice-9/top-repl.scm:31:6
> (thunk)>
> 0.00 13.07 0.00 #<procedure 1a92e00 at ice-9/boot-9.scm:4045:3
> ()>
> 0.00 13.07 0.00 call-with-prompt
> 0.00 13.07 0.00 #<procedure 18c6100 at ice-9/top-repl.scm:66:5
> ()>
> 0.00 13.07 0.00 apply-smob/1
> 0.00 13.07 0.00 catch
> 0.00 13.07 0.00 #<procedure 1a919c0 at statprof.scm:655:4 ()>
> 0.00 13.07 0.00 run-repl*
> 0.00 13.07 0.00 save-module-excursion
> 0.00 13.07 0.00 statprof
> 0.00 13.07 0.00 start-repl*
> 0.00 11.22 0.00 #<procedure 1a8a170 ()>
> 0.00 3.53 0.00 call-with-input-file
> 0.00 1.85 0.00 call-with-output-file
> 0.00 1.79 0.00 for-each
> 0.00 1.75 0.00 format
> 0.00 0.14 0.00 get-fields
> 0.00 0.10 0.00 #<procedure 2d398a0 at extract_gdp.scm:48:18
> (year)>
> 0.00 0.06 0.00 #<procedure 2d021c8 at extract_gdp.scm:46:6 (p)>
> 0.00 0.02 0.00 format:out-obj-padded
> 0.00 0.02 0.00 remove
> 0.00 0.02 0.00 call-with-output-string