[Monotone-devel] Patch to speed up add operation


From: Eric Anderson
Subject: [Monotone-devel] Patch to speed up add operation
Date: Wed, 24 Aug 2005 11:49:47 -0700

Summary: The attached patch changes the function that determines whether a
  file is binary so that it operates on a filename instead of a string.  This
  avoids Lua reading the entire file into memory when the first few
  characters already show that the file is binary.  The patch also creates a
  function that sets the "non-binary" characters, rather than leaving them
  completely hardcoded.  Adds ran ~5-6x faster, with a larger speedup in
  the extreme case of a single large binary file.
  Memory usage dropped by >10x in every test that includes large files, and
  dropped slightly for the test with lots of small text files.
  Other operations are unaffected in both memory and CPU usage.

Changelog entry:

  2005-08-21  Eric Anderson  <address@hidden>
        * file_io.cc, file_io.hh, lua.cc, std_hooks.lua: determine if a
        file is binary by looking at it incrementally, rather than reading
        it in entirely.  Prepare for making it possible to control what
        characters are considered "binary"

Detailed discussion:

The current method for determining whether a file is binary reads the
entire file into memory in Lua, and then calls a C++ function to
determine whether that string is binary.  Lua is stunningly
inefficient at reading in a large file (reading in 100MiB copies
1.4GiB).

Instead of reading the file in Lua, we pass the filename to the C++
function, which reads the file in chunks and stops immediately if any
chunk is binary.
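
To make the chunked approach concrete, here is a minimal sketch of the idea, not the code from the patch.  The names file_looks_binary and chunk_looks_binary, the 4096-byte chunk size, and the "NUL byte means binary" rule are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical per-chunk check: a NUL byte is the only "binary" marker here.
// The real guess_binary() in monotone checks a larger set of characters.
bool chunk_looks_binary(const char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] == '\0') return true;
    return false;
}

// Reads the file chunk by chunk and returns as soon as one chunk looks
// binary, so at most chunk_size bytes are held in memory at a time.
// A file that cannot be opened is reported as "not binary" (arbitrary choice).
bool file_looks_binary(const std::string& path, std::size_t chunk_size = 4096) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    std::vector<char> buf(chunk_size);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0) break;
        if (chunk_looks_binary(buf.data(), static_cast<std::size_t>(got)))
            return true;  // early exit: the rest of the file is never read
    }
    return false;
}
```

For a large binary file the loop usually returns after the first chunk, which is consistent with the near-zero CPU time reported for the zero_large and random_large tests.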

I also changed the guess_binary function to use a boolean array of
"binary characters" rather than a fixed string.  The array is
initialized by calling a set_char_is_binary() function.  If this seems
like a good change to people, then instead of calling that function
from C++, we can call a Lua hook that sets up the list of binary
characters.
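
A boolean lookup table of this kind might look like the sketch below.  The signature of set_char_is_binary(), the helper init_default_binary_chars(), and the default set of "binary" characters are assumptions for illustration; the patch itself may differ on all three.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// char_is_binary[c] is true if byte value c marks data as binary.
static bool char_is_binary[256] = {false};

// Assumed signature; the function in the patch may take other arguments.
void set_char_is_binary(int c, bool is_binary) {
    assert(c >= 0 && c < 256);
    char_is_binary[c] = is_binary;
}

// Illustrative defaults only: every control character below 0x20 counts as
// binary, except common text whitespace (tab, LF, CR, form feed).
void init_default_binary_chars() {
    for (int c = 0; c < 256; ++c) set_char_is_binary(c, false);
    for (int c = 0; c < 0x20; ++c) set_char_is_binary(c, true);
    const char text_controls[] = "\t\n\r\f";
    for (const char* p = text_controls; *p != '\0'; ++p)
        set_char_is_binary(static_cast<unsigned char>(*p), false);
}

// One table lookup per byte, instead of searching a fixed string of
// "binary" characters for every byte of input.
bool guess_binary(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i)
        if (char_is_binary[static_cast<unsigned char>(s[i])]) return true;
    return false;
}
```

With a table like this, a Lua hook would only need to call set_char_is_binary() for each character it wants to reclassify; the scanning loop stays unchanged.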

Since guess_binary() took a string, I used the slightly questionable
&string[0] trick to get a writeable pointer to the string in
file.read(&buf[0],bufsize) in monotone_guess_binary_filename_for_lua.
Does anyone know of a way to just read up to some number of bytes into
a string?  If not, should guess_binary's signature be changed to
guess_binary(unsigned char *data, int datalen), with a char[bufsize]
array used instead of a string?
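
For reference, the two options in the question above can be sketched as follows.  This is only an illustration: read_up_to() is a hypothetical helper, and the pointer-plus-length guess_binary() uses a placeholder "NUL byte means binary" rule rather than monotone's real check.  Note that the &buf[0] approach relies on std::string storing its characters contiguously, which the standard only guarantees from C++11 onwards.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Option 1: read at most max_bytes into a std::string.  The string is sized
// up front, filled through &buf[0], then shrunk to the bytes actually read.
std::string read_up_to(std::istream& in, std::size_t max_bytes) {
    std::string buf(max_bytes, '\0');
    if (max_bytes == 0) return buf;
    in.read(&buf[0], static_cast<std::streamsize>(max_bytes));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Option 2: a pointer-plus-length signature, so the caller can pass a plain
// char array and no std::string is needed.  Placeholder rule: NUL == binary.
bool guess_binary(const unsigned char* data, std::size_t datalen) {
    assert(data != nullptr || datalen == 0);
    for (std::size_t i = 0; i < datalen; ++i)
        if (data[i] == 0) return true;
    return false;
}
```

Option 1 keeps the existing string-based guess_binary() untouched; option 2 avoids the string entirely, at the cost of changing every caller.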

Performance analysis:

Test CPU: Intel(R) Pentium(R) M processor 1700MHz
monotone 0.22 (base revision: 072f38da9450e2e2e406332a480c8c7a50736f8b)

Without the patch:

                                     Maximum (MiB)      Copied    Malloc
     *Test*      Operation  CPU(s)   Size  Resident      (MiB)     (MiB)
---------------- ---------  ------  -------  -------   --------  --------
zero_small       add files     0.0     7.30     2.72          0         1
zero_large       add files     3.9   420.43   413.42       1448       101
random_medium    add files     0.3    51.68    46.88        126        11
random_medium_20 add files     5.0    69.20    64.43       2521       201
halfzero_large   add files     3.8   414.82   408.55       1739       101
random_large     add files     3.8   414.82   407.60       1963       101
monotone         add files     0.4     9.24     4.42         27        11
mt_multiple      add files     5.1    12.30     7.32        239       104
mt_bigfiles      add files     3.7    73.77    69.03       1318       107
mixed_1          add files     1.9   125.50   118.29        676        61
mixed_4          add files     6.9   126.02   120.17       2417       210
mixed_12         add files    19.6   128.62   121.93       7012       587
everything       add files    45.7   502.46   497.21      16111      1297

With the patch applied:

                                     Maximum (MiB)      Copied    Malloc
     *Test*      Operation  CPU(s)   Size  Resident      (MiB)     (MiB)
---------------- ---------  ------  -------  -------   --------  --------
zero_small       add files     0.0     7.48     2.77          0         1
zero_large       add files     0.0     7.33     2.77          0         1
random_medium    add files     0.0     7.33     2.77          0         1
random_medium_20 add files     0.0     7.33     2.78          0         1
halfzero_large   add files     0.0     7.33     2.77          0         1
random_large     add files     0.0     7.33     2.77          0         1
monotone         add files     0.2     7.87     3.25          1        13
mt_multiple      add files     3.4    11.18     6.68         22       126
mt_bigfiles      add files     0.3     7.48     2.78          0         1
mixed_1          add files     0.2     7.87     3.25          1        13
mixed_4          add files     1.0     8.99     4.44          7        48
mixed_12         add files     3.0    11.02     6.43         18       119
everything       add files     7.2    16.04    10.49         41       245

Attachment: incremental-binary-test.patch
Description: Binary data

