
[Monotone-devel] Incremental binary test, try 2


From: Eric Anderson
Subject: [Monotone-devel] Incremental binary test, try 2
Date: Sun, 28 Aug 2005 16:17:32 -0700

All,
        Here is variant 2 of the incremental binary test.  Instead of
reading into a string, it reads into a char[] array.  This further
reduces the amount of memory that is allocated, but has no significant
effect on speed or memory usage relative to the previous patch.
Relative to the current 0.22 release, it gives a 5-6x speedup on adds
and reduces memory usage by >10x in any case that includes large
files.
        -Eric

Changelog entry:

  2005-08-21  Eric Anderson  <address@hidden>
        * file_io.cc, file_io.hh, lua.cc, std_hooks.lua: determine if a
        file is binary by looking at it incrementally, rather than reading
        it in entirely.  Prepare for making it possible to control which
        characters are considered "binary".

It passes all regression tests.  Performance summary is included
below.

                                     Maximum (MiB)      Copied    Malloc
     *Test*      Operation  CPU(s)   Size  Resident      (MiB)     (MiB)
---------------- ---------  ------  -------  -------   --------  --------
zero_small       add files     0.0     9.38     3.93          0         1
zero_large       add files     0.0     9.38     3.93          0         1
random_medium    add files     0.0     9.38     3.93          0         1
random_medium_20 add files     0.0     9.38     3.94          0         1
halfzero_large   add files     0.0     9.38     3.93          0         1
random_large     add files     0.0     9.50     3.93          0         1
monotone         add files     0.2     9.89     4.42          1         1
mt_multiple      add files     3.2    14.13     7.86         21         1
mt_bigfiles      add files     0.2     9.51     3.93          0         1
mixed_1          add files     0.2     9.89     4.42          1         1
mixed_4          add files     1.0    10.98     5.52          7         1
mixed_12         add files     2.9    13.30     7.65         18         1
everything       add files     7.3    18.16    11.72         40         1

---- Details from previous message ----

Summary: The attached patch changes the function which determines if a file
  is binary from operating on a string to operating on a filename.  This 
  avoids lua reading the entire file into memory if we can determine that
  it is binary from the first few characters.  The patch also creates a
  function that sets the "non-binary" characters, rather than having
  that set be completely hardcoded.  A speedup of ~5-6x resulted on adds,
  with a larger speedup in the extreme case of a single large binary file.
  Memory usage was reduced by >10x in any case with large files in it, and
  was slightly reduced for the case with lots of small text files.
  There was no effect on other operations in either memory or CPU usage.

Detailed discussion:

The current method for determining if a file is binary involves
reading the entire file into memory in Lua, and then calling a C++
function to determine if that string is binary.  Lua is stunningly
inefficient at reading in a large file (reading in 100MiB copies
1.4GiB).

Instead of reading the file in Lua, we pass the filename to the C++
function, which reads the file in chunks and stops immediately if any
chunk is binary.
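
For readers without the patch to hand, the chunked read with early
exit can be sketched roughly as follows.  This is not the code from
the patch: file_looks_binary(), byte_looks_binary(), the 4096-byte
default chunk size and the "NUL means binary" rule are all
hypothetical, chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical per-byte rule (an assumption): only NUL counts as binary.
// The actual patch uses its own set of characters.
inline bool byte_looks_binary(unsigned char c)
{
  return c == 0;
}

// Returns true as soon as any chunk contains a "binary" byte.
// Returns false for text files, empty files and unreadable files.
bool file_looks_binary(std::string const & path,
                       std::size_t chunk_size = 4096)
{
  assert(chunk_size > 0);
  std::ifstream in(path.c_str(), std::ios::binary);
  if (!in)
    return false;

  std::vector<char> buf(chunk_size);
  while (in)
    {
      in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
      std::streamsize got = in.gcount();  // may be short on the last chunk
      for (std::streamsize i = 0; i < got; ++i)
        if (byte_looks_binary(static_cast<unsigned char>(buf[i])))
          return true;  // stop early; the rest of the file is never read
    }
  return false;
}
```

Only one chunk-sized buffer is ever held, so memory use in this
sketch does not grow with the size of the file.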

I also changed the guess_binary function to use a boolean array of
"binary characters" rather than a fixed string.  The array is
initialized by calling a set_char_is_binary() function.  If this seems
like a good change to people, then instead of calling that function
from C++, we can call a Lua hook which can set up the list of binary
characters.
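
A minimal sketch of the table-driven check, for illustration only.
The names guess_binary and set_char_is_binary come from the
description above, but the signatures, the init_default_binary_chars()
helper and the default character set shown here are assumptions, not
what the patch contains.

```cpp
#include <cassert>
#include <cstddef>

// 256-entry lookup table indexed by byte value; all entries start false.
static bool char_is_binary[256] = { false };

// Mark (or unmark) a single byte value as binary.
// Signature is assumed; the patch may differ.
void set_char_is_binary(unsigned char c, bool is_binary)
{
  char_is_binary[c] = is_binary;
}

// Hypothetical defaults: control characters below 0x20 other than
// tab, LF and CR, plus DEL, are treated as binary.
void init_default_binary_chars()
{
  for (int c = 0; c < 0x20; ++c)
    set_char_is_binary(static_cast<unsigned char>(c), true);
  set_char_is_binary('\t', false);
  set_char_is_binary('\n', false);
  set_char_is_binary('\r', false);
  set_char_is_binary(0x7f, true);
}

// True if any byte in [data, data + len) is marked binary.
bool guess_binary(char const * data, std::size_t len)
{
  assert(data != nullptr || len == 0);
  for (std::size_t i = 0; i < len; ++i)
    if (char_is_binary[static_cast<unsigned char>(data[i])])
      return true;
  return false;
}
```

With a table like this, a hook would only need to call
set_char_is_binary() for each byte value it wants to change; the
per-byte check itself stays a single array lookup.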

Attachment: incremental-binary-test.patch
Description: Binary data

