[Monotone-devel] Incremental binary test, try 2
From: Eric Anderson
Subject: [Monotone-devel] Incremental binary test, try 2
Date: Sun, 28 Aug 2005 16:17:32 -0700
All,
Here is variant 2 of the incremental binary test. Instead of
reading into a string, it reads into a char[] array. This further
reduces the amount of memory that is allocated, but has no
significant effect on speed or memory usage relative to the
previous patch. Relative to the current 0.22 release, it gives a
5-6x speedup and reduces memory usage by >10x in any test case that
includes large files.
-Eric
Changelog entry:
2005-08-21 Eric Anderson <address@hidden>
* file_io.cc, file_io.hh, lua.cc, std_hooks.lua: determine if a
file is binary by looking at it incrementally, rather than reading
it in entirely. Prepare for making it possible to control what
characters are considered "binary"
It passes all regression tests. Performance summary is included
below.
                                   Maximum (MiB)     Copied   Malloc
*Test*           Operation CPU(s)    Size Resident    (MiB)    (MiB)
---------------- --------- ------ ------- -------- -------- --------
zero_small       add files    0.0    9.38     3.93        0        1
zero_large       add files    0.0    9.38     3.93        0        1
random_medium    add files    0.0    9.38     3.93        0        1
random_medium_20 add files    0.0    9.38     3.94        0        1
halfzero_large   add files    0.0    9.38     3.93        0        1
random_large     add files    0.0    9.50     3.93        0        1
monotone         add files    0.2    9.89     4.42        1        1
mt_multiple      add files    3.2   14.13     7.86       21        1
mt_bigfiles      add files    0.2    9.51     3.93        0        1
mixed_1          add files    0.2    9.89     4.42        1        1
mixed_4          add files    1.0   10.98     5.52        7        1
mixed_12         add files    2.9   13.30     7.65       18        1
everything       add files    7.3   18.16    11.72       40        1
---- Details from previous message ----
Summary: The attached patch changes the function which determines if a file
is binary from operating on a string to operating on a filename. This
avoids Lua reading the entire file into memory when we can determine from
the first few characters that it is binary. The patch also adds a
function that sets the "non-binary" characters, rather than having them
be completely hardcoded. Adds are ~5-6x faster, with a larger speedup
in the extreme case of a single large binary file.
Memory usage is reduced by >10x in any test case that includes large
files, and is slightly reduced for the case with lots of small text files.
There is no effect on the memory or CPU usage of other operations.
Detailed discussion:
The current method for determining if a file is binary involves
reading the entire file into memory in Lua, and then calling a C++
function to determine if that string is binary. Lua is stunningly
inefficient at reading in a large file (reading in 100MiB copies
1.4GiB).
Instead of reading the file in Lua, we pass the filename to the C++
function, which reads the file in chunks and stops immediately if any
chunk is binary.
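
The chunked read with early exit can be sketched roughly as follows.
This is a minimal illustration, not the code from the attached patch:
the names byte_looks_binary, guess_binary_stream and guess_binary_file,
the 4096-byte chunk size, and the rule for which bytes count as binary
are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Assumption for illustration: a byte counts as "binary" if it is a
// control character other than tab, LF, FF or CR. The set used by the
// real patch may differ.
inline bool byte_looks_binary(unsigned char c)
{
  if (c == '\t' || c == '\n' || c == '\f' || c == '\r')
    return false;
  return c < 0x20 || c == 0x7f;
}

// Read the stream one chunk at a time and stop at the first chunk that
// contains a binary byte, so the rest of the input is never read.
inline bool guess_binary_stream(std::istream & in,
                                std::size_t chunk_size = 4096)
{
  assert(chunk_size > 0);
  std::vector<char> buf(chunk_size);
  while (in)
    {
      in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
      std::streamsize got = in.gcount();  // may be short on the last chunk
      for (std::streamsize i = 0; i < got; ++i)
        if (byte_looks_binary(static_cast<unsigned char>(buf[i])))
          return true;
    }
  return false;
}

// Hypothetical filename-based entry point, mirroring the idea of passing
// a filename instead of the file contents.
inline bool guess_binary_file(std::string const & path)
{
  std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
  if (!in)
    return false;  // unreadable file: treated as text in this sketch
  return guess_binary_stream(in);
}
```

With this shape, peak memory is bounded by the chunk size rather than
the file size, and a file with a binary byte near the start costs a
single read.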
I also changed the guess_binary function to use a boolean array of
"binary characters" rather than a fixed string. The array is
initialized by calling a set_char_is_binary() function. If this seems
like a good change to people, then instead of calling that function
from C++, we can call a Lua hook which can set up the list of binary
characters.
incremental-binary-test.patch
Description: Binary data