bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bash history with mixed timestamps slow and broken


From: Eric Blake
Subject: Re: bash history with mixed timestamps slow and broken
Date: Mon, 26 Sep 2016 10:23:00 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0

On 09/25/2016 05:39 PM, Chet Ramey wrote:

>> Description:
>>      If the history file (`.bash_history`) starts with a timestamp
>>      (`HIST_TIMESTAMP_START`), and contains lines that have been written
>>      without timestamps, then reading the history file is (a) very slow
>>      because (b) consecutive lines without timestamps are merged into a
>>      single history entry with quadratic complexity.
>>

> 
> The appending behavior isn't really quadratic: the code simply keeps
> reallocating the line and appending to it.  You can imagine how long it
> takes to append a million commands to a single buffer.  You've managed
> to identify the most degenerate case.

That depends on how you do reallocation.

Let's try a simple example: lets say you are processing a list of 128
items.  If you allocate 1 additional slot on each iteration through the
loop, you are performing 128 allocations [O(n)], but you are ALSO
performing 128*127/2 moves [8128, O(n^2)] as you copy the previous
iteration's contents into the newly-allocated larger array.  THAT is
where the quadratic behavior comes from - using a constant-size
allocation pattern requires a quadratic-size copying.

But if instead switch things to do geometric allocation, you amortize
the costs.  On the first iteration, you allocate 1 row.  On the second
iteration, you allocate 2 additional rows (twice the allocation size as
last time), giving you the third iteration free.  On the fourth
iteration, you allocate 4 additional rows (again, double the allocation
size), giving you three more iterations free.  Overall, you perform only
8 allocations [O(log n)], and only 127 moves [O(n)].

As the array sizes get larger, the effects get more pronounced.
Appending a million commands to a buffer absolutely needs geometric
allocation size growth when you reallocate the array, rather than a
constant growth factor, or you quickly become bogged down in the
processing time required to copy the contents on each iteration that has
to reallocate.  ANY good allocation algorithm, where the size of the
list is not known a priori, will use a geometric allocation growth
mechanism (doubling the allocation size is easiest, but a 3/2 growth
ratio also works; the point is that each growth is larger than the last,
so that you get progressively more iterations where you don't have to
reallocate and copy).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]