Re: Bash high memory usage in simple for loop?

bug-bash

From:	Greg Wooledge
Subject:	Re: Bash high memory usage in simple for loop?
Date:	Mon, 27 Sep 2010 08:44:08 -0400
User-agent:	Mutt/1.4.2.3i

On Sun, Sep 26, 2010 at 11:20:53PM -0400, Thomas Guyot-Sionnest wrote:
> I just encountered some dubious behaviour... Feeding a large data set in
> a "for i in..." loop doing nothing causes exponentially large memory
> usage, and that memory is never freed (although it seems to be re-used
> over multiple runs...)

Once memory has been allocated to a process, it's generally never
returned to the OS until the process exits.  The malloc() functions
in libc (or bash's builtin malloc if your build of bash uses that)
should be responsible for recycling the memory within the process.

> For instance I have 68M of small numbers (1..256) separated by newlines
> and that makes bash grow over 1.6G, even when all it does inside the
> loop is calling true. The only way I can free up the memory is to leave
> the shell.

You "have" them... where?  In a file?

> You can test easily with this command (you might want to limit your
> memory with ulimit first to avoid trashing your system...):
> 
> $ for i in `seq 1 10000000`; do true; done
> 
> On my test system this requires 1G of memory, and memory climbs a bit
> higher on additional runs but settles at 1.1G (it doesn't seem to leak
> any memory past this point).

"Leak" is the wrong word here.  When you write code like this, bash runs
the seq command in a child process connected to a pipe, and then reads
the entire output into memory.  Then, since the command substitution
wasn't quoted, it has to apply word splitting to that output.  I'm not
sure exactly how that works internally; it might require an additional
allocation of memory to hold the words as individual strings.
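
To make the word-splitting step concrete, here is a minimal sketch. The
count_args function is a hypothetical helper of mine, not anything built
into bash, and the sketch assumes the default IFS.  It only shows how many
words the shell ends up holding; it says nothing about how bash stores
them internally.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() {
  printf '%s\n' "$#"
}

# Unquoted: the output of seq is split on whitespace into separate words.
unquoted=$(count_args $(seq 1 5))

# Quoted: the whole output stays a single string with embedded newlines.
quoted=$(count_args "$(seq 1 5)")

printf 'unquoted: %s words, quoted: %s word\n' "$unquoted" "$quoted"
```

With 10 million numbers instead of 5, the unquoted form has to produce
10 million separate words before the loop body runs even once.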

You're spitting out 10 million numbers of 1 to 8 digits apiece.  Let's
suppose this averages out to 7 digits per number.  That plus a trailing
newline is 8 bytes per number, so that's 80 million bytes, not counting
any additional overhead (an array of pointers, etc.).
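
If you'd rather measure than estimate, something like the following
should do.  It's only a sketch: seq_bytes is a hypothetical helper name,
and it assumes a seq that prints plain integers (GNU seq does; some other
implementations switch to exponent notation for large values).

```bash
#!/usr/bin/env bash
# seq_bytes is a hypothetical helper: it prints the number of bytes that
# "seq 1 N" writes, newlines included.
seq_bytes() {
  local n
  n=$(seq 1 "$1" | wc -c)
  # Arithmetic expansion strips the padding that some wc versions add.
  printf '%s\n' "$((n))"
}

seq_bytes 100
```

For 100 this prints 292 (9 one-digit, 90 two-digit and 1 three-digit
number, each followed by a newline).  By the same digit counting, the
10 million case comes to 78888897 bytes, so the 80 million estimate is
close.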

> Is this normal or expected?

Somewhat expected; the exact amount of memory used is slightly surprising
since it's about 1 order of magnitude more than I would expect based on
the calculation I did above.

If you want to count to 10 million in a loop, it's FAR more efficient to
do it this way:

  for ((i=1; i<=10000000; i++)); do true; done

That doesn't store the entire sequence of numbers in memory; it just
increments a single counter.
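
As a small illustration of the counter form doing actual work, here is a
sketch that sums the integers up to a limit.  The function name sum_to is
my own invention for this example.

```bash
#!/usr/bin/env bash
# sum_to is a hypothetical helper: it adds up the integers 1..limit using
# the arithmetic for loop, so only the counter and the running total are
# held in memory.
sum_to() {
  local limit=$1 i total=0
  for ((i = 1; i <= limit; i++)); do
    ((total += i))
  done
  printf '%s\n' "$total"
}

sum_to 100
```

The memory needed doesn't depend on the limit, because no list of numbers
is ever built.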

Likewise, if you need to loop through the numbers which are stored in a
file, one per line, you should write it this way:

  while IFS= read -r number; do
    ...
  done < myfile

Instead of this way:

  # THIS IS BAD
  for number in $(cat myfile); do ...

The latter reads the entire file into memory and does word splitting, as
I described above.  For more details, see
http://mywiki.wooledge.org/BashFAQ/001
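
Besides the memory cost, the two forms don't even see the same items once
a line contains whitespace.  A minimal sketch, using a throwaway temp
file as a stand-in for myfile and assuming the default IFS:

```bash
#!/usr/bin/env bash
# Build a small sample file; the second line holds two numbers.
tmpfile=$(mktemp)
printf '%s\n' '1' '2 3' '4' > "$tmpfile"

# Line-at-a-time: one iteration per line, one line in memory at a time.
lines=0
while IFS= read -r number; do
  lines=$((lines + 1))
done < "$tmpfile"

# Whole-file form: read everything, then split on whitespace.
words=0
for number in $(cat "$tmpfile"); do
  words=$((words + 1))
done

rm -f "$tmpfile"
printf 'while read: %s items, for/cat: %s items\n' "$lines" "$words"
```

The while loop iterates 3 times (once per line), while the for loop
iterates 4 times because "2 3" is split into two words.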

Current Thread

Bash high memory usage in simple for loop?, Thomas Guyot-Sionnest, 2010/09/26
- Re: Bash high memory usage in simple for loop?, Greg Wooledge <=
- Re: Bash high memory usage in simple for loop?, Chet Ramey, 2010/09/27
