Re: Reading portions of large files
From: Lee Sau Dan
Subject: Re: Reading portions of large files
Date: 20 Jan 2003 08:50:30 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
>>>>> "Benjamin" == Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
>> Assuming all editing is within the first 2000 bytes (not
>> tested):
>>
>> head -c2000 bigfile > header-to-be-edited
>> tail -c+2001 bigfile > the-rest
>> (edit header-to-be-edited, save)
>> cat header-to-be-edited the-rest > new-big-file
Benjamin> This assumes a) Unix, b) that you have the space and
Benjamin> time ;-) to deal with the large temporary files.
(b) is assumed even if you use another method. Most *text* editors
save a file by first writing a temporary copy of the new version and
then renaming that copy to the old name. So, in case of a crash, you
don't lose everything: either the old version or the new version
survives intact.
So, if you don't have the extra disk space, you can't do the editing
either.
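A minimal sketch of that write-then-rename pattern, for illustration
only. The file names are made up, and real editors differ in the
details (backup files, fsync, permissions):

```shell
# Hypothetical file names; work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'old contents\n' > document.txt

# Write the new version to a temporary file next to the original,
# so the rename below stays on the same filesystem.
printf 'new contents\n' > document.txt.tmp

# Replace the old file only after the new one is completely written.
# Until this point the old version is still intact on disk.
mv document.txt.tmp document.txt
```

Note that both versions exist on disk at the same time just before the
'mv', which is where the extra space requirement comes from.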
Time? It doesn't take much time to 'split' and 'cat'. Moreover,
running the editor on smaller pieces does save time on loading and
saving the file fragments, and the editor doesn't need as much RAM
while editing them.
Benjamin> If you can assume Unix, dd is a little better, I think.
Why not 'split'?
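A rough sketch of the 'split' route (not tested on a really big file).
The sample file, the 2000-byte piece size and the "piece." prefix are
arbitrary choices for the example:

```shell
# Scratch directory and a small stand-in for the big file (5000 bytes).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
head -c 5000 /dev/zero | tr '\0' 'x' > bigfile

# Cut the file into 2000-byte pieces: piece.aa, piece.ab, piece.ac
split -b 2000 bigfile piece.

# Stand-in for editing the first piece in a text editor.
# The piece is free to grow or shrink.
printf 'edited header\n' > piece.aa

# The shell expands piece.* in sorted order, so the pieces are
# joined back in their original sequence.
cat piece.* > new-big-file
```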
Benjamin> I recently had success with using it for extracting and
Benjamin> later re-inserting a bit in a large file.
Only when the extracted and re-inserted blocks are of the same size.
This is the case for hex editing, but not for *text* editing. If
you're doing hex editing, you shouldn't be using a text editor in the
first place. There are hex editors which don't need to load the whole
file into memory.
Benjamin> Getting the options right is a bit of a pain,
No. That is true only when you're using 'dd' for the first time.
After a few times, it's easy to remember which options to use. Most of
the time, I only need "if=", "of=", "bs=", "skip=", "seek=" and
"count=". These option names are quite easy to remember once you know
the basic principle that 'dd' works by transferring blocks from the
input file to the output file.
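For illustration, a small extract-and-re-insert round trip using just
those options plus conv=notrunc. The 512-byte block size, the block
position and the file names are assumptions picked for the example:

```shell
# Scratch directory and a sample file of 10 blocks of 512 bytes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
head -c 5120 /dev/zero | tr '\0' 'a' > bigfile

# Extract: skip= counts blocks of the INPUT file to pass over,
# count= is the number of blocks to copy.
dd if=bigfile of=chunk bs=512 skip=2 count=1 2>/dev/null

# Stand-in for an edit that keeps the chunk exactly the same size.
tr 'a' 'b' < chunk > chunk.edited

# Re-insert: seek= counts blocks of the OUTPUT file to pass over.
# conv=notrunc keeps dd from truncating bigfile after the written block.
dd if=chunk.edited of=bigfile bs=512 seek=2 count=1 conv=notrunc 2>/dev/null
```

"skip=" applies to the input and "seek=" to the output, which is the
direction issue: extraction uses the former, re-insertion the latter.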
Benjamin> but the main thing was getting the direction (extract
Benjamin> and re-insert) right and using conv=notrunc for
Benjamin> re-insertion. And then dd is oriented towards blocks of
Benjamin> bytes, not lines, of course.
This is the downside. For line-oriented operations, use 'head',
'tail', 'cat', 'sed', or even 'awk' and 'perl'.
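A line-oriented variant of the same idea might look like this (a
sketch with made-up file names and a 10-line header, not a recipe):

```shell
# Scratch directory and a sample file of 100 numbered lines.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > bigfile

# The first 10 lines go to the part to be edited, the rest is set aside.
head -n 10 bigfile > header-to-be-edited
tail -n +11 bigfile > the-rest

# Stand-in for an edit that changes the number of lines: drop line 3.
sed '3d' header-to-be-edited > header.edited

# Join the edited header and the untouched remainder.
cat header.edited the-rest > new-big-file
```

Since the split is by lines, the edited part may change length freely,
unlike the fixed-size blocks that 'dd' works with.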
Benjamin> And you can not change the size of the block to be
Benjamin> edited, but then large files are usually binary files,
Benjamin> where you don't want to change byte offsets anyway.
Then, find a hex editor. *Text* editors are simply not the right tool
to edit huge *binary* files. In theory, hex editors can be
implemented very efficiently using mmap().
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee