Re: Reading portions of large files

help-gnu-emacs

From:	Lee Sau Dan
Subject:	Re: Reading portions of large files
Date:	20 Jan 2003 08:50:30 +0100
User-agent:	Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

>>>>> "Benjamin" == Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

    >> Assuming all editing is within the first 2000 bytes (not
    >> tested):
    >> 
    >> head -c2000 bigfile > header-to-be-edited 
    >> tail -c+2001 bigfile > the-rest
    >>   (edit header-to-be-edited, save)
    >> cat header-to-be-edited the-rest > new-big-file

    Benjamin> This assumes a) Unix, b) that you have the space and
    Benjamin> time ;-) to deal with the large temporary files.

(b) is assumed even if you use another method.  Most *text* editors
save a file by first writing a temporary copy of the new version and
then renaming that copy to the old name.  So, in case of a crash, you
don't lose everything: either the old version or the new version
should survive intact.
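
To make the disk-space point concrete, here is a minimal sketch of
that save strategy in plain sh.  It is not how any particular editor
does it, and the file names are made up for illustration.  Note that
both copies exist on disk at the same time just before the rename.

```shell
#!/bin/sh
# Sketch of "write a temp copy, then rename" (hypothetical file names).
set -eu
workdir=$(mktemp -d)
target="$workdir/document.txt"
printf 'old contents\n' > "$target"

# 1. Write the new version next to the original, on the same
#    filesystem, so the rename below does not have to copy data.
tmp="$target.tmp.$$"
printf 'new contents\n' > "$tmp"

# 2. Replace the old name in one step.  Up to this point the old
#    version is still intact on disk.
mv -f "$tmp" "$target"

saved=$(cat "$target")
echo "$saved"
```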

So if you don't have the extra disk space, you can't do the editing
either.

Time?  It doesn't take much time to 'split' and 'cat'.  Moreover,
running the editor on smaller pieces does save time on loading and
saving the file fragments, and the editor doesn't need as much RAM
while editing.
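
A rough sketch of the split-and-cat approach, assuming a 'split' that
accepts -b (as POSIX specifies).  The 10000-byte test file, the 4000
byte piece size and the "piece." prefix are made up for the example;
the edit is simulated with 'sed' instead of an interactive editor.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the big file: 1000 lines of 10 bytes each.
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "line %04d\n", i }' > bigfile

# Cut it into 4000-byte pieces: piece.aa, piece.ab, piece.ac
split -b 4000 bigfile piece.

# "Edit" only the first piece.  The piece may grow or shrink freely.
sed '1s/.*/first line, now longer/' piece.aa > edited.tmp
mv edited.tmp piece.aa

# The shell expands piece.* in sorted order, so the pieces are
# concatenated in their original sequence.
cat piece.* > rebuilt

head -n 1 rebuilt
```

Only the piece being edited is ever loaded into the editor; the other
pieces are just copied through by 'cat'.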

    Benjamin> If you can assume Unix, dd is a little better, I think.

Why not 'split'?

    Benjamin> I recently had success with using it for extracting and
    Benjamin> later re-inserting a bit in a large file.  

That works only when the extracted and re-inserted blocks are of the
same size.  This is the case for hex editing, but not *text* editing.
If you're doing hex editing, you shouldn't be using a text editor in
the first place.  There are hex editors which don't need to load the
whole file into memory.

    Benjamin> Getting the options right is a bit of a pain, 

No.  That is true only when you're using 'dd' for the first time.
After a few times, it's easy to remember what options to use.  Most of
the time, I only need "if=", "of=", "bs=", "skip=", "seek=" and
"count=".  These option names are quite easy to remember once you know
the basic principle that 'dd' works by transferring blocks of the
input file to the output file.
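
For illustration, a small sketch of extracting and re-inserting one
block with those options.  The 16-byte file and the block size of 4
are made up; "skip=" counts blocks on the input side and "seek=" on
the output side.  The re-insertion also needs conv=notrunc, otherwise
'dd' truncates the output file and everything after the written block
is lost.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the big file: four blocks of 4 bytes.
printf 'AAAABBBBCCCCDDDD' > bigfile

# Extract: skip 2 input blocks, then copy 1 block (bytes 8..11).
dd if=bigfile of=block bs=4 skip=2 count=1 2>/dev/null
extracted=$(cat block)

# "Edit" the block.  It must stay exactly the same size.
printf 'cccc' > block

# Re-insert: seek past 2 output blocks, write 1 block, and do not
# truncate the rest of the output file.
dd if=block of=bigfile bs=4 seek=2 count=1 conv=notrunc 2>/dev/null

result=$(cat bigfile)
echo "$extracted"
echo "$result"
```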

    Benjamin> but the main thing was getting the direction (extract
    Benjamin> and re-insert) right and using conv=notrunc for
    Benjamin> re-insertion.  And than dd is oriented towards blocks of
    Benjamin> bytes, not lines, of course.  

This is the downside.  For line-oriented operations, use 'head',
'tail', 'cat', 'sed', or even 'awk' and 'perl'.
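
A line-oriented variant of the same idea might look like this sketch.
The 100-line test file and the cut after line 10 are made up, and the
edit is again simulated with 'sed'.  Unlike the 'dd' approach, the
edited part is free to change size.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the big file: 100 numbered lines.
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > bigfile

# Cut by lines instead of bytes.
head -n 10 bigfile > header
tail -n +11 bigfile > rest

# "Edit" the header: add a new line after line 3.
sed '3a\
inserted line' header > header.new
mv header.new header

# Put the file back together.
cat header rest > new-big-file

sed -n '4p' new-big-file
```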

    Benjamin> And you can not change the size of the block to be
    Benjamin> edited, but than large files are usually binary files,
    Benjamin> where you don't want to change byte offsets anyway.

Then, find a hex editor.  *Text* editors are simply not the right tool
to  edit  huge  *binary*  files.    In  theory,  hex  editors  can  be
implemented very efficiently using mmap().

-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
