

Re: Uploading Word documents, PDFs, PNG files etc

From: Keith Wright
Subject: Re: Uploading Word documents, PDFs, PNG files etc
Date: Wed, 13 May 2009 23:47:06 -0400

> From: Linas Vepstas <address@hidden>
> Cc: address@hidden
> 2009/5/13 Sebastian Tennant <address@hidden>:
> > Restricting regexps to actual text is fine... until
> > you need to grep binary data, or, as in this case,
> > a combination of text and binary data.
> > in cgi.scm that extracted the uploaded (possibly
> > binary) file, because the pattern identifying the
> > beginning of the file in the raw data string is
> > simple ("\n\r\n\r") -
> No, this sounds somehow broken.  If I remember correctly,
> binary mime-parts should have a Content-Length header
> so you can skip over them. If Content-Length is absent,
> then the part should be ascii-encoded (e.g. base64).
> Yeah, grepping large blocks of ascii sucks, which is
> why the Content-Length should be used.
> -- linas

If the spec says a length indication followed by
a fixed length of arbitrary binary data, then it
is not just sucky, but incorrect to apply either
grep or regexp to the binary.  It will seem to
work until it hits binary data that "by
accident" contains the string you are looking
for.
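
To make the failure concrete, here is a minimal Python sketch. The
payloads and the helper name naive_split are made up for illustration;
this is not the code in cgi.scm. It splits raw data on a delimiter
pattern and shows that the result changes as soon as the binary part
happens to contain the same bytes.

```python
import re

# Hypothetical delimiter between the text header and the binary body.
DELIM_PATTERN = rb"\r\n\r\n"


def naive_split(raw: bytes) -> list[bytes]:
    """Split raw data on every match of the delimiter, binary included."""
    return re.split(DELIM_PATTERN, raw)


# A body that does not contain the delimiter: the split looks correct.
clean = b"header\r\n\r\n" + b"\x00\x01\x02"

# A body that contains the delimiter by accident: the body is cut in two.
unlucky = b"header\r\n\r\n" + b"\x00\r\n\r\n\x02"

clean_pieces = naive_split(clean)      # header + one intact body
unlucky_pieces = naive_split(unlucky)  # header + two body fragments
```

The second input is just as valid as the first, yet the pattern match
reports a boundary that is not there.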

The only correct algorithm is to make a preliminary
pass to somehow remove the binary data and
pseudo-concatenate the remaining strings.
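
A rough sketch of such a preliminary pass, in Python. The framing is
an assumption made only for the example: each part is a text header
block ending in a blank line and carrying a "Content-Length: N" line,
followed by exactly N bytes of body. The function name
split_text_and_binary is hypothetical, and the real upload format may
differ. The bodies are skipped by length and never searched.

```python
HEADER_END = b"\r\n\r\n"  # assumed end-of-header marker


def split_text_and_binary(raw: bytes) -> tuple[bytes, list[bytes]]:
    """Return (concatenated header text, list of binary bodies)."""
    text_chunks: list[bytes] = []
    bodies: list[bytes] = []
    pos = 0
    while pos < len(raw):
        # pos always points at header text, so this search is safe.
        end = raw.find(HEADER_END, pos)
        if end == -1:
            raise ValueError("header block without terminating blank line")
        header = raw[pos:end]

        # Read the declared body length from the header lines.
        length = None
        for line in header.split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip().decode("ascii"))
        if length is None:
            raise ValueError("missing Content-Length")

        # Skip over the body by length instead of searching inside it.
        body_start = end + len(HEADER_END)
        body_end = body_start + length
        if body_end > len(raw):
            raise ValueError("truncated body")
        text_chunks.append(header)
        bodies.append(raw[body_start:body_end])
        pos = body_end

    return b"\r\n".join(text_chunks), bodies


# Two hypothetical parts; the first body contains the marker by accident.
raw = (
    b"Name: a\r\nContent-Length: 6\r\n\r\n" + b"\x00\r\n\r\n\x01"
    + b"Name: b\r\nContent-Length: 2\r\n\r\n" + b"hi"
)
text, bodies = split_text_and_binary(raw)
```

After this pass, grep or a regexp can be applied to the returned text
alone, since it no longer contains any of the binary data.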

  -- Keith

