libtool-patches

Re: [PATCH] maint: ship .xz, not .lzma


From: Charles Wilson
Subject: Re: [PATCH] maint: ship .xz, not .lzma
Date: Tue, 14 Sep 2010 03:17:46 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666

On 9/14/2010 2:04 AM, Gary V. Vaughan wrote:
> I'm curious to know what the history of lzma and xz is that makes this
> desirable though.

Here's some documentation I put together for the cygwin xz package:

xz
========================================================================
This package provides a data compression library and utilities
supporting the .xz and .lzma file formats, which use the LZMA
compression algorithm.  LZMA provides high compression ratios and very
fast decompression, with minimal memory requirements for decompression.
XZ Utils is the latest generation of this software, supplanting the
older LZMA Utils.

The cygwin xz package replaces and obsoletes the cygwin lzma package.

LZMA Utils (and its own antecedent, the LZMA SDK) provided the 'lzma'
tool, which supported the 'LZMA-Alone' file format usually indicated by
the extension '.lzma'.  Internally, this file format used what is now
called the LZMA1 compression algorithm.

XZ Utils provides the xz tool, which supports the new .xz file format
usually indicated by the extension '.xz'. Internally, it uses a
variation of the original LZMA compression algorithm, called LZMA2.
However, the new xz tool also seamlessly supports the older .lzma files
and LZMA1 compression.

History:
========================================================================

1. LZMA SDK
First there was the LZMA SDK. Upstream, it shipped no libraries; only a
few executables such as 'lzma'. The source code was provided for public
use (under a variety of licenses), but it was expected that developers
would incorporate the source code directly into their own projects.
This is not The Unix Way.

The LZMA SDK was tightly coupled with the 7zip compression program, and
both were developed on and solely for the Windows platform.  7zip -- but
not the LZMA SDK -- was ported to Unix under the auspices of the p7zip
("Portable 7zip") project. (As an aside, p7zip was then ported to
cygwin...to come full circle). However, it should be clear that the file
format used by 7zip (and p7zip) was completely different from the one
supported by the LZMA SDK's 'lzma' tool.  The latter used what was
called the 'LZMA-Alone' format, which consisted of 13 bytes of header
information followed by a raw lzma-compressed byte-stream.  7zip, on the
other hand, used a much more complicated file format capable of hosting
multiple files, spanned archives, and other features. The only
similarity is that the core data compression algorithm used by both is
LZMA.
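To make the 'LZMA-Alone' layout concrete, here is a minimal Python sketch
that splits the 13-byte header into its fields. It uses Python's standard
`lzma` module (a wrapper around liblzma) only to produce a sample file.
The field layout in the comments (one properties byte, a 32-bit dictionary
size, a 64-bit uncompressed size, all little-endian) is the commonly
documented one and is an assumption here; the function name
`parse_lzma_alone_header` is hypothetical.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # all-ones size field: length not stored

def parse_lzma_alone_header(blob: bytes) -> dict:
    """Decode the 13-byte LZMA-Alone (.lzma) header (assumed layout):
      byte  0     properties: (pb * 5 + lp) * 9 + lc
      bytes 1-4   dictionary size, uint32 little-endian
      bytes 5-12  uncompressed size, uint64 little-endian
    """
    if len(blob) < 13:
        raise ValueError("too short for an LZMA-Alone header")
    props, dict_size, raw_size = struct.unpack("<BIQ", blob[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    return {
        "lc": props % 9,           # literal context bits
        "lp": (props // 9) % 5,    # literal position bits
        "pb": props // 45,         # position bits
        "dict_size": dict_size,
        "uncompressed_size": None if raw_size == UNKNOWN_SIZE else raw_size,
    }

# Produce a sample .lzma stream the way the old 'lzma' tool would.
sample = lzma.compress(b"hello, lzma\n" * 50, format=lzma.FORMAT_ALONE)
print(parse_lzma_alone_header(sample))
```

With liblzma's default settings (lc=3, lp=0, pb=2) the first byte comes
out as 0x5D. Since nothing else marks the file, a tool can only guess
that a file is .lzma from values like this.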

2. LZMA Utils
Eventually, a unix port of the LZMA SDK appeared, in the form of the
LZMA Utils distribution, which reorganized the original source code, and
provided the decompression code in library form (liblzmadec). It also
provided a version of the 'lzma' program, but with a completely
different command-line interface. The LZMA Utils version consciously
mimicked the command-line options of the familiar gzip and bzip2 tools,
while the original LZMA SDK version was...different. Very different.
This is because the LZMA SDK's tool was originally intended just as a
test and development utility, to help refine the algorithm. So, it has
a number of 'compression guru' options that no sane user cares to use,
and very few of the 'normal user' options that they would.

   LZMA Utils: (Lasse Collin)
      lzma -d foo.tar.lzma
         uncompress to (implied) foo.tar, and remove
         original compressed file.
      lzma foo.tar
         compress to (implied) foo.tar.lzma, and remove
         original uncompressed file.
      Supports familiar "tuning" options like -0 .. -9
      Sends output data to stdout using -c
      Could be invoked under alternate names (symlinks)
      for different behavior:
          unlzma == lzma -d  (uncompress)
          lzcat  == lzma -dc (uncompress to stdout)

   LZMA SDK: (Igor Pavlov)
      lzma d foo.tar.lzma foo.tar
      lzma e foo.tar      foo.tar.lzma
         mode d/e is the required first non-option argument
         both input and output files must be specified
      stdout? what's that?

Finally, LZMA Utils also shipped a number of helpful scripts similar to
the familiar ones from gzip and bzip2:
  lzdiff/lzcmp, lzgrep/lzegrep/lzfgrep, lzless/lzmore

So, the LZMA SDK version was hardly suitable for replacing or augmenting
the existing bzip2 and gzip compression programs on unix systems,
especially as the most common use was in conjunction with tar, and tar
expects compression programs to share a common command-line argument
format and to be able to process data on the standard streams. Most
linux distributions have standardized on LZMA Utils.
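Working on standard streams is what lets a compressor sit in a tar
pipeline. Below is a minimal Python sketch of an 'lzcat'-style filter,
assuming the standard `lzma` module: it reads compressed bytes from one
binary stream and writes decompressed bytes to another as they arrive,
without ever needing a file name. The function name `lzcat_filter` and
the 64 KiB chunk size are illustrative choices, and for simplicity it
stops after the first compressed stream.

```python
import io
import lzma

CHUNK_SIZE = 64 * 1024  # arbitrary read size for this sketch

def lzcat_filter(src, dst, fmt=lzma.FORMAT_AUTO):
    """Copy src to dst, decompressing one .lzma or .xz stream on the fly.

    src and dst are binary file objects, e.g. sys.stdin.buffer and
    sys.stdout.buffer when used as a pipeline stage.
    """
    decomp = lzma.LZMADecompressor(format=fmt)
    while not decomp.eof:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            raise EOFError("input ended before the end of the compressed stream")
        dst.write(decomp.decompress(chunk))

# In-memory demonstration standing in for a pipe.
original = b"tar data would go here\n" * 1000
out = io.BytesIO()
lzcat_filter(io.BytesIO(lzma.compress(original, format=lzma.FORMAT_ALONE)), out)
```

Passing `sys.stdin.buffer` and `sys.stdout.buffer` turns this into a
filter that can be placed after `tar cf -` in a shell pipeline.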

The lzma tools from both LZMA SDK and LZMA Utils support the
LZMA-Alone (.lzma) file format, as does the liblzmadec library from
LZMA Utils.

However, the .lzma file format (i.e. LZMA-Alone) is not sufficient for
modern needs, as it (1) has no 'signature bytes', so compressed files
are difficult to detect and verify automatically, (2) has no provision
for internal integrity checks, and (3) cannot support multi-file
archives.

3. XZ Utils
The newest member of this family, now approaching its first final
(non-beta) release, is XZ Utils. Addressing the shortcomings of the
LZMA-Alone file format,
the xz file format and the (slightly modified) LZMA2 compression
algorithm were jointly developed by Lasse Collin (LZMA Utils) and Igor
Pavlov (LZMA SDK). The xz tool has all of the benefits of the LZMA
Utils' version of the lzma tool, and ships with all of the same helpful
scripts. In addition, it can be invoked as either 'xz' (or xzcat, unxz)
or 'lzma' (or lzcat, unlzma) so you don't even need to retrain your
fingers.
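The 'one binary, many names' trick works by looking at the name the
program was invoked under (argv[0]). The sketch below is a hypothetical
Python rendering of it. The option equivalences are the ones given in the
xz(1) manual page (for example, lzcat behaves like `xz --format=lzma
--decompress --stdout`); the `effective_args` name, the fallback for
unknown names, and the `.exe` stripping for cygwin are illustrative
assumptions, not xz's actual C code.

```python
import os

# Invocation name -> equivalent xz options (per the xz(1) manual page).
ALIASES = {
    "xz":     [],
    "unxz":   ["--decompress"],
    "xzcat":  ["--decompress", "--stdout"],
    "lzma":   ["--format=lzma"],
    "unlzma": ["--format=lzma", "--decompress"],
    "lzcat":  ["--format=lzma", "--decompress", "--stdout"],
}

def effective_args(argv):
    """Rewrite argv as if the program had been invoked as plain 'xz'."""
    name = os.path.basename(argv[0])
    if name.lower().endswith(".exe"):  # e.g. lzcat.exe on cygwin
        name = name[:-4]
    # Unknown names fall back to plain xz behavior in this sketch.
    return ALIASES.get(name, []) + list(argv[1:])

print(effective_args(["/usr/bin/lzcat", "foo.tar.lzma"]))
```

Installing the aliases as symlinks (or hard links) to one executable is
what keeps the old 'lzma' spellings working.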

You probably should, though, because .lzma files are already being
replaced by .xz files on many software distribution sites, including
GNU ones.

In addition, XZ Utils provides the liblzma library, for decompression
AND compression, which supports both the LZMA-Alone (that is, the old
.lzma) format and the new .xz format.
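Python's standard `lzma` module is a thin wrapper around liblzma, so it
offers a quick way to see one library handling both container formats.
In this minimal sketch the same payload is written once as LZMA-Alone
(what the old 'lzma' tool produced) and once as .xz, and both are read
back with `FORMAT_AUTO`, which lets liblzma detect the format itself.

```python
import lzma

payload = b"same data, two container formats\n" * 100

old_style = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # .lzma (LZMA1)
new_style = lzma.compress(payload, format=lzma.FORMAT_XZ)     # .xz (LZMA2 by default)

for label, blob in (("lzma", old_style), ("xz", new_style)):
    # FORMAT_AUTO: liblzma chooses the .xz or .lzma decoder itself.
    restored = lzma.decompress(blob, format=lzma.FORMAT_AUTO)
    print(label, len(blob), "bytes ->", len(restored), "bytes")
```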

The new .xz file format has an easily identifiable initial signature for
automated format detection and verification. It supports integrity
checks of several types including cryptographic hashes. Finally, the
format also supports multiple compressed streams within the same file
(that is, multi-file archives).  However, the xz tool does NOT, at
present, support multi-file archives; only archives with a single
compressed stream.
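Both improvements are easy to observe from Python's `lzma` module. In
this sketch, `has_xz_signature` (a hypothetical helper) checks for the
6-byte .xz magic number FD 37 7A 58 5A 00, and a SHA-256 check is
requested when compressing. Flipping a single byte of the result makes
liblzma refuse the data with `lzma.LZMAError` rather than return
corrupted output. The .lzma encoding of the same data has no magic
number to look for.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # first 6 bytes of every .xz stream

def has_xz_signature(blob: bytes) -> bool:
    return blob.startswith(XZ_MAGIC)

data = b"integrity matters\n" * 200
packed = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_SHA256)
legacy = lzma.compress(data, format=lzma.FORMAT_ALONE)  # no magic, no check

print(has_xz_signature(packed), has_xz_signature(legacy))  # True False

damaged = bytearray(packed)
damaged[len(damaged) // 2] ^= 0xFF  # flip every bit of one byte
try:
    lzma.decompress(bytes(damaged), format=lzma.FORMAT_XZ)
    print("corruption went unnoticed")
except lzma.LZMAError as err:
    print("rejected:", err)
```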

As an aside, eventually the 7zip (and p7zip) utilities will support a
"new" .7z format -- which will be simply a compatible variant of the .xz
file format, but with custom filters (codecs) specified in the (highly
extensible) header defined by the .xz standard. This was the primary
reason for the .xz format to support multi-file archives: the xz tool
itself has no present need for them, and doesn't even support them
(although the liblzma library does).

Single File Compression
========================================================================
Although the xz file format supports multiple streams, the xz tool
itself is concerned only with single files that have been compressed as
a single complete stream using LZMA compression. This is similar to the
behavior of the older lzma tool and its LZMA-Alone file format, or the
archetypal gzip and bzip2 compression programs.

Just as with bzip2 and gzip (and the old lzma tool), to create
multi-file archives you should use tar and THEN compress with xz.exe. For an
integrated compressed archive file format that uses LZMA compression,
see p7zip and the 7zip programs, and their associated .7z file format.
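The tar-then-compress workflow can be sketched with Python's `tarfile`
module, whose "w:xz" mode writes an ordinary tar archive and passes it
through liblzma as a single .xz stream, roughly the equivalent of
`tar -cf - a.txt b.txt | xz > files.tar.xz`. The helper names and file
contents are made up for the example.

```python
import io
import tarfile

def make_tar_xz(files: dict) -> bytes:
    """Bundle several files with tar, then compress the whole tar with xz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def list_tar_xz(blob: bytes) -> list:
    """Decompress the single xz stream and list the tar members inside."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as tar:
        return tar.getnames()

archive = make_tar_xz({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
print(list_tar_xz(archive))
```

The multi-file structure lives entirely in the tar layer; xz only ever
sees one byte stream, which is exactly the single-stream case the xz tool
handles.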

--
Chuck
