Copyright (C) 2007 Lasse Collin

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

Q: What are LZMA, LZMA Utils, lzma, .lzma, liblzma, LZMA SDK, LZMA_Alone,

A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. LZMA is the name
of the compression algorithm designed by Igor Pavlov. He is the author
of 7-Zip, which is a great LGPL'd compression tool for Microsoft
Windows operating systems. In addition to 7-Zip itself, the LZMA SDK
is also available on the 7-Zip website. LZMA SDK contains LZMA
implementations in C++, Java and C#. The C++ version is the original
implementation, which is also used in 7-Zip itself.

Excluding the unrar plugin, 7-Zip is free software (free as in
freedom). Thanks to this, it was possible to port it to POSIX
platforms. The port was done and is maintained by myspace (TODO:
myspace's real name?). p7zip is a port of 7-Zip's command line version;
p7zip doesn't include 7-Zip's GUI.

In the POSIX world, users are used to the gzip and bzip2 command line
tools. Developers know the APIs of zlib and libbzip2. LZMA Utils try to
ease adoption of LZMA on free operating systems by providing a
compression library and a set of command line tools. The library is
called liblzma. It provides a zlib-like API, making it easy to adopt
LZMA compression in existing applications. The main command line tool
is known as lzma, whose command line syntax is very similar to that of
gzip and bzip2.

The original command line tool from LZMA SDK (lzma.exe) was found in
a directory called LZMA_Alone in the LZMA SDK. It used a simple header
format in .lzma files. This format was also used by LZMA Utils up to
and including 4.32.x. In LZMA Utils documentation, LZMA_Alone refers
to both the file format and the command line tool from LZMA SDK.

Because of various limitations of the LZMA_Alone file format, a new
file format was developed. Extending some existing format such as .gz
used by gzip was considered, but these formats were found to be too
limited. The filename suffix for the new .lzma format is `.lzma'. The
same suffix is also used for files in the LZMA_Alone format. To make
the transition to the new format as smooth as possible, LZMA Utils
support both the new and old formats transparently.

7-Zip and LZMA SDK: <http://7-zip.org/>
p7zip: <http://p7zip.sourceforge.net/>
LZMA Utils: <http://tukaani.org/lzma/>

Q: What LZMA implementations are available?

A: LZMA SDK contains implementations in C++, Java and C#. The C++ version
is the original implementation which is part of 7-Zip. LZMA SDK
also contains a small LZMA decoder in C.

A port of LZMA SDK to Pascal was made by Alan Birtles
<http://www.birtles.org.uk/programming/>. It should work with
multiple Pascal programming language implementations.

LZMA Utils includes liblzma, which is directly based on LZMA SDK.
liblzma is written in C (C99, not C89). In contrast to the C++ callback
API used by LZMA SDK, liblzma uses a zlib-like stateful C API. I do not
want to comment on whether either, both, or neither of these APIs is
good or bad. The only reason to implement a zlib-like API was that many
developers are already familiar with zlib, and very many applications
already use zlib. Having a similar API makes it easier to include LZMA
support in existing applications.

See also <http://en.wikipedia.org/wiki/LZMA#External_links>.

Q: Which file formats are supported by LZMA Utils?

A: Although the raw LZMA stream is always the same, it can be wrapped
in different container formats. The preferred format is the new .lzma
format. It has magic bytes (the first six bytes: 0xFF 'L' 'Z' 'M'
'A' 0x00). The format supports chaining up to seven filters,
splitting data into multiple blocks for easier multi-threading, and
rough random-access reading. The file integrity is verified using CRC32,
CRC64, or SHA256, and by verifying the uncompressed size of the file.
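
As an illustration of the integrity check mentioned above, here is a
minimal sketch of a bitwise CRC32. It assumes the common reflected
polynomial 0xEDB88320 (the variant used by zlib and Ethernet); whether
the .lzma container uses exactly this variant is an assumption, and the
function names are hypothetical, not part of liblzma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32 with the reflected polynomial 0xEDB88320.
 * Assumption: this is the CRC32 variant meant by the text above.
 * Pass 0 as crc for the first chunk, and the previous return value
 * when continuing with further chunks of the same data. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    crc = ~crc;
    for (size_t i = 0; i < size; ++i) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (UINT32_C(0xEDB88320) & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Returns 1 if the uncompressed data matches the stored check value. */
static int crc32_matches(const uint8_t *buf, size_t size, uint32_t stored)
{
    return crc32_update(0, buf, size) == stored;
}
```

A decoder would compute the check over the uncompressed data and
compare it with the value stored in the file; a table-driven version is
much faster but works the same way.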

LZMA SDK includes a tool called LZMA_Alone. It uses a
primitive header which includes only the mandatory stream information
required by the LZMA decoder. This format can be both read and
written by liblzma and the command line tool (use --format=alone to

.7z is the native archive format used by 7-Zip. This format is not
supported by liblzma, and probably will never be supported. You
should use e.g. p7zip to extract .7z files.

It is possible to implement custom file formats by using raw filter
mode in liblzma. In this mode the application needs to store the filter
properties and provide them to liblzma before starting to uncompress

Q: How can I identify files containing LZMA compressed data?

A: The preferred filename suffix for .lzma files is `.lzma'. `.tar.lzma'
may be abbreviated to `.tlz'. The same suffixes are used for files in
LZMA_Alone format. In practice this should be no problem since tools
included in LZMA Utils support both formats transparently.

Checking the magic bytes is an easy way to detect files in the new .lzma
format (the first six bytes: 0xFF 'L' 'Z' 'M' 'A' 0x00). The "file"
command version FIXME contains magic strings for this format.
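
The magic byte check described above can be sketched in a few lines of
C. The function name is hypothetical; only the six byte values come
from this FAQ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Magic bytes of the new .lzma format, as listed in this FAQ. */
static const unsigned char LZMA_MAGIC[6] = {
    0xFF, 'L', 'Z', 'M', 'A', 0x00
};

/* Returns 1 if buf starts with the new .lzma magic bytes, 0 otherwise.
 * buf holds the first bytes read from the file; size is how many bytes
 * were actually read. */
static int has_new_lzma_magic(const unsigned char *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    if (size < sizeof(LZMA_MAGIC))
        return 0;

    return memcmp(buf, LZMA_MAGIC, sizeof(LZMA_MAGIC)) == 0;
}
```

Reading the first six bytes of a file with fread() and passing them to
this function is enough; a file shorter than six bytes is never
reported as a match.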

The old LZMA_Alone format has no magic bytes. Its header cannot contain
arbitrary bytes, so it is possible to make a guess. Unfortunately the
guessing is usually too hard to be reliable, so don't try it unless you
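
To show what such a guess could look like, here is a rough sketch. It
relies on the 13-byte LZMA_Alone header layout (one properties byte, a
32-bit little endian dictionary size, and a 64-bit little endian
uncompressed size where all bits set means "unknown"). The function
name and the 2^38 size limit are assumptions made for illustration, and
the result is only a hint, never a reliable detection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALONE_HEADER_SIZE 13

/* Heuristic only: returns 1 if buf *could* be an LZMA_Alone header,
 * 0 if it certainly or very probably is not. */
static int looks_like_lzma_alone(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    if (size < ALONE_HEADER_SIZE)
        return 0;

    /* Byte 0 encodes (pb * 5 + lp) * 9 + lc with lc <= 8, lp <= 4 and
     * pb <= 4, so valid values are 0..224. */
    if (buf[0] > (4 * 5 + 4) * 9 + 8)
        return 0;

    /* Bytes 5..12: uncompressed size, little endian. */
    uint64_t uncompressed = 0;
    for (int i = 0; i < 8; ++i)
        uncompressed |= (uint64_t)buf[5 + i] << (8 * i);

    /* All bits set means the size is unknown. Otherwise reject sizes
     * of 256 GiB or more (hypothetical plausibility limit). */
    if (uncompressed != UINT64_MAX && uncompressed >= (UINT64_C(1) << 38))
        return 0;

    return 1;
}
```

Note that many unrelated files pass this test, which is exactly why the
guess is unreliable. A file in the new .lzma format is rejected because
its first byte is 0xFF.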

Q: Does the lzma command line tool support sparse files?

A: Sparse files can (of course) be compressed like normal files, but
uncompression will not restore sparseness of the file. Use an archiver
tool to take care of sparseness before compressing the data with lzma.

The reason for this is that archiver tools handle files, while
compression tools handle streams or buffers. Being a sparse file is
a property of the file on the disk, not a property of the stream or

Q: Can I recover parts of a broken LZMA file (e.g. corrupted CD-R)?

A: With LZMA_Alone and single-block .lzma files, you can uncompress the
file until you hit the first broken byte. The data after the broken
position is lost. LZMA relies on the uncompression history, and if
bytes are missing in the middle of the file, it is impossible to
reliably continue after the broken section.

With multi-block .lzma files it may be possible to locate the next
block in the file and continue decoding there. A limited recovery
tool for this kind of situation is planned.

A: No, the authors are not aware of any patents that could affect LZMA.
However, due to the nature of software patents, the authors cannot
guarantee that LZMA isn't affected by any third party patent.

Q: Where can I find documentation about how LZMA works as an algorithm?

A: Read the source code, Luke. There is no documentation about LZMA
internals. It is possible that Igor Pavlov is the only person on
Earth who completely knows and understands the algorithm.

You could begin by downloading LZMA SDK, and start reading from
the LZMA decoder to get some idea about the bitstream format.
Before you begin, you should know the basics of LZ77 and
range coding algorithms. LZMA is based on LZ77, but LZMA is
*a lot* more complex. Range coding is used to compress the
final bitstream like Huffman coding is used in Deflate.
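
To get a feeling for the LZ77 part before reading the real decoder,
here is a toy sketch. The token structure is invented for illustration
and has nothing to do with the actual LZMA bitstream, which encodes
literals and matches with range coding and far more state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token: either one literal byte (distance == 0) or a
 * match that copies `length` bytes starting `distance` bytes back in
 * the already decoded output. */
typedef struct {
    uint32_t distance;
    uint32_t length;
    uint8_t literal;
} lz_token;

/* Decodes tokens into out. Returns the number of bytes written, or
 * (size_t)-1 if a token is invalid or the output buffer is too small. */
static size_t lz77_decode(const lz_token *tokens, size_t count,
        uint8_t *out, size_t out_size)
{
    assert(tokens != NULL && out != NULL);

    size_t pos = 0;
    for (size_t i = 0; i < count; ++i) {
        if (tokens[i].distance == 0) {
            if (pos >= out_size)
                return (size_t)-1;
            out[pos++] = tokens[i].literal;
        } else {
            /* A match may only refer to data decoded so far. */
            if (tokens[i].distance > pos
                    || tokens[i].length > out_size - pos)
                return (size_t)-1;

            /* Copy byte by byte: the source may overlap the bytes
             * written by this same match, which repeats a pattern. */
            for (uint32_t j = 0; j < tokens[i].length; ++j) {
                out[pos] = out[pos - tokens[i].distance];
                ++pos;
            }
        }
    }
    return pos;
}
```

Because every match copies from earlier output, a decoder cannot
continue reliably once part of that history is missing or wrong.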

A: In the context of .lzma files, a filter means an implementation of a
compression algorithm. The primary filter is LZMA, which is why
the names of the tools contain the letters LZMA.

liblzma and the new .lzma format also support filters other than LZMA.
There are different types of filters, which are suitable for different
types of data. Thus, to select the optimal filter and settings, the
type of the input data being compressed needs to be known.

Some filters are most useful when combined with another filter like
LZMA. These filters increase redundancy in the data, without changing
the size of the data, by taking advantage of properties specific to
the data being compressed.
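
As an example of a filter that increases redundancy without changing
the size, here is a sketch of a byte-wise delta transform. It was
picked for illustration only; this FAQ does not say which filters
liblzma provides besides LZMA and x86, and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replaces each byte with its difference to the byte `distance`
 * positions earlier. The size of the data does not change. */
static void delta_encode(uint8_t *buf, size_t size, size_t distance)
{
    assert(buf != NULL && distance > 0);

    /* Walk backwards so that buf[i - distance] is still unmodified
     * when it is used. */
    for (size_t i = size; i-- > distance; )
        buf[i] = (uint8_t)(buf[i] - buf[i - distance]);
}

/* Exact inverse of delta_encode(). */
static void delta_decode(uint8_t *buf, size_t size, size_t distance)
{
    assert(buf != NULL && distance > 0);

    /* Walk forwards so that buf[i - distance] is already restored. */
    for (size_t i = distance; i < size; ++i)
        buf[i] = (uint8_t)(buf[i] + buf[i - distance]);
}
```

With distance 1, the bytes 10 20 30 40 50 become 10 10 10 10 10, which
a following LZMA filter can compress much better. Byte arithmetic wraps
around modulo 256, so decoding restores the input for any data.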

So far, all the filters are always reversible. That is, no matter what
data you pass to a filter encoder, it can always be defiltered back to
the original form. Because of this, it is safe to compress for example
a software package that contains other file types than executables
using a filter specific to the architecture of the package being

The old LZMA_Alone format supports only the LZMA filter.

Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A: The BCJ filter is called "x86" in liblzma. BCJ2 is not included,
because it requires using more than one encoded output stream.
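
The idea behind a BCJ-style filter is to convert the relative target
addresses of x86 CALL instructions into absolute ones, so that several
calls to the same function produce identical byte sequences. The
following is a heavily simplified sketch of that idea, not the actual
x86 filter in liblzma: it handles only the 0xE8 opcode and skips the
extra checks a real filter makes. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts the 32-bit operand after every 0xE8 byte from relative to
 * absolute (encode != 0) or back (encode == 0). Positions are counted
 * from the start of buf. The size of the data does not change. */
static void x86_call_filter(uint8_t *buf, size_t size, int encode)
{
    assert(buf != NULL || size == 0);

    size_t i = 0;
    while (size >= 5 && i <= size - 5) {
        if (buf[i] != 0xE8) {
            ++i;
            continue;
        }

        /* Read the little endian operand. */
        uint32_t value = (uint32_t)buf[i + 1]
                | ((uint32_t)buf[i + 2] << 8)
                | ((uint32_t)buf[i + 3] << 16)
                | ((uint32_t)buf[i + 4] << 24);

        /* A relative target is counted from the next instruction. */
        uint32_t next = (uint32_t)(i + 5);
        value = encode ? value + next : value - next;

        buf[i + 1] = (uint8_t)value;
        buf[i + 2] = (uint8_t)(value >> 8);
        buf[i + 3] = (uint8_t)(value >> 16);
        buf[i + 4] = (uint8_t)(value >> 24);

        /* Skip the operand so encoder and decoder always see the same
         * opcode positions; this keeps the transform reversible. */
        i += 5;
    }
}
```

The transform stays reversible even when a 0xE8 byte is not really a
CALL opcode, because the decoder finds the same positions as the
encoder and undoes the same additions.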

Q: Can I use LZMA in proprietary, non-free applications?

A: liblzma is under the GNU LGPL version 2.1 or (at your option) any
later version. To summarise (*NOTE* This summary is not legally
binding, that is, it doesn't give you any extra permissions compared
to the LGPL. Read the GNU LGPL carefully for the exact license

* All the changes made to the library itself must be published
  under the same license.
* End users must be able to replace the used liblzma. The easiest way
  to ensure this is to link dynamically against liblzma so users
  can replace the shared library file if they want.
* You must make it clear to your users that your application uses
  liblzma, and that liblzma is free software under the GNU LGPL.
  A copy of the GNU LGPL must be included.

LZMA SDK contains a special exception which allows linking *unmodified*
code statically with a non-free application. This exception does *not*

As an alternative, you can support the development of LZMA and 7-Zip
by buying a proprietary license from Igor Pavlov. See the homepage of
LZMA SDK <http://7-zip.org/sdk.html> for more information. Note that
having a proprietary license from Igor Pavlov doesn't allow you to use
liblzma in a way that contradicts the GNU LGPL, because liblzma
contains code that is not copyrighted by Igor Pavlov. Please contact
both Lasse Collin and Igor Pavlov if the license conditions of liblzma
are not suitable for you.

Q: I would like to help. What can I do?

A: See the TODO file. Please contact Lasse Collin before starting to do
anything, because it is possible that someone else is already working

Q: How can I contact the authors?

A: Lasse Collin is the maintainer of LZMA Utils. You can contact him
via IRC (Larhzu on #tukaani at Freenode or IRCnet) or by email at
<lasse.collin@tukaani.org>.

Igor Pavlov is the father of LZMA. He is the author of 7-Zip
and LZMA SDK. <http://7-zip.org/>

NOTE: Please don't bother Igor Pavlov with questions specific