Introduction to liblzma
-----------------------

Writing applications to work with liblzma

The liblzma API is split into several subheaders to improve readability
and maintenance. The subheaders must not be #included directly. lzma.h
requires that certain integer types and macros are available when the
header is #included. On systems that have a C99-conforming inttypes.h,
the following will work:

    #include <sys/types.h>
    #include <inttypes.h>
    #include <lzma.h>

Those who have used zlib should find liblzma's API easy to use.
Developers who haven't used zlib before may want to learn zlib
first, because zlib has excellent documentation.

While the API is similar to that of zlib, there are some major
differences, which are summarized below.

For basic stream encoding, zlib has three functions (deflateInit(),
deflate(), and deflateEnd()). Similarly, there are three functions
for stream decoding (inflateInit(), inflate(), and inflateEnd()).
liblzma has only a single coding function and a single ending
function, shared by all coders. Thus, to encode one may use, for
example, lzma_stream_encoder_single(), lzma_code(), and lzma_end().
Similarly for decoding, one may use lzma_auto_decoder(), lzma_code(),
and lzma_end().

zlib has deflateReset() and inflateReset() to reset the stream
structure without reallocating all the memory. In liblzma, all
coder initialization functions behave like zlib's reset functions:
first-time initialization and reinitialization (resetting) are done
with the same functions.

To make this work, liblzma needs to know when lzma_stream doesn't
yet point to an allocated and initialized coder. This is achieved by
initializing the lzma_stream structure with LZMA_STREAM_INIT (static
initialization) or LZMA_STREAM_INIT_VAR (for example, when a new
lzma_stream has been allocated with malloc()). This initialization
should be done exactly once per lzma_stream structure to avoid leaking
memory. Calling lzma_end() leaves lzma_stream in a state equivalent
to the one achieved with LZMA_STREAM_INIT or LZMA_STREAM_INIT_VAR.
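
To make the "exactly once" rule concrete, here is a toy model of the
bookkeeping. It is not liblzma code: toy_stream, TOY_STREAM_INIT,
toy_stream_init_var(), toy_encoder_init() and toy_end() are
hypothetical names that only mirror the roles of lzma_stream,
LZMA_STREAM_INIT, LZMA_STREAM_INIT_VAR, an initialization function and
lzma_end(). A NULL internal pointer is what tells the init function
that nothing has been allocated yet.

```c
#include <stdlib.h>

/* Hypothetical stand-in for lzma_stream: only the field that matters here. */
typedef struct {
    void *internal; /* NULL means "no coder allocated yet" */
} toy_stream;

/* Static initializer, playing the role of LZMA_STREAM_INIT. */
#define TOY_STREAM_INIT { NULL }

/* Runtime initializer (e.g. after malloc()), like LZMA_STREAM_INIT_VAR. */
static void toy_stream_init_var(toy_stream *strm)
{
    strm->internal = NULL;
}

/* Allocates a coder only if none exists yet; otherwise reuses it.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_encoder_init(toy_stream *strm)
{
    if (strm->internal == NULL) {
        strm->internal = malloc(64); /* pretend coder state */
        if (strm->internal == NULL)
            return -1;
    }
    return 0;
}

/* Frees the coder and leaves the structure as if freshly initialized. */
static void toy_end(toy_stream *strm)
{
    free(strm->internal);
    strm->internal = NULL;
}
```

Running toy_stream_init_var() again on a stream that already owns a
coder would overwrite the pointer and leak the old allocation, which
is the reason the real initialization must be done only once per
structure.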
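
An example probably clarifies a lot. With zlib, compression goes
roughly like this:

    z_stream strm;
    /* set strm.zalloc, strm.zfree and strm.opaque (e.g. to Z_NULL) */
    deflateInit(&strm, level);
    deflate(&strm, Z_NO_FLUSH);
    deflate(&strm, Z_NO_FLUSH);
    ...
    deflate(&strm, Z_FINISH);
    deflateEnd(&strm); /* or deflateReset(&strm) to start a new stream */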

With liblzma, it's slightly different:

    lzma_stream strm = LZMA_STREAM_INIT;
    lzma_stream_encoder_single(&strm, &options); /* options set up beforehand */
    lzma_code(&strm, LZMA_RUN);
    lzma_code(&strm, LZMA_RUN);
    ...
    lzma_code(&strm, LZMA_FINISH);
    lzma_end(&strm); /* or reinitialize for a new coding job */

The reinitialization in the last step can use any function that
initializes an lzma_stream; it doesn't need to be the same function
that was used for the previous initialization. If it is the same
function, liblzma will usually be able to reuse most of the existing
memory allocations (how much depends on how the initialization
options change). If you reinitialize with a different function,
liblzma will automatically free the memory of the previous coder.
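
The reuse-or-free decision described above can be sketched as a toy
model; it is not liblzma's actual logic. The names toy_init(),
toy_encoder_init(), toy_decoder_init() and coder_kind are hypothetical,
and a single buffer stands in for all of a coder's memory.

```c
#include <stdlib.h>

/* Hypothetical coder kinds standing in for different init functions. */
typedef enum { KIND_ENCODER, KIND_DECODER } coder_kind;

typedef struct {
    coder_kind kind;
    size_t buf_size;
    unsigned char *buf; /* pretend per-coder working memory */
} toy_coder;

typedef struct {
    toy_coder *internal; /* NULL = no coder yet */
} toy_stream;

/* Shared init logic: reuse memory when the same kind of coder is
 * reinitialized and the old buffer is big enough; otherwise start over.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_init(toy_stream *strm, coder_kind kind, size_t buf_size)
{
    toy_coder *c = strm->internal;

    if (c != NULL && c->kind != kind) {
        /* Different init function: free the previous coder entirely. */
        free(c->buf);
        free(c);
        c = strm->internal = NULL;
    }
    if (c == NULL) {
        c = calloc(1, sizeof *c);
        if (c == NULL)
            return -1;
        c->kind = kind;
        strm->internal = c;
    }
    if (c->buf_size < buf_size) {
        /* The options need more memory: grow the existing buffer. */
        unsigned char *nb = realloc(c->buf, buf_size);
        if (nb == NULL)
            return -1;
        c->buf = nb;
        c->buf_size = buf_size;
    }
    return 0;
}

static int toy_encoder_init(toy_stream *strm, size_t dict_size)
{
    return toy_init(strm, KIND_ENCODER, dict_size);
}

static int toy_decoder_init(toy_stream *strm, size_t dict_size)
{
    return toy_init(strm, KIND_DECODER, dict_size);
}

/* Frees everything and returns the stream to its initial state. */
static void toy_end(toy_stream *strm)
{
    if (strm->internal != NULL) {
        free(strm->internal->buf);
        free(strm->internal);
        strm->internal = NULL;
    }
}
```

Reinitializing the encoder with smaller options keeps the same
allocation, while switching to the decoder throws the encoder away.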

liblzma supports multiple container formats for the compressed data.
Different initialization functions initialize the lzma_stream to
process different container formats. See the public headers for
details.

The following functions are the most commonly used:

- lzma_stream_encoder_single(): Encodes a Single-Block Stream; this
  is the recommended format for most purposes.

- lzma_alone_encoder(): Useful if you need to encode into the
  legacy LZMA_Alone format.

- lzma_auto_decoder(): Decoder that automatically detects the
  file format; recommended when you decode compressed files on
  disk, because this makes compatibility with the legacy LZMA_Alone
  format transparent.

- lzma_stream_decoder(): Decoder for Single- and Multi-Block
  Streams; this is good if you want to accept only .lzma Streams.
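
lzma_auto_decoder() has to tell the formats apart from the data
itself. A rough sketch of such detection follows. guess_format() and
its enum are hypothetical, and the sketch borrows the magic bytes of
the later .xz format as a stand-in, because the exact Stream magic of
this liblzma version is not given here. An LZMA_Alone file has no
magic; its 13-byte header starts with a properties byte encoding
lc/lp/pb, which is always below 225. A real decoder would validate far
more than one byte.

```c
#include <stddef.h>
#include <string.h>

typedef enum { FMT_UNKNOWN, FMT_STREAM, FMT_ALONE } container_format;

/* ASSUMPTION: magic of the later .xz format, used only for illustration;
 * the Stream magic of this liblzma version may differ. */
static const unsigned char stream_magic[6] = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };

/* Guesses the container format from the first bytes of a file. */
static container_format guess_format(const unsigned char *buf, size_t size)
{
    if (size >= sizeof stream_magic
            && memcmp(buf, stream_magic, sizeof stream_magic) == 0)
        return FMT_STREAM;

    /* LZMA_Alone header: 1 properties byte + 4 bytes dictionary size
     * + 8 bytes uncompressed size. The properties byte is
     * (pb * 5 + lp) * 9 + lc, so its maximum value is 224. */
    if (size >= 13 && buf[0] < 225)
        return FMT_ALONE;

    return FMT_UNKNOWN;
}
```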

liblzma supports multiple filters (algorithm implementations). The new
.lzma format supports filter chains of up to seven filters. In a
filter chain, the output of one filter is the input of the next filter
in the chain. The legacy LZMA_Alone format supports only one filter,
which must always be LZMA.
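
A filter chain can be pictured as a list of functions where each
one's output feeds the next. The sketch below is a simplification:
filter_fn, run_chain() and add_one_filter() are hypothetical, and each
filter here works on a whole buffer and preserves its size, whereas
real liblzma filters process streaming data and a compression filter
such as LZMA obviously changes the size. Only the chaining rule and
the seven-filter limit come from the text above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FILTERS 7 /* the .lzma format allows up to seven filters */

/* Hypothetical filter signature: transforms in[0..size) into out[0..size). */
typedef void (*filter_fn)(const unsigned char *in, unsigned char *out,
                          size_t size);

/* Dummy filter: copies input to output unchanged (like Copy). */
static void copy_filter(const unsigned char *in, unsigned char *out,
                        size_t size)
{
    memcpy(out, in, size);
}

/* Toy filter for illustration only: adds 1 to every byte (mod 256). */
static void add_one_filter(const unsigned char *in, unsigned char *out,
                           size_t size)
{
    for (size_t i = 0; i < size; i++)
        out[i] = (unsigned char)(in[i] + 1);
}

/* Runs 'data' through the chain in place: the output of filter k is the
 * input of filter k + 1. 'tmp' must hold at least 'size' bytes.
 * Returns 0 on success, -1 if the chain is too long. */
static int run_chain(const filter_fn *chain, size_t n,
                     unsigned char *data, unsigned char *tmp, size_t size)
{
    if (n > MAX_FILTERS)
        return -1;
    for (size_t k = 0; k < n; k++) {
        chain[k](data, tmp, size);
        memcpy(data, tmp, size); /* this output becomes the next input */
    }
    return 0;
}
```

Because every filter in this sketch preserves the size, one scratch
buffer is enough; a real implementation has to buffer data between
filters whose output size differs from their input size.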

General-purpose compression:

    LZMA        The main algorithm of liblzma (surprise!)

Branch/Call/Jump filters for executables:

    x86         This filter is known as BCJ in 7-Zip
    PowerPC     Big endian PowerPC

Other filters:

    Copy        Dummy filter that simply copies all the data
                from input to output.

    Subblock    Multi-purpose filter that can
                  - embed an End of Payload Marker if the previous
                    filter in the chain doesn't support it; and
                  - apply Subfilters, which filter only part
                    of the same compressed Block in the Stream.

Branch/Call/Jump filters never change the size of the data. They
should usually be used as a pre-filter for a compression filter such
as LZMA.
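
The idea behind Branch/Call/Jump filters can be shown with a heavily
simplified x86 example. Relative CALL targets differ at every call
site even when the callee is the same; converting them to absolute
addresses makes repeated calls to one function produce identical
bytes, which the following compressor can exploit.
simple_x86_convert() is hypothetical and is not the BCJ filter of
7-Zip or liblzma, which applies additional heuristics to avoid
converting bytes that merely look like a CALL opcode. It only
demonstrates that the conversion is reversible and never changes the
size.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified x86 CALL conversion (NOT the real BCJ filter). For each E8
 * opcode, the following 32-bit little-endian relative target is turned
 * into an absolute one when encoding, and back when decoding. */
static void simple_x86_convert(uint8_t *buf, size_t size, int encoding)
{
    size_t i = 0;
    while (i + 5 <= size) {
        if (buf[i] != 0xE8) {
            i++;
            continue;
        }
        uint32_t v = (uint32_t)buf[i + 1]
                   | (uint32_t)buf[i + 2] << 8
                   | (uint32_t)buf[i + 3] << 16
                   | (uint32_t)buf[i + 4] << 24;
        /* A CALL target is relative to the end of the 5-byte instruction. */
        uint32_t pos = (uint32_t)(i + 5);
        v = encoding ? v + pos : v - pos;
        buf[i + 1] = (uint8_t)v;
        buf[i + 2] = (uint8_t)(v >> 8);
        buf[i + 3] = (uint8_t)(v >> 16);
        buf[i + 4] = (uint8_t)(v >> 24);
        i += 5; /* skip the operand so encoder and decoder stay in sync */
    }
}
```

Both directions skip the converted operand, so the decoder visits
exactly the same E8 bytes as the encoder did.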

The .lzma Stream format uses CRC32 as the integrity check for the
various file format headers. It is possible to omit CRC32 from the
Block Headers, but not from the Stream Header. This is why the CRC32
code cannot be disabled when building liblzma (in addition, the LZMA
encoder uses CRC32 for hashing, so that's another reason).
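
For reference, a CRC32 can be computed with a short bitwise loop. The
text does not name the polynomial; this sketch assumes the common IEEE
CRC32 (reflected polynomial 0xEDB88320) used by zlib and gzip.
crc32_update() is a hypothetical helper, not part of liblzma's API, and
real implementations use lookup tables for speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32, IEEE polynomial in reflected form (0xEDB88320).
 * Pass 0 as 'crc' for the first chunk; pass the previous result to
 * continue over further chunks. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t size)
{
    crc = ~crc;
    for (size_t i = 0; i < size; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return ~crc;
}
```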

The integrity check of the actual data is calculated from the
uncompressed data. This check can be CRC32, CRC64, or SHA256.
It can also be omitted completely, although that usually is not
a good thing to do. There are free IDs left, so support for new
check algorithms can be added later.
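
Because the data check covers the uncompressed data, an encoder feeds
the check with its input before compressing, and a decoder feeds it
with its output after decompressing. As an example of one of the
listed checks, here is a bitwise CRC64. Which CRC64 variant liblzma
uses is an assumption here: the sketch uses the ECMA-182 polynomial in
reflected form (0xC96C5795D7870F42), which the later .xz format
specifies. crc64_update() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC64, ECMA-182 polynomial in reflected form. Pass 0 as 'crc'
 * for the first chunk of uncompressed data and the previous result for
 * each following chunk. */
static uint64_t crc64_update(uint64_t crc, const uint8_t *buf, size_t size)
{
    crc = ~crc;
    for (size_t i = 0; i < size; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? UINT64_C(0xC96C5795D7870F42) : 0);
    }
    return ~crc;
}
```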

API and ABI stability

The API and ABI of liblzma aren't stable yet, although no huge
changes should happen. One potential place for change is the
lzma_options_subblock structure.

In the 4.42.0alpha phase, the shared library version number won't
be updated even if the ABI breaks. I don't want to track the ABI
changes yet. Just rebuild everything whenever you upgrade liblzma
until the ABI is declared stable.

While liblzma isn't huge, it is quite far from the smallest possible
LZMA implementation: the full liblzma binary (with support for all
filters and other features) is well over 100 KiB, but the plain raw
LZMA decoder is only 5-10 KiB.

To decrease the size of the library, you can omit parts of it by
passing certain options to the `configure' script. Disabling
everything except the decoders of the required filters will usually
give you a small enough library, but if you need a decoder that is,
for example, embedded in an operating system kernel, the code from
liblzma probably isn't suitable as is.

If you need a minimal implementation supporting .lzma Streams, you
may need to do a partial rewrite. liblzma uses a stateful API like
zlib, which increases the size of the library. A callback API or an
even simpler buffer-to-buffer API would allow a smaller
implementation.
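
To see why a stateful API costs more than a buffer-to-buffer one,
compare two versions of the trivial Copy filter. Both are hypothetical
sketches (toy_stream, toy_copy_code() and toy_copy_buffer() are
invented names): the stateful one records its progress in the
structure so that it can resume with a fresh output buffer on the next
call, while the buffer-to-buffer one simply requires all input and
enough output space up front. For Copy the difference is tiny; for a
real LZMA decoder, the resumable state presumably has to capture
everything needed to stop and continue in the middle of decoding,
which is where the extra code comes from.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical zlib-style stream: the caller supplies input and output
 * windows and calls the coder repeatedly; progress lives in the struct. */
typedef struct {
    const unsigned char *next_in;
    size_t avail_in;
    unsigned char *next_out;
    size_t avail_out;
} toy_stream;

/* Stateful "copy coder": moves as much as fits and advances the
 * pointers, so it can continue on the next call. */
static void toy_copy_code(toy_stream *strm)
{
    size_t n = strm->avail_in < strm->avail_out
             ? strm->avail_in : strm->avail_out;
    memcpy(strm->next_out, strm->next_in, n);
    strm->next_in += n;
    strm->avail_in -= n;
    strm->next_out += n;
    strm->avail_out -= n;
}

/* Buffer-to-buffer equivalent: one call, no state to keep. Returns the
 * number of bytes written, or (size_t)-1 if 'out' is too small. */
static size_t toy_copy_buffer(const unsigned char *in, size_t in_size,
                              unsigned char *out, size_t out_size)
{
    if (out_size < in_size)
        return (size_t)-1;
    memcpy(out, in, in_size);
    return in_size;
}
```

A caller of toy_copy_code() can drain the output through a small
fixed buffer in a loop, which is the flexibility that the extra
state buys.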

The LZMA SDK contains an LZMA decoder written in ANSI C that is
smaller than liblzma, so you may want to take a look at that code.
However, it doesn't (at least not yet) support the new .lzma Stream
format.

There's no documentation yet other than the public headers and this
text. Real docs will be written some day, I hope.