This directory contains a bunch of files to test handling of .lzma files
in .lzma decoder implementations. Many of the files have been created
by hand with a hex editor, thus there is no better "source code" than
the files themselves. All the test files (*.lzma) and this README have
been put into the public domain.

Good files (good-*.lzma) must decode successfully without requiring
a lot of CPU time or RAM. If the decoder supports only Single-Block
Streams, then good-multi-*.lzma won't decode, of course.

Bad files (bad-*.lzma) must cause the decoder to give an error. As
with the good files, these files must not require a lot of CPU time
or RAM before they are detected as broken.
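
The good/bad rules above can be turned into a small test driver. The
following is only a sketch: expected_outcome() and run_suite() are
hypothetical names, and decode stands for whatever decoder is under
test, assumed here to raise an exception on invalid input. File
contents are passed in as a dict so the sketch does not depend on any
directory layout.

```python
from typing import Callable, Dict, List, Optional


def expected_outcome(name: str, multi_block: bool = True) -> Optional[str]:
    """Map a test file name to the outcome required for good and bad files.

    Returns None for names this sketch does not cover.
    """
    if name.startswith("good-multi-") and not multi_block:
        # A Single-Block-only decoder is not expected to decode these.
        return "unsupported"
    if name.startswith("good-"):
        return "ok"
    if name.startswith("bad-"):
        return "error"
    return None


def run_suite(
    files: Dict[str, bytes],
    decode: Callable[[bytes], bytes],
    multi_block: bool = True,
) -> List[str]:
    """Return the names of files whose behavior differs from the expectation.

    `decode` is a hypothetical decoder entry point that raises on bad input.
    """
    failures = []
    for name, data in sorted(files.items()):
        want = expected_outcome(name, multi_block)
        if want not in ("ok", "error"):
            continue  # unsupported or not covered by this sketch
        try:
            decode(data)
            got = "ok"
        except Exception:
            got = "error"
        if got != want:
            failures.append(name)
    return failures
```

A real driver would also measure CPU time and memory use, since both
good and bad files must be handled cheaply; that part is left out here.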
Malicious files (malicious-*.lzma) are good in terms of the file format
specification, but try to trigger excessive CPU, RAM or disk usage in
the decoder. To prevent malicious files from putting the decoder into
an infinite loop (*) or eating all available RAM or disk space,
decoders should have internal limiters that catch these situations.

(*) Strictly speaking not infinite, but if decoding of a small file
would take a few weeks or even years, it's an infinite loop in
practice.
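
One possible shape for such a limiter is an output budget that is
checked before any data is produced. This is a minimal sketch, not the
design of any particular decoder: OutputLimiter, LimitExceeded and
expand_run() are hypothetical names, and a real decoder would likely
track CPU time and memory as well.

```python
class LimitExceeded(Exception):
    """Raised when decoding would exceed the configured output budget."""


class OutputLimiter:
    def __init__(self, max_output_bytes: int) -> None:
        self.max_output_bytes = max_output_bytes
        self.produced = 0

    def account(self, n: int) -> None:
        """Reserve n bytes of output; call this BEFORE producing them."""
        if n < 0:
            raise ValueError("byte count must be non-negative")
        if self.produced + n > self.max_output_bytes:
            raise LimitExceeded(
                f"output would reach {self.produced + n} bytes, "
                f"limit is {self.max_output_bytes}"
            )
        self.produced += n


def expand_run(byte: int, count: int, limiter: OutputLimiter) -> bytes:
    """Expand a run-length encoded run, checking the budget first."""
    limiter.account(count)  # fails before any memory is allocated
    return bytes([byte]) * count
```

Checking the budget before allocating matters: a tiny input can ask for
a huge run, and the request should be rejected without first trying to
satisfy it.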
2. Descriptions of Individual Files

good-single-none.lzma uses implicit Copy filter with known Uncompressed

good-single-none-pad.lzma is good-single-none.lzma with Footer Padding.

good-cat-single-none-pad.lzma is two good-single-none-pad.lzma files
concatenated as is. Fully decoding this file requires that the decoder
supports decoding concatenated files.

good-single-subblock_implicit.lzma uses implicit Subblock filter.

good-single-lzma.lzma is an LZMA-compressed file with EOPM.

good-single-subblock-lzma.lzma has basic combination of Subblock and

good-single-none-empty_1.lzma is an empty file with implicit Copy
filter and no integrity Check.

good-single-none-empty_2.lzma is an empty file with implicit Copy
filter and CRC32 as Check.

good-single-none-empty_3.lzma is an empty file with implicit Copy
filter, known Compressed Size, and no integrity Check.

good-single-lzma-empty.lzma is an empty file with LZMA filter and no

good-single-subblock_rle.lzma takes advantage of Subblock filter's

good-single-delta-lzma.tiff.lzma is an image file that compresses
better with Delta+LZMA than with plain LZMA.

good-single-x86-lzma.lzma uses the x86 filter (BCJ) and LZMA. The
uncompressed file is compress_prepared_bcj_x86 found from the tests

good-single-sparc-lzma.lzma uses the SPARC filter and LZMA. The
uncompressed file is compress_prepared_bcj_sparc found from the tests

good-single-lzma-flush_1.lzma has a flush marker in the middle of
the file, and no EOPM.

good-single-lzma-flush_2.lzma has a flush marker in the middle of
the file and just before EOPM.

good-multi-none-1.lzma is a basic Multi-Block Stream with two Data
Blocks and Footer Metadata Block.

good-multi-none-2.lzma is good-multi-none-1.lzma with Total Size and
Uncompressed Size added to the Footer Metadata Block.

good-multi-none-extra_1.lzma has the `Extra is present' flag set but
no actual Extra Records.

good-multi-none-extra_2.lzma has two non-empty Extra Records.

good-multi-none-extra_3.lzma has an Extra Record that has empty Data.

good-multi-none-header_1.lzma has a very minimal Header Metadata Block
with only the Metadata Flags field.

good-multi-none-header_2.lzma has all information in both Header and
Footer Metadata Blocks. The Size of Header Metadata Block field has
a wrong value in the Header Metadata Block, but the decoder must
ignore this value when it appears in the Header Metadata Block.

good-multi-none-header_3.lzma has Index only in the Header Metadata
Block. Footer Metadata Block contains only Size of Header Metadata
Block and Total Size.

good-multi-none-block_1.lzma has Index in Header Metadata Block. The
Compressed Size and Uncompressed Size fields are present in the Data
Blocks. There is some Footer Padding between the Blocks.

good-multi-none-block_2.lzma has Index in Header Metadata Block. The
Uncompressed Size field is present in Data Blocks and no EOPM is used.

bad-single-none-truncated.lzma is good-single-none.lzma without the
last byte of the file.

bad-cat-single-none-pad_garbage_1.lzma is good-cat-single-none-pad.lzma
with 0xFE appended to the end of the file. 0xFE doesn't begin a .lzma
or LZMA_Alone format file.

bad-cat-single-none-pad_garbage_2.lzma is good-cat-single-none-pad.lzma
with 0xFF appended to the end of the file. 0xFF begins a .lzma format
file, thus the decoder has to detect that the file is incomplete.

bad-cat-single-none-pad_garbage_3.lzma is good-cat-single-none-pad.lzma
with 0x5D appended to the end of the file. 0x5D is the most common
first byte of an LZMA_Alone format file.
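
The three garbage cases differ only in what the extra byte looks like
to a decoder that accepts concatenated files. A rough sketch of that
first-byte check follows. The 0xFF and 0x5D values come from the
descriptions above; the "below 225" bound for LZMA_Alone is not stated
in this README and is taken from the way LZMA properties are packed
into one byte ((pb * 5 + lp) * 9 + lc, at most 224).
classify_trailing_byte() is a hypothetical helper name.

```python
LZMA_STREAM_FIRST_BYTE = 0xFF  # begins a .lzma format file, per this README
LZMA_ALONE_PROPS_LIMIT = 225   # assumption: valid LZMA_Alone props are 0..224


def classify_trailing_byte(b: int) -> str:
    """Guess what a byte following a complete Stream could be starting."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b == LZMA_STREAM_FIRST_BYTE:
        return "lzma-stream"          # must then be rejected as incomplete
    if b < LZMA_ALONE_PROPS_LIMIT:
        return "lzma-alone-candidate"  # 0x5D falls here; also incomplete
    return "garbage"                   # 0xFE falls here
```

Whatever the classification, all three files must end in an error: one
byte is never a complete file in either format.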
bad-single-none-footer_filter_flags.lzma has different Stream Flags
in Stream Footer than in Stream Header.

bad-single-none-too_long_vli.lzma has a 10-byte variable-length integer.
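
A length check of this kind can be sketched as follows. Note the
assumptions: this README does not define the integer encoding, so the
sketch borrows the one used by the later .xz format (seven data bits
per byte, least significant group first, high bit set when more bytes
follow, at most nine bytes). The encoding in the .lzma format tested
here may differ. decode_vli() is a hypothetical name.

```python
from typing import Tuple

VLI_MAX_BYTES = 9  # assumption borrowed from the .xz format


def decode_vli(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one variable-length integer; return (value, next position)."""
    value = 0
    for i in range(VLI_MAX_BYTES):
        if pos + i >= len(buf):
            raise ValueError("truncated variable-length integer")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            if byte == 0 and i > 0:
                # A trailing zero byte means the value was padded.
                raise ValueError("non-minimal variable-length integer")
            return value, pos + i + 1
    # The ninth byte still had its continuation bit set.
    raise ValueError("variable-length integer is longer than 9 bytes")
```

The loop bound is what rejects a 10-byte integer: the decoder gives up
after the ninth byte instead of reading until the continuation bit
clears.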
bad-single-none-empty.lzma is like good-single-none-empty_3.lzma but
with non-zero value in the Compressed Size field.

bad-single-data_after_eopm_1.lzma has LZMA+Subblock, where the Subblock
filter gives one byte of data to LZMA after LZMA has detected EOPM.

bad-single-data_after_eopm_2.lzma is like
bad-single-data_after_eopm_1.lzma but Subblock gives 256 MiB of data
to LZMA after LZMA has detected EOPM.

bad-single-subblock_subblock.lzma has Subblock+Subblock, where the
Subblock decoder is given End of Input in the middle of a Subblock.

bad-single-subblock-padding_loop.lzma contains a huge amount of
consecutive Padding bytes, which isn't allowed by the Subblock filter
format. If it were allowed, this file would hang the decoder for a very
long time (weeks to years).

bad-single-subblock1023-slow.lzma is similar to
malicious-single-subblock31-slow.lzma except that this uses 1023 bytes
of Padding in every place instead of 31 bytes. The Subblock filter
format specification allows only 31-byte Paddings, thus this file must
get detected as bad without producing any output. Allowing larger
Padding than 31 bytes was considered (so this test file was created),
but it seemed to be a bad idea since it would increase worst-case CPU

bad-single-lzma-flush_beginning.lzma has flush marker in the beginning

bad-single-lzma-flush_twice.lzma has two flush markers with no data

bad-multi-none-1.lzma has data after the last field in the Metadata
Block and the `Extra is present' flag is not set.

bad-multi-none-2.lzma has wrong Total Size in Footer Metadata Block.

bad-multi-none-3.lzma has wrong Uncompressed Size in Footer Metadata

bad-multi-none-index_1.lzma has wrong value in the Number of Data

bad-multi-none-index_2.lzma has too short Metadata to contain all

bad-multi-none-index_3.lzma has wrong value in Total Size field in

bad-multi-none-index_4.lzma has wrong value in Uncompressed Size field

bad-multi-none-extra_1.lzma has incomplete Extra Record at the end of

bad-multi-none-extra_2.lzma has incomplete variable-length integer as

bad-multi-none-extra_3.lzma has incomplete Extra Record at the end of

bad-multi-none-header_1.lzma has empty Header Metadata Block (even
the Metadata Flags field is not present).

bad-multi-none-header_2.lzma has Index in the Header Metadata Block,
which describes only one Data Block, while the Stream actually has
two Data Blocks. A sophisticated decoder should give an error when
it detects the second Data Block; all Multi-Block decoders must
detect the file as corrupt at some point.

bad-multi-none-header_3.lzma contains too small Total Size in Header
Metadata Block. A sophisticated decoder should abort decoding before
the second Data Block, preferably before the first Data Block has
been finished; all Multi-Block decoders must detect the file as
corrupt at some point.

bad-multi-none-header_4.lzma is like bad-multi-none-header_3.lzma but
with too small Uncompressed Size.

bad-multi-none-header_5.lzma has Index in the Header Metadata Block,
but the Total Size field is missing from the Footer Metadata Block.

bad-multi-none-header_6.lzma has both Index and Total Size in Header
Metadata Block, but Total Size doesn't match the Index. A sophisticated
decoder should abort before decoding any Data Blocks; all Multi-Block
decoders must detect the file as corrupt at some point.
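
The early check expected from a sophisticated decoder can be sketched
as a cross-check of the Header Metadata fields before any Data Block is
read. The data model here is an assumption made for illustration only:
the Index is taken to be a list of (Total Size, Uncompressed Size)
pairs, one per Data Block, and "matching" is taken to mean that the
stream-wide values equal the per-Block sums. check_header_metadata() is
a hypothetical name.

```python
from typing import List, Optional, Tuple


def check_header_metadata(
    index: List[Tuple[int, int]],
    total_size: Optional[int] = None,
    uncompressed_size: Optional[int] = None,
) -> None:
    """Raise ValueError if the stream-wide sizes contradict the Index.

    Fields that are absent from the Metadata Block are passed as None
    and are not checked.
    """
    index_total = sum(block_total for block_total, _ in index)
    index_uncompressed = sum(block_unc for _, block_unc in index)

    if total_size is not None and total_size != index_total:
        raise ValueError(
            f"Total Size {total_size} does not match Index ({index_total})"
        )
    if uncompressed_size is not None and uncompressed_size != index_uncompressed:
        raise ValueError(
            f"Uncompressed Size {uncompressed_size} does not match "
            f"Index ({index_uncompressed})"
        )
```

Running this as soon as the Header Metadata Block has been parsed lets
the decoder reject the file before decoding any Data Blocks.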
bad-multi-none-header_7.lzma has zero as the Size of Header Metadata
Block in the Header Metadata Block.

bad-multi-none-block_1.lzma has wrong Uncompressed Size in the first
Data Block. A sophisticated decoder should detect this error before
producing any output, because it can see that the Uncompressed Size
doesn't match the Index in Header Metadata Block; all Multi-Block
decoders must detect the file as corrupt at some point.

bad-multi-none-block_2.lzma has too big Compressed Size in the first
Data Block. A sophisticated decoder may be able to detect the file as
corrupt before producing any output, because Compressed Size + size
of Block Header exceeds the Total Size stored in the Index in Header
Metadata Block. A sophisticated decoder should be able to detect the
error before the end of the first Data Block; all Multi-Block decoders
must detect the file as corrupt at some point.

bad-multi-none-block_3.lzma has only the Compressed Size field in the
Block Header of the second Data Block and EOPM isn't used.

malicious-single-subblock31-slow.lzma requires quite a bit of CPU time
per decoded byte. It contains LZMA compressed Subblock filter data that
has as much Padding as the specification allows. LZMA is also used as
a Subfilter, to further slow down the decoder. Every Subfilter instance
produces only one byte of output. If you can create a file that wastes
notably more CPU cycles than this file, please contact Lasse Collin.

malicious-single-subblock-256MiB.lzma is a tiny file that produces
256 MiB of output. It uses Subblock filter's run-length encoding

malicious-single-subblock-64PiB.lzma is a tiny file that produces
64 PiB of output (if you have patience to wait). This is done by
chaining two Subblock filters and using their run-length encoders.

malicious-multi-metadata-64PiB.lzma is like
malicious-single-subblock-64PiB.lzma but the huge amount of output
is in a Metadata Block. Trying to decode this file may take years
unless the decoder catches that the Metadata has unreasonable size.
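
The sizes quoted for the malicious files fit together arithmetically:
256 MiB is 2^28 bytes, and 2^28 * 2^28 = 2^56 bytes, which is exactly
64 PiB. The sketch below checks that and estimates the decoding time.
Reading the 64 PiB figure as the product of two 256 MiB expansions is
an inference from the numbers, not something the file descriptions
state, and the 100 MiB/s throughput is an arbitrary example value.

```python
MIB = 2 ** 20
PIB = 2 ** 50
SECONDS_PER_YEAR = 365 * 24 * 3600

single_filter_output = 256 * MIB  # 2**28 bytes
chained_output = single_filter_output * single_filter_output  # 2**56 bytes


def years_to_decode(n_bytes: int, bytes_per_second: float) -> float:
    """Time needed to produce n_bytes at a constant output rate, in years."""
    return n_bytes / bytes_per_second / SECONDS_PER_YEAR


# Assumed example throughput of 100 MiB/s.
estimate = years_to_decode(chained_output, 100 * MIB)
```

At that rate the estimate comes out at roughly 22 years, which is why
a decoder has to refuse such output rather than try to produce it.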