This document discusses how to use liblzma securely. There are issues
that don't apply to zlib or libbzip2, so reading this document is
strongly recommended even for those who are very familiar with zlib
and libbzip2.

While making liblzma itself as secure as possible is essential, it is
outside the scope of this document.

The memory usage of liblzma varies a lot.

The memory requirements of the Block encoder depend on the filters
used and their settings. The memory requirements of the Block decoder
depend on which filters and which filter settings were used to encode
the Block. Usually the memory requirements of a decoder are equal to
or less than the requirements of the encoder with the same settings.

While decoding a Block typically requires from a few hundred
kilobytes to tens of megabytes of memory, a maliciously constructed
file can require a lot more RAM to decode. With the current filters,
the maximum amount is about 7 GiB. If you use multi-threaded decoding,
every Block can require this amount of RAM, so a four-threaded
decoder could suddenly try to allocate 28 GiB of RAM.

If you don't limit the maximum memory usage in any way, and there are
no resource limits set on the operating system side, one malicious
input file can run the system out of memory, or at least make it swap
badly for a long time. This is exceptionally bad on servers, e.g. an
email server doing virus scanning on incoming messages.

1.1.2. Metadata decoder

Multi-Block .lzma files contain at least one Metadata Block.
Externally the Metadata Blocks are similar to Data Blocks, so all
the issues mentioned about the memory usage of Data Blocks apply to
Metadata Blocks too.

The uncompressed content of Metadata Blocks contains information about
the Stream as a whole, and optionally some Extra Records. The
information about the Stream is kept in liblzma's internal data
structures in RAM. Extra Records can contain arbitrary data. They are
not interpreted by liblzma, but liblzma will provide them to the
application in uninterpreted form if the application requests it.

Usually the Uncompressed Size of a Metadata Block is small. Even in
extreme cases, it shouldn't be much bigger than a few megabytes. Once
the Metadata has been parsed into native data structures in liblzma,
it usually takes a little more memory than in the encoded form. For
all normal files, this is no problem, since the resulting memory usage
stays small.

The problem is that a maliciously constructed Metadata Block can
contain a huge amount of "information", which liblzma will try to
store in its internal data structures. This may cause liblzma to
allocate all the available RAM unless some kind of resource usage
limits are applied.

Note that the Extra Records in Metadata are always parsed, but
memory is allocated for them only if the application has requested
liblzma to provide the Extra Records to the application.

If you need to decode files from untrusted sources (most people do),
you must limit the memory usage to avoid denial of service (DoS)
conditions caused by malicious input files.

The first step is to find out the maximum amount of memory you are
allowed to consume. This may be a hardcoded constant or derived from
the available RAM; whatever is appropriate in the application.

The simplest solution is to use setrlimit() if the kernel supports
RLIMIT_AS, which limits the memory usage of the whole process.
For more portable and fine-grained limiting, you can use the
memory limiter functions found in <lzma/memlimit.h>.

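The setrlimit() approach can be sketched roughly as follows. This is
only a minimal illustration using the POSIX getrlimit()/setrlimit()
calls; the function name limit_address_space() is made up for this
example and is not part of liblzma. It assumes a system where
RLIMIT_AS is defined and enforced.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not a liblzma function): set the soft
 * RLIMIT_AS limit of the calling process to max_bytes, clamped to the
 * current hard limit. The hard limit itself is left unchanged.
 * Returns 0 on success and -1 on failure (errno is set by
 * getrlimit() or setrlimit()).
 */
int limit_address_space(rlim_t max_bytes)
{
    struct rlimit rl;

    /* A zero limit would make every later allocation fail. */
    assert(max_bytes > 0);

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    /* The soft limit must not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_AS, &rl);
}
```

Note that the limit applies to the whole process, so allocations made
by the rest of the application count against it too. Call the helper
before starting to decode untrusted input.
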
lzma_memory_usage() will give you a rough estimate of the memory
usage of the given filter chain. To dramatically simplify the internal
implementation, this function doesn't take into account all the small
helper data structures needed in various places; only the structures
with significant memory usage are taken into account. Still, the
estimate should be accurate to well within a mebibyte.

The Subblock filter is a special case. If a Subfilter has been
specified, it isn't taken into account when lzma_memory_usage()
calculates the memory usage. You need to calculate the memory usage
of the Subfilter separately.

Keeping track of Blocks in a Multi-Block Stream takes a few dozen
bytes of RAM per Block (size of the lzma_index structure plus overhead
of malloc()). It isn't a good idea to put tens of thousands of Blocks
into a Stream unless you have a very good reason to do so (a
compressed dictionary could be an example of such a situation).

Also keep the number and sizes of Extra Records sane. If you produce
the list of Extra Records automatically from some untrusted source,
you should not only validate the content of these Records, but also
limit their number and total size.

A single-threaded decoder should simply use a memory limiter and
indicate an error if it runs out of memory.

Memory-limiting with multi-threaded decoding is tricky. The simple
solution is to divide the maximum allowed memory usage by the
maximum allowed number of threads, and give each Block decoder its
own independent lzma_memory_limiter. The drawback is that if one Block
needs notably more RAM than any other Block, the decoder will run out
of memory when in reality there would be plenty of free RAM.

An attractive alternative would be using a shared lzma_memory_limiter.
Depending on the application and the expected type of input, this may
either be the best solution or a source of hard-to-repeat problems.
Consider the following requirements:
  - You use a maximum of n threads.
  - x(i) is the decoder memory requirement of Block number i
    in an expected input Stream.
  - The memory limiter is set to a higher value than the sum of the
    n largest values of x(i).

(If you are better at explaining the above conditions, please
contribute your improved version.)

If the above conditions aren't met, it is possible that the decoding
will fail unpredictably. That is, on the same machine using the same
settings, the decoding may sometimes succeed and sometimes fail. This
is because the threads may sometimes be scheduled so that the Blocks
with the highest memory usage are decoded at the same time.

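The requirements for a shared limiter can be checked with a small
helper like the sketch below. It assumes the condition means that the
limit must be higher than the sum of the n largest per-Block decoder
memory requirements x(i), and that the application already has those
values in an array. The function name shared_limit_is_safe() is
hypothetical and not part of liblzma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: orders uint64_t values from largest to smallest. */
static int cmp_desc(const void *a, const void *b)
{
    const uint64_t x = *(const uint64_t *)a;
    const uint64_t y = *(const uint64_t *)b;
    return (x < y) - (x > y);
}

/*
 * Hypothetical helper: returns 1 if limit is higher than the sum of
 * the n largest values in x[0..count-1], 0 if it is not, and -1 if
 * the temporary copy of the array cannot be allocated.
 */
int shared_limit_is_safe(const uint64_t *x, size_t count,
        size_t n, uint64_t limit)
{
    assert(x != NULL || count == 0);

    /* No limit can be "higher than" a sum if the limit is zero. */
    if (limit == 0)
        return 0;

    /* At most count Blocks can be decoded at the same time. */
    if (n > count)
        n = count;
    if (n == 0)
        return 1;

    if (count > SIZE_MAX / sizeof(uint64_t))
        return -1;
    uint64_t *sorted = malloc(count * sizeof(*sorted));
    if (sorted == NULL)
        return -1;
    memcpy(sorted, x, count * sizeof(*sorted));
    qsort(sorted, count, sizeof(*sorted), cmp_desc);

    /* Invariant: sum < limit, so limit - sum cannot wrap around. */
    uint64_t sum = 0;
    int safe = 1;
    for (size_t i = 0; i < n; ++i) {
        if (sorted[i] >= limit - sum) {
            safe = 0;
            break;
        }
        sum += sorted[i];
    }

    free(sorted);
    return safe;
}
```

For example, with requirements of 10, 50, 20, and 40 units and two
threads, the two largest values sum to 90, so a limit of 91 passes
the check and a limit of 90 does not.
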
Most .lzma files have all the Blocks encoded with identical settings,
or at least the memory usage won't vary dramatically. That's why most
multi-threaded decoders probably want to use the simple "separate
lzma_memory_limiter for each thread" solution, possibly falling back
to single-threaded mode in case the per-thread memory limits aren't
enough in multi-threaded mode.

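The "separate limiter for each thread" strategy with a fallback could
be sketched as below. The function choose_thread_count() and its
parameters are hypothetical, and block_need stands for whatever
estimate of the per-Block decoder memory requirement the application
has. As a variation on the text above, the sketch reduces the thread
count step by step instead of jumping straight to one thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the largest thread count in the range
 * [1, max_threads] for which an equal split of total_limit still
 * gives every thread at least block_need bytes. Returns 0 if even a
 * single thread cannot decode the Block within total_limit.
 */
uint32_t choose_thread_count(uint64_t total_limit,
        uint32_t max_threads, uint64_t block_need)
{
    assert(max_threads > 0);

    /* Nothing to fit, so every thread can be used. */
    if (block_need == 0)
        return max_threads;

    if (block_need > total_limit)
        return 0;

    /* Number of Blocks of this size that fit at once; at least 1. */
    const uint64_t fit = total_limit / block_need;

    return fit < max_threads ? (uint32_t)fit : max_threads;
}
```

With a 16 GiB total limit, four allowed threads, and Blocks needing
7 GiB each, this returns 2. With an 8 GiB limit it returns 1, which
is the single-threaded fallback, and with a 4 GiB limit it returns 0,
meaning that decoding should fail with an error.
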
FIXME: Memory usage of Stream info.

2. Huge uncompressed output

Decoding a tiny .lzma file can produce a huge amount of uncompressed
output. There is an example file of 45 bytes, which decodes to 64 PiB
(that's 2^56 bytes). Uncompressing such a file to disk is likely to
fill even a large disk array. If the data is written to a pipe, it
may not fill the disk, but would still take a very long time to
finish.

To avoid denial of service conditions caused by a huge amount of
uncompressed output, applications using liblzma should use some method
to limit the amount of output produced. The exact method depends on
the application.

All valid .lzma Streams make it possible to find out the uncompressed
size of the Stream without actually uncompressing the data. This
information is available in at least one of the Metadata Blocks.
Once the uncompressed size is parsed, the decoder can verify that
it doesn't exceed certain limits (e.g. available disk space).

When the uncompressed size is known, the decoder can actively keep
track of the amount of output produced so far and check that it
doesn't exceed the known uncompressed size. If it does, the file is
known to be corrupt and an error should be indicated without
continuing to decode the rest of the file.

Unfortunately, finding the uncompressed size beforehand is often
possible only in non-streamed mode, because the needed information
could be in the Footer Metadata Block, which (obviously) is at the
end of the Stream. In purely streamed mode decoding, one may need to
use some rough arbitrary limits to prevent the problems described in
the beginning of this section.

Metadata is stored in Metadata Blocks, which are very similar to
Data Blocks. Thus, the uncompressed size can be huge just like with
Data Blocks. The difference is that the contents of Metadata Blocks
aren't given to the application as is, but parsed by liblzma. Still,
reading through a huge Metadata Block can take a very long time,
effectively creating a denial of service, just as piping a decoded
Data Block to another process would.

At first it would seem that using a memory limiter would prevent
this issue as a side effect. But it does so only if the application
requests liblzma to allocate the Extra Records and provide them to
the application. If Extra Records aren't requested, they aren't
allocated either. Still, the Extra Records are being read through
to validate that the Metadata is in proper format.

The solution is to limit the Uncompressed Size of a Metadata Block
to some relatively large value. This will make liblzma return an
error when the given limit is reached.