doc/liblzma-security.txt

   1
   2 Using liblzma securely
   3 ----------------------
   4
   5 0. Introduction
   6
   7     This document discusses how to use liblzma securely. There are issues
   8     that don't apply to zlib or libbzip2, so reading this document is
   9     strongly recommended even for those who are very familiar with zlib
  10     or libbzip2.
  11
  12     While making liblzma itself as secure as possible is essential, it's
  13     out of scope of this document.
  14
  15
  16 1. Memory usage
  17
  18     The memory usage of liblzma varies a lot.
  19
  20
  21 1.1. Problem sources
  22
  23 1.1.1. Block coder
  24
  25     The memory requirements of the Block encoder depend on the filters
  26     used and their settings. The memory requirements of the Block decoder
  27     depend on which filters and which filter settings were used to encode
  28     the Block. Usually the memory requirements of a decoder are equal to
  29     or less than the requirements of the encoder with the same settings.
  30
  31     While decoding a Block typically requires from a few hundred
  32     kilobytes to tens of megabytes of memory, a maliciously constructed
  33     file can require a lot more RAM to decode. With the current filters,
  34     the maximum amount is about 7 GiB. If you use multi-threaded decoding,
  35     every Block can require this amount of RAM, thus a four-threaded
  36     decoder could suddenly try to allocate 28 GiB of RAM.
  37
  38     If you don't limit the maximum memory usage in any way, and there are
  39     no resource limits set on the operating system side, one malicious
  40     input file can run the system out of memory, or at least make it swap
  41     badly for a long time. This is especially bad on servers, e.g. an
  42     email server doing virus scanning on incoming messages.
  43
  44
  45 1.1.2. Metadata decoder
  46
  47     Multi-Block .lzma files contain at least one Metadata Block.
  48     Externally the Metadata Blocks are similar to Data Blocks, so all
  49     the issues mentioned about memory usage of Data Blocks apply to
  50     Metadata Blocks too.
  51
  52     The uncompressed content of Metadata Blocks contains information about
  53     the Stream as a whole, and optionally some Extra Records. The
  54     information about the Stream is kept in liblzma's internal data
  55     structures in RAM. Extra Records can contain arbitrary data. They are
  56     not interpreted by liblzma, but liblzma will provide them to the
  57     application in uninterpreted form if the application so wishes.
  58
  59     Usually the Uncompressed Size of a Metadata Block is small. Even in
  60     extreme cases, it shouldn't be much bigger than a few megabytes. Once
  61     the Metadata has been parsed into native data structures in liblzma,
  62     it usually takes a little more memory than in the encoded form. For
  63     all normal files, this is no problem, since the resulting memory usage
  64     won't be too much.
  65
  66     The problem is that a maliciously constructed Metadata Block can
  67     contain a huge amount of "information", which liblzma will try to store
  68     in its internal data structures. This may cause liblzma to allocate
  69     all the available RAM unless some kind of resource usage limits are
  70     applied.
  71
  72     Note that the Extra Records in Metadata are always parsed, but
  73     memory is allocated for them only if the application has requested
  74     liblzma to provide the Extra Records to the application.
  75
  76
  77 1.2. Solutions
  78
  79     If you need to decode files from untrusted sources (most people do),
  80     you must limit the memory usage to avoid denial of service (DoS)
  81     conditions caused by malicious input files.
  82
  83     The first step is to find out how much memory you are allowed to
  84     consume at most. This may be a hardcoded constant or derived from the
  85     available RAM; whatever is appropriate in the application.
  86
  87     The simplest solution is to use setrlimit() if the kernel supports
  88     RLIMIT_AS, which limits the memory usage of the whole process.
  89     For more portable and fine-grained limiting, you can use the
  90     memory limiter functions found in <lzma/memlimit.h>.
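
         As a minimal sketch of the setrlimit() approach, assuming a POSIX
         system whose kernel enforces RLIMIT_AS: the helper name
         limit_address_space() is made up for this example and is not part
         of liblzma. It changes only the soft limit and never asks for
         more than the hard limit allows.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of liblzma): cap the address space of
 * the calling process at max_bytes. Only the soft limit is changed, so
 * the process may raise it again later, up to the hard limit.
 * Returns 0 on success and -1 on error, with errno set by the failing
 * getrlimit() or setrlimit() call.
 */
int
limit_address_space(rlim_t max_bytes)
{
	struct rlimit rl;

	/* A limit of zero would make every later allocation fail. */
	assert(max_bytes > 0);

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return -1;

	/* An unprivileged process cannot go above the hard limit. */
	if (rl.rlim_max != RLIM_INFINITY
			&& (max_bytes == RLIM_INFINITY
				|| max_bytes > rl.rlim_max))
		max_bytes = rl.rlim_max;

	rl.rlim_cur = max_bytes;
	return setrlimit(RLIMIT_AS, &rl);
}
```

         It would be called once, early in the program, for example as
         limit_address_space((rlim_t)512 << 20) for 512 MiB. RLIMIT_AS
         counts address space rather than resident RAM, so the value needs
         headroom for the stack, shared libraries, and the application's
         own data.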
  91
  92
  93 1.2.1. Encoder
  94
  95     lzma_memory_usage() will give you a rough estimate of the memory
  96     usage of the given filter chain. To dramatically simplify the internal
  97     implementation, this function doesn't take into account all the small
  98     helper data structures needed in various places; only the structures
  99     with significant memory usage are taken into account. Still, the
 100     accuracy of this function should be well within a mebibyte.
 101
 102     The Subblock filter is a special case. If a Subfilter has been
 103     specified, it isn't taken into account when lzma_memory_usage()
 104     calculates the memory usage. You need to calculate the memory usage
 105     of the Subfilter separately.
 106
 107     Keeping track of Blocks in a Multi-Block Stream takes a few dozen
 108     bytes of RAM per Block (size of the lzma_index structure plus overhead
 109     of malloc()). It isn't a good idea to put tens of thousands of Blocks
 110     into a Stream unless you have a very good reason to do so (a compressed
 111     dictionary could be an example of such a situation).
 112
 113     Also keep the number and sizes of Extra Records sane. If you produce
 114     the list of Extra Records automatically from some untrusted source,
 115     you should not only validate the content of these Records, but also
 116     their memory usage.
 117
 118
 119 1.2.2. Decoder
 120
 121     A single-threaded decoder should simply use a memory limiter and
 122     indicate an error if it runs out of memory.
 123
 124     Memory-limiting with multi-threaded decoding is tricky. The simple
 125     solution is to divide the maximum allowed memory usage by the
 126     maximum number of threads, and give each Block decoder its own
 127     independent lzma_memory_limiter. The drawback is that if one Block
 128     needs notably more RAM than any other Block, the decoder will run out
 129     of memory when in reality there would be plenty of free RAM.
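
         The simple split and its drawback can be illustrated with a few
         lines of C. This is only a sketch: per_thread_limit() and
         block_fits() are names invented for this example, and the value
         returned by per_thread_limit() is what one would give to each
         independent memory limiter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split total_limit (bytes) evenly between `threads` Block decoders.
 * Integer division rounds down, so the per-thread limits can never
 * add up to more than total_limit.
 */
uint64_t
per_thread_limit(uint64_t total_limit, unsigned int threads)
{
	assert(threads > 0);
	return total_limit / threads;
}

/*
 * Return 1 if a Block needing block_usage bytes fits within the
 * per-thread limit, 0 if decoding it would hit the limit.
 */
int
block_fits(uint64_t block_usage, uint64_t total_limit, unsigned int threads)
{
	return block_usage <= per_thread_limit(total_limit, threads);
}
```

         With a total of 1 GiB and four threads, each decoder gets 256 MiB.
         A single Block needing 300 MiB is then rejected even if the other
         three decoders are using almost nothing, which is the drawback
         described above.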
 130
 131     An attractive alternative would be to use a shared lzma_memory_limiter.
 132     Depending on the application and the expected type of input, this may
 133     either be the best solution or a source of hard-to-repeat problems.
 134     Consider the following requirements:
 135       - You use a maximum of n threads.
 136       - x(i) is the decoder memory requirement of Block number i
 137         in an expected input Stream.
 138       - The memory limiter is set to a higher value than the sum of the
 139         n highest values of x(i).
 140
 141     (If you are better at explaining the above conditions, please
 142     contribute your improved version.)
 143
 144     If the above conditions aren't met, it is possible that the decoding
 145     will fail unpredictably. That is, on the same machine using the same
 146     settings, the decoding may sometimes succeed and sometimes fail. This
 147     is because the threads may sometimes be scheduled so that the Blocks
 148     with the highest memory usage are decoded at the same time.
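
         The conditions above can also be stated as code. The following is
         a sketch under the assumption that the application already has an
         array of the expected x(i) values; shared_limit_is_safe() is a
         hypothetical helper, not a liblzma function. It sums the n highest
         values and checks that the limit is strictly higher than the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: sorts uint64_t values in descending order. */
static int
cmp_desc(const void *a, const void *b)
{
	const uint64_t x = *(const uint64_t *)a;
	const uint64_t y = *(const uint64_t *)b;
	return (x < y) - (x > y);
}

/*
 * usage[0..count-1] holds x(i), the expected decoder memory usage of
 * each Block. Returns 1 if `limit` is higher than the sum of the n
 * highest values, 0 if it isn't, and -1 if memory allocation failed.
 * The input array is not modified.
 */
int
shared_limit_is_safe(const uint64_t *usage, size_t count,
		unsigned int n, uint64_t limit)
{
	uint64_t *sorted;
	uint64_t sum = 0;
	size_t top = n < count ? n : count;
	size_t i;

	assert(usage != NULL || count == 0);

	if (top == 0)
		return limit > 0;

	sorted = malloc(count * sizeof(*sorted));
	if (sorted == NULL)
		return -1;

	memcpy(sorted, usage, count * sizeof(*sorted));
	qsort(sorted, count, sizeof(*sorted), cmp_desc);

	for (i = 0; i < top; ++i) {
		/* sum < limit holds here, so the subtraction cannot wrap
		 * and the addition below cannot overflow. */
		if (sorted[i] >= limit - sum) {
			free(sorted);
			return 0;
		}
		sum += sorted[i];
	}

	free(sorted);
	return 1;
}
```

         The x(i) values describe an expected input only. A hostile file
         can always exceed them, in which case the memory limiter still
         does its job and the decoding fails.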
 149
 150     Most .lzma files have all the Blocks encoded with identical settings,
 151     or at least the memory usage won't vary dramatically. That's why most
 152     multi-threaded decoders probably want to use the simple "separate
 153     lzma_memory_limiter for each thread" solution, possibly falling back
 154     to single-threaded mode in case the per-thread memory limits aren't
 155     enough in multi-threaded mode.
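
         The fallback to single-threaded mode could be decided along these
         lines. This is a sketch only: pick_thread_count() is a made-up
         name, and it assumes the application knows (or has just learned
         from a failed attempt) how much memory the most demanding Block
         needs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decide how many threads to use for a Stream whose most demanding
 * Block needs max_block_usage bytes to decode.
 * Returns max_threads if every thread can be given enough memory,
 * 1 if the Stream fits only in single-threaded mode, and 0 if it
 * doesn't fit within total_limit at all.
 */
unsigned int
pick_thread_count(uint64_t max_block_usage, uint64_t total_limit,
		unsigned int max_threads)
{
	assert(max_threads > 0);

	/* Each thread gets an equal share of the total limit. */
	if (max_block_usage <= total_limit / max_threads)
		return max_threads;

	/* A single decoder may use the whole limit. */
	if (max_block_usage <= total_limit)
		return 1;

	return 0;
}
```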
 156
 157 FIXME: Memory usage of Stream info.
 158
 159 [
 160
 161 ]
 162
 163
 164 2. Huge uncompressed output
 165
 166 2.1. Data Blocks
 167
 168     Decoding a tiny .lzma file can produce a huge amount of uncompressed
 169     output. There is an example file of 45 bytes, which decodes to 64 PiB
 170     (that's 2^56 bytes). Uncompressing such a file to disk is likely to
 171     fill even a large disk array. If the data is written to a pipe, it
 172     may not fill the disk, but would still take a very long time to finish.
 173
 174     To avoid denial of service conditions caused by a huge amount of
 175     uncompressed output, applications using liblzma should use some method
 176     to limit the amount of output produced. The exact method depends on
 177     the application.
 178
 179     All valid .lzma Streams make it possible to find out the uncompressed
 180     size of the Stream without actually uncompressing the data. This
 181     information is available in at least one of the Metadata Blocks.
 182     Once the uncompressed size is parsed, the decoder can verify that
 183     it doesn't exceed certain limits (e.g. available disk space).
 184
 185     When the uncompressed size is known, the decoder can actively keep
 186     track of the amount of output produced so far and verify that it
 187     doesn't exceed the known uncompressed size. If it does, the file is
 188     known to be corrupt and an error should be indicated without
 189     continuing to decode the rest of the file.
 190
 191     Unfortunately, finding the uncompressed size beforehand is often
 192     possible only in non-streamed mode, because the needed information
 193     could be in the Footer Metadata Block, which (obviously) is at the
 194     end of the Stream. In purely streamed mode decoding, one may need to
 195     use some rough arbitrary limits to prevent the problems described at
 196     the beginning of this section.
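
         The output accounting described in this section can be sketched
         as a small counter that the application updates every time the
         decoder hands it more data. The struct and function names are
         hypothetical. The limit is either the known uncompressed size or,
         in streamed mode, an arbitrary cap chosen by the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running total of decoder output, checked against a fixed limit. */
struct output_budget {
	uint64_t limit;    /* Known uncompressed size or an arbitrary cap */
	uint64_t produced; /* Bytes of output accepted so far */
};

void
output_budget_init(struct output_budget *b, uint64_t limit)
{
	assert(b != NULL);
	b->limit = limit;
	b->produced = 0;
}

/*
 * Account for `size` new bytes of output. Returns 0 if the total is
 * still within the limit. Returns -1 without changing the total if
 * the limit would be exceeded; the caller should then stop decoding
 * and indicate an error.
 */
int
output_budget_add(struct output_budget *b, size_t size)
{
	assert(b != NULL);

	/* produced <= limit always holds, so this cannot underflow. */
	if (size > b->limit - b->produced)
		return -1;

	b->produced += size;
	return 0;
}
```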
 197
 198
 199 2.2. Metadata
 200
 201     Metadata is stored in Metadata Blocks, which are very similar to
 202     Data Blocks. Thus, the uncompressed size can be huge just like with
 203     Data Blocks. The difference is that the contents of Metadata Blocks
 204     aren't given to the application as is, but parsed by liblzma. Still,
 205     reading through a huge Metadata Block can take a very long time,
 206     effectively creating a denial of service, just as piping a decoded
 207     Data Block to another process would.
 208
 209     At first it would seem that using a memory limiter would prevent
 210     this issue as a side effect. But it does so only if the application
 211     requests liblzma to allocate the Extra Records and provide them to
 212     the application. If Extra Records aren't requested, they aren't
 213     allocated either. Still, the Extra Records are being read through
 214     to validate that the Metadata is in proper format.
 215
 216     The solution is to limit the Uncompressed Size of a Metadata Block
 217     to some relatively large value. This makes liblzma return an
 218     error when the given limit is reached.
 219