1.1. Byte and Its Representation
1.2. Multibyte Integers
2.1.1. Single-Block Stream
2.1.2. Multi-Block Stream
2.2.1. Header Magic Bytes
3.1.2. Compressed Size
3.1.3. Uncompressed Size
3.1.4. List of Filter Flags
3.1.4.3. External Size of Properties
3.1.4.4. Filter Properties
3.3.2.1. Uncompressed Size
3.3.2.2. Backward Size
3.3.2.4. Footer Magic Bytes
4.1. Detecting when All Data Has Been Decoded
4.1.1. With Uncompressed Size
4.1.2. With End of Input
4.1.3. With End of Payload Marker
4.3.2.1. Format of the Encoded Output
4.3.3.1. Format of the Encoded Output
4.3.4.1. LZMA Properties
4.3.4.2. Dictionary Flags
4.3.5. Branch/Call/Jump Filters for Executables
5.2. Size of Header Metadata Block
5.4. Uncompressed Size
5.5.1. Number of Data Blocks
5.5.3. Uncompressed Sizes
5.6.1. 0x00: Dummy/Padding
5.6.2. 0x01: OpenPGP Signature
5.6.3. 0x02: Filter Information
5.6.5. 0x04: List of Checks
5.6.6. 0x05: Original Filename
5.6.7. 0x07: Modification Time
5.6.8. 0x09: High-Resolution Modification Time
5.6.9. 0x0B: MIME Type
5.6.10. 0x0D: Homepage URL
6. Custom Filter and Extra Record IDs
6.1. Reserved Custom Filter ID Ranges
7. Cyclic Redundancy Checks
8.1. Normative References
8.2. Informative References
86 This document describes the .lzma file format (filename suffix
87 `.lzma', MIME type `application/x-lzma'). It is intended that
88 this format replace the format used by the LZMA_Alone tool
89 included in LZMA SDK up to and including version 4.57.
91 IMPORTANT: The version described in this document is a
92 draft, NOT a final, official version. Changes
96 0.1. Copyright Notices
98 Copyright (C) 2006, 2007 Lasse Collin <lasse.collin@tukaani.org>
99 Copyright (C) 2006 Ville Koskinen <w-ber@iki.fi>
101 Copying and distribution of this file, with or without
102 modification, are permitted in any medium without royalty
103 provided the copyright notice and this notice are preserved.
104 Modified versions must be marked as such.
106 All source code examples given in this document are put into
107 the public domain by the authors of this document.
Thanks for helping with this document go to Igor Pavlov,
Mark Adler and Mikko Pouru.
115 Last modified: 2008-02-01 19:25+0200
117 (A changelog will be kept once the first official version
123 The keywords `must', `must not', `required', `should',
124 `should not', `recommended', `may', and `optional' in this
125 document are to be interpreted as described in [RFC-2119].
126 These words are not capitalized in this document.
128 Indicating a warning means displaying a message, returning
129 appropriate exit status, or something else to let the user
130 know that something worth warning occurred. The operation
131 should still finish if a warning is indicated.
133 Indicating an error means displaying a message, returning
134 appropriate exit status, or something else to let the user
135 know that something prevented successfully finishing the
136 operation. The operation must be aborted once an error has
140 1.1. Byte and Its Representation
142 In this document, byte is always 8 bits.
144 A `nul byte' has all bits unset. That is, the value of a nul
147 To represent byte blocks, this document uses notation that
148 is similar to the notation used in [RFC-1952]:
155 | Foo | Two bytes; that is, some of the vertical bars
156 +---+---+ can be missing.
159 | Foo | Zero or more bytes.
In this document, a boxed byte or a byte sequence declared
using this notation is called `a field'. The example field
above would be called `the Foo field' or plain `Foo'.
167 1.2. Multibyte Integers
169 Multibyte integers of static length, such as CRC values,
170 are stored in little endian byte order (least significant
173 When smaller values are more likely than bigger values (e.g.
174 file sizes), multibyte integers are encoded in a simple
175 variable-length representation:
176 - Numbers in the range [0, 127] are copied as is, and take
178 - Bigger numbers will occupy two or more bytes. The lowest
179 seven bits of every byte are used for data; the highest
180 (eighth) bit indicates either that
181 0) the byte is in the middle of the byte sequence, or
182 1) the byte is the first or the last byte.
184 For now, the value of the variable-length integers is limited
185 to 63 bits, which limits the encoded size of the integer to
186 nine bytes. These limits may be increased in future if needed.
Note that the encoding is not as compact as it could be. For
example, the number 42 can be encoded using any number of
bytes between one and nine. This is convenient for
non-streamed encoders, which write the Compressed Size or
Uncompressed Size field to the Block Header (see Section 3.1)
only after the Compressed Data field has been written to disk.
Such an encoder can reserve a fixed number of bytes for the
field in advance and fill in the value afterwards.

In several situations, the decoder needs to check that two
fields contain identical information. When comparing fields
that use the encoding described in this Section, the decoder
must consider two fields identical if their decoded values are
identical; it does not matter if the encoded variable-length
representations differ.
202 The following C code illustrates encoding and decoding 63-bit
203 variables; the highest bit of uint64_t must be unset. The
204 functions return the number of bytes occupied by the integer
205 (1-9), or zero on error.
207 #include <sys/types.h>
208 #include <inttypes.h>
211 encode(uint8_t buf[static 9], uint64_t num)
213 if (num >= (UINT64_C(1) << (9 * 7)))
219 buf[0] = (num & 0x7F) | 0x80;
222 while (num >= 0x80) {
223 buf[i++] = num & 0x7F;
226 buf[i++] = num | 0x80;
231 decode(const uint8_t buf[], size_t size_max, uint64_t *num)
237 *num = buf[0] & 0x7F;
238 if (!(buf[0] & 0x80))
244 *num |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
245 } while (!(buf[i++] & 0x80));
250 decode_reverse(const uint8_t buf[], size_t size_max,
255 const size_t end = size_max > 9 ? size_max - 9 : 0;
256 size_t i = size_max - 1;
257 *num = buf[i] & 0x7F;
258 if (!(buf[i] & 0x80))
264 *num |= buf[i] & 0x7F;
265 } while (!(buf[i] & 0x80));
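For reference, the three routines can be assembled into a
self-contained, compilable form roughly as follows. This is a
sketch rather than normative text: the statements not visible
in the listing above (the early returns, the shifts, and the
final return values) are filled in as assumptions that follow
from the encoding rules in this Section.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes num (must be below 2^63) into buf.
 * Returns the number of bytes written (1-9), or zero on error. */
size_t
encode(uint8_t buf[static 9], uint64_t num)
{
	if (num >= (UINT64_C(1) << (9 * 7)))
		return 0;

	/* [0, 127]: a single byte with the highest bit unset. */
	if (num <= 0x7F) {
		buf[0] = (uint8_t)num;
		return 1;
	}

	/* First byte: lowest seven bits, highest bit set. */
	buf[0] = (uint8_t)(num & 0x7F) | 0x80;
	num >>= 7;

	/* Middle bytes: seven bits each, highest bit unset. */
	size_t i = 1;
	while (num >= 0x80) {
		buf[i++] = (uint8_t)(num & 0x7F);
		num >>= 7;
	}

	/* Last byte: the remaining bits, highest bit set. */
	buf[i++] = (uint8_t)num | 0x80;
	return i;
}

/* Decodes forwards from buf[0]; at most size_max bytes are read.
 * Returns the number of bytes used (1-9), or zero on error. */
size_t
decode(const uint8_t buf[], size_t size_max, uint64_t *num)
{
	if (size_max == 0)
		return 0;
	if (size_max > 9)
		size_max = 9;

	*num = buf[0] & 0x7F;
	if (!(buf[0] & 0x80))
		return 1;

	size_t i = 1;
	do {
		if (i == size_max)
			return 0;
		*num |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
	} while (!(buf[i++] & 0x80));

	return i;
}

/* Decodes backwards; the integer ends at buf[size_max - 1].
 * Returns the number of bytes used (1-9), or zero on error. */
size_t
decode_reverse(const uint8_t buf[], size_t size_max, uint64_t *num)
{
	if (size_max == 0)
		return 0;

	const size_t end = size_max > 9 ? size_max - 9 : 0;
	size_t i = size_max - 1;
	*num = buf[i] & 0x7F;
	if (!(buf[i] & 0x80))
		return 1;

	do {
		if (i-- == end)
			return 0;
		*num <<= 7;
		*num |= buf[i] & 0x7F;
	} while (!(buf[i] & 0x80));

	return size_max - i;
}
```

With these definitions, the number 300 encodes to the two
bytes AC 82, and the three-byte sequence AA 00 80 decodes to
42, which is one of the non-minimal encodings that the format
allows.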
272 +========+========+========+
273 | Stream | Stream | Stream | ...
274 +========+========+========+
A file usually contains only one Stream. However, multiple
Streams can be concatenated with no additional processing.
It is up to the implementation to decide whether the decoder
continues decoding from the next Stream once the end of the
first Stream has been reached.
285 There are two types of Streams: Single-Block Streams and
286 Multi-Block Streams. Decoders conforming to this specification
287 must support at least Single-Block Streams. Supporting
288 Multi-Block Streams is optional. If the decoder supports only
289 Single-Block Streams, the documentation of the decoder should
290 mention this fact clearly.
293 2.1.1. Single-Block Stream
295 +===============+============+
296 | Stream Header | Data Block |
297 +===============+============+
299 As the name says, a Single-Block Stream has exactly one Block.
300 The Block must be a Data Block; Metadata Blocks are not allowed
301 in Single-Block Streams.
304 2.1.2. Multi-Block Stream
306 +===============+=======================+
307 | Stream Header | Header Metadata Block |
308 +===============+=======================+
310 +============+ +============+=======================+
311 ---> | Data Block | ... | Data Block | Footer Metadata Block |
312 +============+ +============+=======================+
315 - Stream Header is mandatory.
316 - Header Metadata Block is optional.
317 - Each Multi-Block Stream has at least one Data Block. The
318 maximum number of Data Blocks is not limited.
319 - Footer Metadata Block is mandatory.
324 +---+---+---+---+---+---+--------------+--+--+--+--+
325 | Header Magic Bytes | Stream Flags | CRC32 |
326 +---+---+---+---+---+---+--------------+--+--+--+--+
329 2.2.1. Header Magic Bytes
The first six (6) bytes of the Stream are the so-called Header
Magic Bytes. They can be used to identify the file type.

Using a C array and ASCII:
    const uint8_t HEADER_MAGIC[6]
            = { 0xFF, 'L', 'Z', 'M', 'A', 0x00 };

In plain hexadecimal:
    FF 4C 5A 4D 41 00

- The first byte (0xFF) was chosen so that the files cannot
  be erroneously detected as being in the LZMA_Alone format,
  in which the first byte is in the range [0x00, 0xE0].
- The sixth byte (0x00) was chosen to prevent applications
  from misdetecting the file as a text file.
351 Bit(s) Mask Description
352 0-2 0x07 Type of Check (see Section 3.3.1):
356 0x02 4 bytes (Reserved)
358 0x04 16 bytes (Reserved)
359 0x05 32 bytes SHA-256
360 0x06 32 bytes (Reserved)
361 0x07 64 bytes (Reserved)
362 3 0x08 The CRC32 field is present in Block Headers.
363 4 0x10 If unset, this is a Single-Block Stream; if set,
364 this is a Multi-Block Stream.
365 5-7 0xE0 Reserved for future use; must be zero for now.
367 Implementations must support at least the Check IDs 0x00 (None)
368 and 0x01 (CRC32). Supporting other Check IDs is optional. If an
369 unsupported Check is used, the decoder must indicate a warning
If any reserved bit is set, the decoder must indicate an error.
A set reserved bit may mean that a new field is present which
the decoder is not aware of, in which case the decoder could
parse the Stream Header incorrectly.
380 The CRC32 is calculated from the Stream Flags field. It is
381 stored as an unsigned 32-bit little endian integer. If the
382 calculated value does not match the stored one, the decoder
383 must indicate an error.
385 Note that this field is always present; the bit in Stream Flags
386 controls only presence of CRC32 in Block Headers.
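The checks that apply to the Stream Header can be combined into
one validation routine. The following is a minimal sketch, not
a reference implementation. The names parse_stream_header(),
struct stream_flags, and crc32_ieee() are hypothetical, and the
CRC32 is assumed here to be the common IEEE 802.3 variant
(reflected polynomial 0xEDB88320); Section 7 is authoritative
for the exact definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t HEADER_MAGIC[6]
		= { 0xFF, 'L', 'Z', 'M', 'A', 0x00 };

/* Assumption: bitwise IEEE 802.3 CRC32; see the lead-in. */
static uint32_t
crc32_ieee(const uint8_t *buf, size_t size)
{
	uint32_t crc = UINT32_C(0xFFFFFFFF);
	for (size_t i = 0; i < size; ++i) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; ++bit)
			crc = (crc >> 1)
				^ ((crc & 1) ? UINT32_C(0xEDB88320) : 0);
	}
	return ~crc;
}

struct stream_flags {
	uint8_t check_id;     /* Bits 0-2: Type of Check */
	int has_block_crc32;  /* Bit 3: CRC32 in Block Headers */
	int is_multi_block;   /* Bit 4: Multi-Block Stream */
};

/* hdr holds Header Magic Bytes (6), Stream Flags (1), CRC32 (4).
 * Returns 0 on success, -1 if an error must be indicated. */
static int
parse_stream_header(const uint8_t hdr[static 11],
		struct stream_flags *out)
{
	if (memcmp(hdr, HEADER_MAGIC, sizeof(HEADER_MAGIC)) != 0)
		return -1;

	const uint8_t flags = hdr[6];

	/* Bits 5-7 are reserved and must be zero. */
	if (flags & 0xE0)
		return -1;

	/* CRC32 covers only Stream Flags; stored little endian. */
	const uint32_t stored = (uint32_t)hdr[7]
			| ((uint32_t)hdr[8] << 8)
			| ((uint32_t)hdr[9] << 16)
			| ((uint32_t)hdr[10] << 24);
	if (stored != crc32_ieee(&hdr[6], 1))
		return -1;

	out->check_id = flags & 0x07;
	out->has_block_crc32 = (flags & 0x08) != 0;
	out->is_multi_block = (flags & 0x10) != 0;
	return 0;
}
```

Whether an unsupported Check ID leads to a warning is left to
the caller, since that decision depends on which Checks the
decoder implements.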
391 +==============+=================+==============+
392 | Block Header | Compressed Data | Block Footer |
393 +==============+=================+==============+
395 There are two types of Blocks:
396 - Data Blocks hold the actual compressed data.
397 - Metadata Blocks hold the Index, Extra, and a few other
398 non-data fields (see Section 5).
400 The type of the Block is indicated by the corresponding bit
401 in the Block Flags field (see Section 3.1.1).
406 +------+------+=================+===================+
407 | Block Flags | Compressed Size | Uncompressed Size |
408 +------+------+=================+===================+
410 +======================+--+--+--+--+================+
411 ---> | List of Filter Flags | CRC32 | Header Padding |
412 +======================+--+--+--+--+================+
417 The first byte of the Block Flags field is a bit field:
419 Bit(s) Mask Description
420 0-2 0x07 Number of filters (0-7)
421 3 0x08 Use End of Payload Marker (even if
422 Uncompressed Size is stored to Block Header).
423 4 0x10 The Compressed Size field is present.
424 5 0x20 The Uncompressed Size field is present.
425 6 0x40 Reserved for future use; must be zero for now.
426 7 0x80 This is a Metadata Block.
428 The second byte of the Block Flags field is also a bit field:
430 Bit(s) Mask Description
431 0-4 0x1F Size of the Header Padding field (0-31 bytes)
432 5-7 0xE0 Reserved for future use; must be zero for now.
434 The decoder must indicate an error if End of Payload Marker
435 is not used and Uncompressed Size is not stored to the Block
436 Header. Because of this, the first byte of Block Flags can
437 never be a nul byte. This is useful when detecting beginning
438 of the Block after Footer Padding (see Section 3.3.3).
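As an illustration, the two Block Flags bytes could be unpacked
and validated as sketched below. The structure and function
names are hypothetical; the masks are those listed in the two
tables above, and the reserved-bit rule stated for this field
is included in the same routine.

```c
#include <stdint.h>

struct block_flags {
	unsigned int filter_count;    /* 0-7 */
	int use_eopm;                 /* End of Payload Marker used */
	int has_compressed_size;
	int has_uncompressed_size;
	int is_metadata;
	unsigned int header_padding;  /* 0-31 bytes */
};

/* Returns 0 on success, -1 if an error must be indicated. */
int
parse_block_flags(const uint8_t raw[static 2],
		struct block_flags *out)
{
	/* Reserved: bit 6 of the first byte, bits 5-7 of the second. */
	if ((raw[0] & 0x40) || (raw[1] & 0xE0))
		return -1;

	out->filter_count = raw[0] & 0x07;
	out->use_eopm = (raw[0] & 0x08) != 0;
	out->has_compressed_size = (raw[0] & 0x10) != 0;
	out->has_uncompressed_size = (raw[0] & 0x20) != 0;
	out->is_metadata = (raw[0] & 0x80) != 0;
	out->header_padding = raw[1] & 0x1F;

	/* Without End of Payload Marker and without Uncompressed
	 * Size, the end of the data could not be detected. */
	if (!out->use_eopm && !out->has_uncompressed_size)
		return -1;

	return 0;
}
```

Because of the last check, a first byte of 0x00 is always
rejected, which matches the observation that the first byte of
Block Flags can never be a nul byte.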
If any reserved bit is set, the decoder must indicate an error.
A set reserved bit may mean that a new field is present which
the decoder is not aware of, in which case the decoder could
parse the Block Header incorrectly.
446 3.1.2. Compressed Size
448 This field is present only if the appropriate bit is set in
449 the Block Flags field (see Section 3.1.1).
451 This field contains the size of the Compressed Data field.
452 The size is stored using the encoding described in Section 1.2.
453 If the Compressed Size does not match the real size of the
454 Compressed Data field, the decoder must indicate an error.
456 Having the Compressed Size field in the Block Header can be
457 useful for multithreaded decoding when seeking is not possible.
458 If the Blocks are small enough, the decoder can read multiple
459 Blocks into its internal buffer, and decode the Blocks in
462 Compressed Size can also be useful when seeking forwards to
463 a specific location in streamed mode: the decoder can quickly
464 skip over irrelevant Blocks, without decoding them.
467 3.1.3. Uncompressed Size
469 This field is present only if the appropriate bit is set in
470 the Block Flags field (see Section 3.1.1).
472 The Uncompressed Size field contains the size of the Block
475 Storing Uncompressed Size serves several purposes:
476 - The decoder will know when all of the data has been
477 decoded without an explicit End of Payload Marker.
478 - The decoder knows how much memory it needs to allocate
479 for a temporary buffer in multithreaded mode.
480 - Simple error detection: wrong size indicates a broken file.
481 - Sometimes it is useful to know the file size without
482 uncompressing the file.
Note that the only reliable way to find out the real
uncompressed size is to uncompress the Block, because the
Block Header and Metadata Block fields may contain
(intentionally or unintentionally) invalid information.
489 Uncompressed Size is stored using the encoding described in
490 Section 1.2. If the Uncompressed Size does not match the
491 real uncompressed size, the decoder must indicate an error.
494 3.1.4. List of Filter Flags
496 +================+================+ +================+
497 | Filter 0 Flags | Filter 1 Flags | ... | Filter n Flags |
498 +================+================+ +================+
500 The number of Filter Flags fields is stored in the Block Flags
501 field (see Section 3.1.1). As a special case, if the number of
502 Filter Flags fields is zero, it is equivalent to having the
503 Copy filter as the only filter.
505 The format of each Filter Flags field is as follows:
507 +------+=============+=============================+
508 | Misc | External ID | External Size of Properties |
509 +------+=============+=============================+
511 +===================+
512 ---> | Filter Properties |
513 +===================+
515 The list of officially defined Filter IDs and the formats of
516 their Filter Properties are described in Section 4.3.
521 To save space, the most commonly used Filter IDs and the
522 Size of Filter Properties are encoded in a single byte.
523 Depending on the contents of the Misc field, Filter ID is
524 the value of the Misc or External ID field.
526 Value Filter ID Size of Filter Properties
527 0x00 - 0x1F Misc 0 bytes
528 0x20 - 0x3F Misc 1 byte
529 0x40 - 0x5F Misc 2 bytes
530 0x60 - 0x7F Misc 3 bytes
531 0x80 - 0x9F Misc 4 bytes
532 0xA0 - 0xBF Misc 5 bytes
533 0xC0 - 0xDF Misc 6 bytes
534 0xE0 - 0xFE External ID 0-30 bytes
535 0xFF External ID External Size of Properties
The following code demonstrates parsing the Misc field and,
when needed, the External ID and External Size of Properties
fields:

    uint64_t id;
    uint64_t properties_size;
    uint8_t misc = read_byte();
    if (misc >= 0xE0) {        // Filter ID is in External ID
        id = read_variable_length_integer();
        if (misc == 0xFF)
            properties_size = read_variable_length_integer();
        else
            properties_size = misc - 0xE0;
    } else {                   // Misc itself is the Filter ID
        id = misc;
        properties_size = misc / 0x20;
    }
561 This field is present only if the Misc field contains a value
562 that indicates usage of External ID. The External ID is stored
563 using the encoding described in Section 1.2.
566 3.1.4.3. External Size of Properties
568 This field is present only if the Misc field contains a value
569 that indicates usage of External Size of Properties. The size
570 of Filter Properties is stored using the encoding described in
574 3.1.4.4. Filter Properties
The size of this field depends on the Misc field (Section
3.1.4.1) and, if present, the External Size of Properties field
(Section 3.1.4.3). The format of this field depends on the
selected filter; see Section 4.3 for details.
584 This field is present only if the appropriate bit is set in
585 the Stream Flags field (see Section 2.2.2).
587 The CRC32 is calculated over everything in the Block Header
588 field except the Header Padding field and the CRC32 field
589 itself. It is stored as an unsigned 32-bit little endian
590 integer. If the calculated value does not match the stored
591 one, the decoder must indicate an error.
594 3.1.6. Header Padding
This field contains as many nul bytes as indicated by the value
stored in the second byte of the Block Flags field (see Section
3.1.1). If the Header Padding field contains any non-nul bytes,
the decoder must indicate an error.
600 The intent of the Header Padding field is to allow alignment
601 of Compressed Data. The usefulness of alignment is described
607 The format of Compressed Data depends on Block Flags and List
608 of Filter Flags. Excluding the descriptions of the simplest
609 filters in Section 4, the format of the filter-specific encoded
610 data is out of scope of this document.
612 Note a special case: if End of Payload Marker (see Section
613 3.1.1) is not used and Uncompressed Size is zero, the size
614 of the Compressed Data field is always zero.
619 +=======+===============+================+
620 | Check | Stream Footer | Footer Padding |
621 +=======+===============+================+
626 The type and size of the Check field depends on which bits
627 are set in the Stream Flags field (see Section 2.2.2).
629 The Check, when used, is calculated from the original
630 uncompressed data. If the calculated Check does not match the
631 stored one, the decoder must indicate an error. If the selected
632 type of Check is not supported by the decoder, it must indicate
638 +===================+===============+--------------+
639 | Uncompressed Size | Backward Size | Stream Flags |
640 +===================+===============+--------------+
642 +----------+---------+
643 ---> | Footer Magic Bytes |
644 +----------+---------+
646 Stream Footer is present only in
647 - Data Block of a Single-Block Stream; and
648 - Footer Metadata Block of a Multi-Block Stream.
650 The Stream Footer field is placed inside Block Footer, because
651 no padding is allowed between Check and Stream Footer.
654 3.3.2.1. Uncompressed Size
656 This field is present only in the Data Block of a Single-Block
657 Stream if Uncompressed Size is not stored to the Block Header
658 (see Section 3.1.1). Without the Uncompressed Size field in
659 Stream Footer it would not be possible to quickly find out
660 the Uncompressed Size of the Stream in all cases.
662 Uncompressed Size is stored using the encoding described in
663 Section 1.2. If the stored value does not match the real
664 uncompressed size of the Single-Block Stream, the decoder must
668 3.3.2.2. Backward Size
670 This field contains the total size of the Block Header,
671 Compressed Data, Check, and Uncompressed Size fields. The
672 value is stored using the encoding described in Section 1.2.
673 If the Backward Size does not match the real total size of
674 the appropriate fields, the decoder must indicate an error.
676 Implementations reading the Stream backwards should notice
677 that the value in this field can never be zero.
680 3.3.2.3. Stream Flags
682 This is a copy of the Stream Flags field from the Stream
683 Header. The information stored to Stream Flags is needed
684 when parsing the Stream backwards.
687 3.3.2.4. Footer Magic Bytes
689 As the last step of the decoding process, the decoder must
690 verify the existence of Footer Magic Bytes. If they are not
691 found, an error must be indicated.
693 Using a C array and ASCII:
694 const uint8_t FOOTER_MAGIC[2] = { 'Y', 'Z' };
699 The primary reason to have Footer Magic Bytes is to make
700 it easier to detect incomplete files quickly, without
701 uncompressing. If the file does not end with Footer Magic Bytes
702 (excluding Footer Padding described in Section 3.3.3), it
703 cannot be undamaged, unless someone has intentionally appended
704 garbage after the end of the Stream. (Appending garbage at the
705 end of the file does not prevent uncompressing the file, but
706 may give a warning or error depending on the decoder
710 3.3.3. Footer Padding
712 In certain situations it is convenient to be able to pad
713 Blocks or Streams to be multiples of, for example, 512 bytes.
714 Footer Padding makes this possible. Note that this is in no
715 way required to enforce alignment in the way described in
716 Section 4.3; the Header Padding field is enough for that.
718 When Footer Padding is used, it must contain only nul bytes.
719 Any non-nul byte should be considered as the beginning of
720 a new Block or Stream.
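A decoder that has finished a Block can therefore skip nul
bytes until it meets the next unit. The sketch below shows one
way to do this; the names are hypothetical. Telling a new
Stream from a new Block by the byte value 0xFF is an inference
drawn from this document, not a stated rule: 0xFF is the first
of the Header Magic Bytes, and it cannot be a valid first byte
of Block Flags because it has the reserved bit 0x40 set.

```c
#include <stddef.h>
#include <stdint.h>

enum next_unit { NEXT_NONE, NEXT_STREAM, NEXT_BLOCK };

/* Advances *pos past Footer Padding (nul bytes) in buf.
 * On return, *pos is the offset of the first non-nul byte,
 * or size if only padding remained. */
enum next_unit
skip_footer_padding(const uint8_t *buf, size_t size, size_t *pos)
{
	while (*pos < size && buf[*pos] == 0x00)
		++*pos;

	if (*pos == size)
		return NEXT_NONE;

	/* Assumption: 0xFF starts Header Magic Bytes; any other
	 * non-nul byte is treated as the start of Block Flags. */
	return buf[*pos] == 0xFF ? NEXT_STREAM : NEXT_BLOCK;
}
```

The caller still has to validate whatever follows, for example
by checking all six Header Magic Bytes or the reserved bits of
Block Flags.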
722 The possibility of Padding should be taken into account when
723 designing an application that wants to find out information
724 about a Stream by parsing Footer Metadata Block.
726 Support for Padding was inspired by a related note in
732 The Block Flags field defines how many filters are used. When
733 more than one filter is used, the filters are chained; that is,
734 the output of one filter is the input of another filter. The
735 following figure illustrates the direction of data flow.
737 v Uncompressed Data ^
739 Encoder | Filter 1 | Decoder
744 The filters are independent from each other, except that they
745 must cooperate a little to make it possible, in all cases, to
746 detect when all of the data has been decoded. In addition, the
747 filters should cooperate in the encoder to keep the alignment
751 4.1. Detecting when All Data Has Been Decoded
753 There must be a way for the decoder to detect when all of the
754 Compressed Data has been decoded. This is simple when only
755 one filter is used, but a bit more complex when multiple
758 This file format supports three methods to detect when all of
759 the data has been decoded:
762 - End of Payload Marker
764 In both encoder and decoder, filters are initialized starting
765 from the first filter in the chain. For each filter, one of
766 these three methods is used.
769 4.1.1. With Uncompressed Size
771 This method is the only method supported by all filters.
772 It must be used when uncompressed size is known by the
773 filter-specific encoder or decoder. In practice this means
774 that Uncompressed Size has been stored to the Block Header.
776 In case of the first filter in the chain, the uncompressed size
777 given to the filter-specific encoder or decoder equals the
778 Uncompressed Size stored in the Block Header. For the rest of
779 the filters in the chain, uncompressed size is the size of the
780 output data of the previous filter in the chain.
782 Note that when Use End of Payload Marker bit is set in Block
783 Flags, Uncompressed Size is considered to be unknown even if
784 it was present in the Block Header. Thus, if End of Payload
785 Marker is used, uncompressed size of all of the filters in
786 the chain is unknown, and can never be used to detect when
787 all of the data has been decoded.
789 Once the correct number of bytes has been written out, the
790 filter-specific decoder indicates to its caller that all of
791 the data has been decoded. If the filter-specific decoder
792 detects End of Input or End of Payload Marker before the
793 correct number of bytes is decoded, the decoder must indicate
797 4.1.2. With End of Input
799 Most filters will know that all of the data has been decoded
800 when the End of Input data has been reached. Once the filter
801 knows that it has received the input data in its entirety,
802 it finishes its job, and indicates to its caller that all of
803 the data has been decoded. The filter-specific decoder must
804 indicate an error if it detects End of Payload Marker.
Note that this method can work only when the filter is not
the last filter in the chain, because only another filter
can indicate the End of Input data. In practice, this means
that a filter later in the chain must support embedding
End of Payload Marker.
812 When a filter that cannot embed End of Payload Marker is the
813 last filter in the chain, Subblock filter is appended to the
814 chain as an implicit filter. In the simplest case, this occurs
815 when no filters are specified, and the End of Payload Marker
816 bit is set in Block Flags.
819 4.1.3. With End of Payload Marker
821 End of Payload Marker is a filter-specific bit sequence that
822 indicates the end of data. It is supported by only a few
823 filters. It is used when uncompressed size is unknown, and
825 - doesn't support End of Input; or
826 - is the last filter in the chain.
828 End of Payload Marker is embedded at the end of the encoded
829 data by the filter-specific encoder. When the filter-specific
830 decoder detects the embedded End of Payload Marker, the decoder
831 knows that all of the data has been decoded. Then it finishes
832 its job, and indicates to its caller that all of the data has
833 been decoded. If the filter-specific decoder detects End of
834 Input before End of Payload Marker, the decoder must indicate
837 If the filter supports both End of Input and End of Payload
838 Marker, the former is used, unless the filter is the last
844 Some filters give better compression ratio or are faster
845 when the input or output data is aligned. For optimal results,
846 the encoder should try to enforce proper alignment when
847 possible. Not enforcing alignment in the encoder is not
848 an error. Thus, the decoder must be able to handle files with
849 suboptimal alignment.
851 Alignment of uncompressed input data is usually the job of
852 the application producing the data. For example, to get the
853 best results, an archiver tool should make sure that all
854 PowerPC executable files in the archive stream start at
855 offsets that are multiples of four bytes.
Some filters, for example LZMA, can be configured to take
advantage of a specified alignment of input data. Note that
taking advantage of aligned input can be beneficial also when
a filter is not the first filter in the chain. For example,
if you compress PowerPC executables, you may want to use the
PowerPC filter and chain that with the LZMA filter. Because not
only the input but also the output alignment of the PowerPC
filter is four bytes, it is now beneficial to set LZMA settings
so that the LZMA encoder can take advantage of its
four-byte-aligned input data.
868 The output of the last filter in the chain is stored to the
869 Compressed Data field. Aligning Compressed Data appropriately
871 - speed, if the filtered data is handled multiple bytes at
872 a time by the filter-specific encoder and decoder,
873 because accessing aligned data in computer memory is
875 - compression ratio, if the output data is later compressed
876 with an external compression tool.
878 Compressed Data in a Stream can be aligned by using the Header
879 Padding field in the Block Header.
886 This is a dummy filter that simply copies all data from input
887 to output unmodified.
890 Size of Filter Properties: 0 bytes
891 Changes size of data: No
893 Detecting when all of the data has been decoded:
894 Uncompressed size: Yes
895 End of Payload Marker: No
905 The Subblock filter can be used to
906 - embed End of Payload Marker when the otherwise last
907 filter in the chain does not support embedding it; and
908 - apply additional filters in the middle of a Block.
911 Size of Filter Properties: 0 bytes
912 Changes size of data: Yes, unpredictably
914 Detecting when all of the data has been decoded:
915 Uncompressed size: Yes
916 End of Payload Marker: Yes
921 Output data: Freely adjustable
924 4.3.2.1. Format of the Encoded Output
926 The encoded data from the Subblock filter consist of zero or
929 +==========+==========+
930 | Subblock | Subblock | ...
931 +==========+==========+
933 Each Subblock contains two fields:
935 +----------------+===============+
936 | Subblock Flags | Subblock Data |
937 +----------------+===============+
939 Subblock Flags is a bitfield:
Bits  Mask  Description
0-3   0x0F  The interpretation of these bits depends on
            Subblock Type:
              - 0x20     Bits 0-3 for Size
              - 0x30     Bits 0-3 for Repeat Count
              - Other    These bits must be zero.
4-7   0xF0  Subblock Type:
              - 0x00: Padding
              - 0x10: End of Payload Marker
              - 0x20: Data
              - 0x30: Repeating Data
              - 0x40: Set Subfilter
              - 0x50: Unset Subfilter
            If some other value is detected, the decoder
            must indicate an error.
957 The format of the Subblock Data field depends on Subblock Type.
Subblocks with the Subblock Type 0x00 (Padding) don't have a
Subblock Data field. These Subblocks can be useful for fixing
alignment. There can be at most 31 consecutive Subblocks with
this Subblock Type; if there are more, the decoder must
indicate an error.
965 Subblock with the Subblock Type 0x10 (End of Payload Marker)
966 doesn't have a Subblock Data field. The decoder must indicate
967 an error if this Subblock Type is detected when Subfilter is
968 enabled, or when the Subblock filter is not supposed to embed
969 the End of Payload Marker.
971 Subblocks with the Subblock Type 0x20 (Data) contain the rest
972 of the Size, which is followed by Size + 1 bytes in the Data
973 field (that is, Data can never be empty):
975 +------+------+------+======+
976 | Bits 4-27 for Size | Data |
977 +------+------+------+======+
979 Subblocks with the Subblock Type 0x30 (Repeating Data) contain
980 the rest of the Repeat Count, the Size of the Data, and finally
981 the actual Data to be repeated:
983 +---------+---------+--------+------+======+
984 | Bits 4-27 for Repeat Count | Size | Data |
985 +---------+---------+--------+------+======+
987 The size of the Data field is Size + 1. It is repeated Repeat
988 Count + 1 times. That is, the minimum size of Data is one byte;
989 the maximum size of Data is 256 bytes. The minimum number of
990 repeats is one; the maximum number of repeats is 2^28.
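The arithmetic for the Data and Repeating Data Subblock
headers can be sketched as follows. The function and structure
names are hypothetical. The three bytes that follow Subblock
Flags are taken to hold bits 4-11, 12-19, and 20-27 in that
order, which is the order in which the decoder loop later in
this Section shifts them in.

```c
#include <stddef.h>
#include <stdint.h>

struct subblock {
	uint8_t type;        /* Subblock Type (flags & 0xF0) */
	uint32_t size;       /* Size of the Data field in bytes */
	uint32_t repeat;     /* How many times Data is output */
	size_t header_size;  /* Bytes preceding the Data field */
};

/* Combines bits 0-3 of Subblock Flags (p[0]) with the next
 * three bytes into a 28-bit value. */
static uint32_t
read_28bit(const uint8_t *p)
{
	uint32_t value = p[0] & 0x0F;
	for (unsigned int i = 0; i < 3; ++i)
		value |= (uint32_t)p[1 + i] << (4 + 8 * i);
	return value;
}

/* Parses the header of a Subblock of Type 0x20 or 0x30.
 * Returns 0 on success, -1 on truncated input or other Type. */
int
parse_data_subblock(const uint8_t *buf, size_t avail,
		struct subblock *out)
{
	if (avail < 4)
		return -1;

	out->type = buf[0] & 0xF0;

	if (out->type == 0x20) {
		out->size = read_28bit(buf) + 1;     /* 1 to 2^28 */
		out->repeat = 1;
		out->header_size = 4;
		return 0;
	}

	if (out->type == 0x30) {
		if (avail < 5)
			return -1;
		out->repeat = read_28bit(buf) + 1;   /* 1 to 2^28 */
		out->size = (uint32_t)buf[4] + 1;    /* 1 to 256 */
		out->header_size = 5;
		return 0;
	}

	return -1;
}
```

For example, the header bytes 31 02 00 00 07 describe eight
bytes of Data that are output 34 times.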
992 If Subfilter is not used, the Data field of Subblock Types 0x20
993 and 0x30 is the output of the decoded Subblock filter. If
994 Subfilter is used, Data is the input of the Subfilter, and the
995 decoded output of the Subfilter is the decoded output of the
998 Subblocks with the Subblock Type 0x40 (Set Subfilter) contain
999 a Filter Flags field in Subblock Data:
1005 It is an error to set the Subfilter to Filter ID 0x00 (Copy)
1006 or 0x01 (Subblock). All the other Filter IDs are allowed.
1007 The decoder must indicate an error if this Subblock Type is
1008 detected when a Subfilter is already enabled.
1010 Subblocks with the Subblock Type 0x50 (Unset Subfilter) don't
1011 have a Subblock Data field. There must be at least one Subblock
1012 with Subblock Type 0x20 or 0x30 between Subblocks with Subblock
1013 Type 0x40 and 0x50; if there isn't, the decoder must indicate
Subblock Types 0x40 and 0x50 are always used as a pair: if a
Subfilter has been enabled with Subblock Type 0x40, it must
always be disabled later with Subblock Type 0x50. Disabling
must be done even if the Subfilter used End of Payload Marker;
after the Subfilter has detected End of Payload Marker, the
next Subblock that is not Padding must unset the Subfilter.
1024 When the Subblock filter is used as an implicit filter to embed
1025 End of Payload marker, the Subblock Types 0x40 and 0x50 (Set or
1026 Unset Subfilter) must not be used. The decoder must indicate an
1027 error if it detects any of these Subblock Types in an implicit
1030 The following code illustrates the basic structure of a
1033 uint32_t consecutive_padding = 0;
1034 bool got_output_with_subfilter = false;
1039 uint8_t flags = read_byte();
1042 consecutive_padding = 0;
1044 switch (flags >> 4) {
1049 if (++consecutive_padding == 32)
1054 // End of Payload Marker
1057 if (subfilter_enabled || !allow_eopm)
1063 size = flags & 0x0F;
1064 for (size_t i = 4; i < 28; i += 8)
1065 size |= (uint32_t)(read_byte()) << i;
1067 // If any output is produced, this will
1068 // set got_output_with_subfilter to true.
1074 repeat = flags & 0x0F;
1075 for (size_t i = 4; i < 28; i += 8)
1076 repeat |= (uint32_t)(read_byte()) << i;
1079 // If any output is produced, this will
1080 // set got_output_with_subfilter to true.
1081 copy_repeating_data(size, repeat);
1088 if (subfilter_enabled)
1090 got_output_with_subfilter = false;
1098 if (!subfilter_enabled)
1100 if (!got_output_with_subfilter)
1113 The Delta filter may increase the compression ratio when the
1114 value of the next byte correlates with the value of an earlier
1115 byte at a specified distance.
1118 Size of Filter Properties: 1 byte
1119 Changes size of data: No
1121 Detecting when all of the data has been decoded:
1122 Uncompressed size: Yes
1123 End of Payload Marker: No
1126 Preferred alignment:
1128 Output data: Same as the original input data
1130 The Properties byte indicates the delta distance, which can be
1131 1-256 bytes backwards from the current byte: 0x00 indicates
1132 distance of 1 byte and 0xFF distance of 256 bytes.
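The mapping from the Properties byte to the distance, and the way
the distance is applied, can be sketched as a buffer-based routine.
This is only an illustration under stated assumptions: the function
name delta_code and its in-place, whole-buffer interface are
hypothetical and do not come from this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: apply the Delta filter to buf in place.
 * props is the Filter Properties byte (0x00 = distance 1,
 * 0xFF = distance 256). is_encoder selects the direction. */
void delta_code(uint8_t *buf, size_t size, uint8_t props, int is_encoder)
{
    const unsigned int distance = (unsigned int)props + 1;
    assert(distance >= 1 && distance <= 256);

    /* Ring buffer holding the last 256 unfiltered bytes. */
    uint8_t history[256];
    uint8_t pos = 0; /* wraps modulo 256 */
    memset(history, 0, sizeof(history));

    for (size_t i = 0; i < size; ++i) {
        /* Unfiltered byte that was seen `distance` bytes ago. */
        const uint8_t prev = history[(uint8_t)(distance + pos)];
        uint8_t out;

        if (is_encoder) {
            out = (uint8_t)(buf[i] - prev);
            history[pos] = buf[i]; /* remember the original byte */
        } else {
            out = (uint8_t)(buf[i] + prev);
            history[pos] = out;    /* remember the restored byte */
        }

        buf[i] = out;
        --pos;
    }
}
```

Encoding a buffer and then decoding the result with the same
Properties byte is expected to restore the original bytes, because
both directions keep the same history of unfiltered data.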
1135 4.3.3.1. Format of the Encoded Output
1137 The code below illustrates both encoding and decoding with
1140 // Distance is in the range [1, 256].
1141 const unsigned int distance = get_properties_byte() + 1;
1145 memset(delta, 0, sizeof(delta));
1148 const int byte = read_byte();
1152 uint8_t tmp = delta[(uint8_t)(distance + pos)];
1154 tmp = (uint8_t)(byte) - tmp;
1155 delta[pos] = (uint8_t)(byte);
1157 tmp = (uint8_t)(byte) + tmp;
1168 LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purpose
1169 compression algorithm with high compression ratio and fast
1170 decompression. LZMA is based on LZ77 and range coding algorithms.
1173 Size of Filter Properties: 2 bytes
1174 Changes size of data: Yes, unpredictably
1176 Detecting when all of the data has been decoded:
1177 Uncompressed size: Yes
1178 End of Payload Marker: Yes
1181 Preferred alignment:
1182 Input data: Adjustable to 1/2/4/8/16 byte(s)
1185 At the time of writing, there is no documentation of how LZMA
1186 works other than the source code in LZMA SDK. Once such
1187 documentation gets written, it will probably be published as
1188 a separate document, because including the documentation here
1189 would lengthen this document considerably.
1191 The format of the Filter Properties field is as follows:
1193 +-----------------+------------------+
1194 | LZMA Properties | Dictionary Flags |
1195 +-----------------+------------------+
1198 4.3.4.1. LZMA Properties
1200 The LZMA Properties field contains three properties. An
1201 abbreviation is given in parentheses, followed by the value
1202 range of the property. The field consists of
1204 1) the number of literal context bits (lc, [0, 8]);
1205 2) the number of literal position bits (lp, [0, 4]); and
1206 3) the number of position bits (pb, [0, 4]).
1208 They are encoded using the following formula:
1210 LZMA Properties = (pb * 5 + lp) * 9 + lc
1212 The following C code illustrates a straightforward way to
1213 decode the properties:
1216 uint8_t prop = get_lzma_properties() & 0xFF;
1217 if (prop > (4 * 5 + 4) * 9 + 8)
1218 return LZMA_PROPERTIES_ERROR;
1220 pb = prop / (9 * 5);
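The formula for LZMA Properties can be inverted with integer
division and remainder. The sketch below shows both directions on a
plain byte value. The type lzma_props and the function names are
hypothetical and are used here only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three properties. */
typedef struct {
    uint8_t lc; /* literal context bits,  [0, 8] */
    uint8_t lp; /* literal position bits, [0, 4] */
    uint8_t pb; /* position bits,         [0, 4] */
} lzma_props;

/* LZMA Properties = (pb * 5 + lp) * 9 + lc */
uint8_t lzma_props_encode(lzma_props p)
{
    assert(p.lc <= 8 && p.lp <= 4 && p.pb <= 4);
    return (uint8_t)((p.pb * 5 + p.lp) * 9 + p.lc);
}

/* Returns false if the byte is larger than the biggest valid value. */
bool lzma_props_decode(uint8_t byte, lzma_props *out)
{
    assert(out != NULL);
    if (byte > (4 * 5 + 4) * 9 + 8)
        return false;

    out->pb = (uint8_t)(byte / (9 * 5));
    const uint8_t rest = (uint8_t)(byte % (9 * 5));
    out->lp = (uint8_t)(rest / 9);
    out->lc = (uint8_t)(rest % 9);
    return true;
}
```

For example, lc = 3, lp = 0, pb = 2 encodes to (2 * 5 + 0) * 9 + 3,
which is 93 (0x5D).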
1226 4.3.4.2. Dictionary Flags
1228 Currently the lowest six bits of the Dictionary Flags field
1231 Bits Mask Description
1232 0-5 0x3F Dictionary Size
1233 6-7 0xC0 Reserved for future use; must be zero for now.
1235 Dictionary Size is encoded with a one-bit mantissa and a five-bit
1236 exponent. To avoid wasting space, a one-byte dictionary has its
1239 Raw value Mantissa Exponent Dictionary size
1255 (*) The real maximum size of the dictionary is one byte
1256 less than 4 GiB, because the distance of 4 GiB is
1257 reserved for End of Payload Marker.
1259 Instead of having a table in the decoder, the dictionary size
1260 can be decoded using the following C code:
1262 uint64_t dictionary_size;
1263 const uint8_t bits = get_dictionary_flags() & 0x3F;
1265 dictionary_size = 1;
1267 dictionary_size = 2 | ((bits + 1) & 1);
1268 dictionary_size = dictionary_size << ((bits - 1) / 2);
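As a self-contained variant of the decoding above, the sketch below
also rejects the reserved bits and shows how an encoder might pick
the smallest raw value that covers a wanted dictionary size. The
function names are hypothetical, and the search in the encoder is one
possible approach rather than a method described in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the Dictionary Flags byte. Returns false if a reserved
 * bit (mask 0xC0) is set. */
bool dict_size_decode(uint8_t flags, uint64_t *size)
{
    assert(size != NULL);
    if (flags & 0xC0)
        return false;

    const uint8_t bits = flags & 0x3F;
    if (bits == 0) {
        *size = 1; /* special case: one-byte dictionary */
    } else {
        /* Mantissa is 2 or 3, exponent is (bits - 1) / 2. */
        const uint64_t mantissa = 2 | ((bits + 1) & 1);
        *size = mantissa << ((bits - 1) / 2);
    }
    return true;
}

/* Find the smallest raw value whose dictionary size is at least
 * `wanted`. Decoded sizes grow with the raw value, so a linear
 * search over the 64 candidates is sufficient. */
uint8_t dict_size_encode(uint64_t wanted)
{
    assert(wanted >= 1 && wanted <= (UINT64_C(1) << 32));

    for (uint8_t bits = 0; bits < 64; ++bits) {
        uint64_t size;
        if (dict_size_decode(bits, &size) && size >= wanted)
            return bits;
    }
    return 0x3F; /* not reached for inputs accepted above */
}
```

Note that the raw value 63 decodes to 4 GiB here, while the real
maximum is one byte less, as explained in the note marked with (*).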
1272 4.3.5. Branch/Call/Jump Filters for Executables
1274 These filters convert relative branch, call, and jump
1275 instructions to their absolute counterparts in executable
1276 files. This conversion increases redundancy and thus
1279 Size of Filter Properties: 0 or 4 bytes
1280 Changes size of data: No
1282 Detecting when all of the data has been decoded:
1283 Uncompressed size: Yes
1284 End of Payload Marker: No
1287 Below is the list of filters in this category. The alignment
1288 is the same for both input and output data.
1290 Filter ID Alignment Description
1291 0x04 1 byte x86 filter (BCJ)
1292 0x05 4 bytes PowerPC (big endian) filter
1293 0x06 16 bytes IA64 filter
1294 0x07 4 bytes ARM (little endian) filter
1295 0x08 2 bytes ARM Thumb (little endian) filter
1296 0x09 4 bytes SPARC filter
1298 If the size of Filter Properties is four bytes, the Filter
1299 Properties field contains the start offset used for address
1300 conversions. It is stored as an unsigned 32-bit little endian
1301 integer. If the size of Filter Properties is zero, the start
1304 Setting the start offset may be useful if an executable has
1305 multiple sections, and there are many cross-section calls.
1306 Taking advantage of this feature usually requires usage of
1307 the Subblock filter.
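Reading the start offset from the Filter Properties can be sketched
as follows. The function name bcj_start_offset is hypothetical. The
sketch assumes that an empty Filter Properties field means a start
offset of zero, and it treats any size other than 0 or 4 bytes as an
error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: extract the start offset of a
 * Branch/Call/Jump filter from its Filter Properties. */
bool bcj_start_offset(const uint8_t *props, size_t props_size,
        uint32_t *start_offset)
{
    assert(start_offset != NULL);

    if (props_size == 0) {
        /* Assumption: no properties means start offset zero. */
        *start_offset = 0;
        return true;
    }

    if (props_size != 4)
        return false;

    /* Unsigned 32-bit little endian integer. */
    *start_offset = (uint32_t)props[0]
            | ((uint32_t)props[1] << 8)
            | ((uint32_t)props[2] << 16)
            | ((uint32_t)props[3] << 24);
    return true;
}
```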
1312 Metadata is stored in Metadata Blocks, which can be at the
1313 beginning or at the end of a Multi-Block Stream. Because
1314 Metadata is stored in Blocks, it can be compressed in the same
1315 way as the actual data. This Section describes the
1316 format of the data stored in Metadata Blocks.
1318 +----------------+===============================+
1319 | Metadata Flags | Size of Header Metadata Block |
1320 +----------------+===============================+
1322 +============+===================+=======+=======+
1323 ---> | Total Size | Uncompressed Size | Index | Extra |
1324 +============+===================+=======+=======+
1326 A Stream must be parseable backwards. That is, there must be
1327 a way to locate the beginning of the Stream by starting from
1328 the end of the Stream. Thus, the Footer Metadata Block must
1329 contain the Total Size field or the Index field. If the Stream
1330 has a Header Metadata Block, the Size of Header Metadata Block
1331 field must also be present in the Footer Metadata Block.
1333 It must be possible to quickly locate the Blocks in
1334 non-streamed mode. Thus, the Index field must be present
1335 at least in one Metadata Block.
1337 If the above conditions are not met, the decoder must indicate
1340 There should be no additional data after the last field. If
1341 there is, the decoder should indicate an error.
1346 This field describes which fields are present in a Metadata
1349 Bit(s) Mask Description
1350 0 0x01 Size of Header Metadata Block is present.
1351 1 0x02 Total Size is present.
1352 2 0x04 Uncompressed Size is present.
1353 3 0x08 Index is present.
1354 4-6 0x70 Reserved for future use; must be zero for now.
1355 7 0x80 Extra is present.
1357 If any reserved bit is set, the decoder must indicate an error.
1358 It is possible that there is a new field present which the
1359 decoder is not aware of, and can thus parse the Metadata
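The bit assignments above, together with the presence rules given at
the beginning of Section 5, can be checked along the following lines.
This is a sketch only: the type metadata_fields and both function
names are hypothetical, and a NULL header pointer is used here to
mean that the Stream has no Header Metadata Block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of which fields a Metadata Block contains. */
typedef struct {
    bool header_size;       /* 0x01 */
    bool total_size;        /* 0x02 */
    bool uncompressed_size; /* 0x04 */
    bool index;             /* 0x08 */
    bool extra;             /* 0x80 */
} metadata_fields;

/* Returns false if any reserved bit (mask 0x70) is set. */
bool metadata_flags_parse(uint8_t flags, metadata_fields *out)
{
    assert(out != NULL);
    if (flags & 0x70)
        return false;

    out->header_size = (flags & 0x01) != 0;
    out->total_size = (flags & 0x02) != 0;
    out->uncompressed_size = (flags & 0x04) != 0;
    out->index = (flags & 0x08) != 0;
    out->extra = (flags & 0x80) != 0;
    return true;
}

/* Check the presence rules. header is NULL when the Stream has
 * no Header Metadata Block. */
bool footer_fields_ok(const metadata_fields *footer,
        const metadata_fields *header)
{
    assert(footer != NULL);

    /* Backwards parsing needs Total Size or Index in the footer. */
    if (!footer->total_size && !footer->index)
        return false;

    /* With a Header Metadata Block, its size must be in the footer. */
    if (header != NULL && !footer->header_size)
        return false;

    /* Index must be present in at least one Metadata Block. */
    return footer->index || (header != NULL && header->index);
}
```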
1363 5.2. Size of Header Metadata Block
1365 This field is present only if the appropriate bit is set in
1366 the Metadata Flags field (see Section 5.1).
1368 Size of Header Metadata Block is needed to make it possible to
1369 parse the Stream backwards. The size is stored using the
1370 encoding described in Section 1.2. The decoder must verify
1371 that the value stored in this field is non-zero. In the Footer
1372 Metadata Block, the decoder must also verify that the stored
1373 size matches the real size of the Header Metadata Block. In the
1374 Header Metadata Block, the value of this field is ignored as
1375 long as it is not zero.
1380 This field is present only if the appropriate bit is set in the
1381 Metadata Flags field (see Section 5.1).
1383 This field contains the total size of the Data Blocks in the
1384 Stream. Total Size is stored using the encoding described in
1385 Section 1.2. If the stored value does not match the real total
1386 size of the Data Blocks, the decoder must indicate an error.
1387 The value of this field must be non-zero.
1389 Total Size can be used to quickly locate the beginning or end
1390 of the Stream. This can be useful for example when doing
1391 random-access reading, and the Index field is not in the
1392 Metadata Block currently being read.
1394 It is useless to have both Total Size and Index in the same
1395 Metadata Block, because Total Size can be calculated from the
1399 5.4. Uncompressed Size
1401 This field is present only if the appropriate bit is set in the
1402 Metadata Flags field (see Section 5.1).
1404 This field contains the total uncompressed size of the Data
1405 Blocks in the Stream. Uncompressed Size is stored using the
1406 encoding described in Section 1.2. If the stored value does not
1407 match the real uncompressed size of the Data Blocks, the
1408 decoder must indicate an error.
1410 It is useless to have both Uncompressed Size and Index in
1411 the same Metadata Block, because Uncompressed Size can be
1412 calculated from the Index field.
1417 +=======================+=============+====================+
1418 | Number of Data Blocks | Total Sizes | Uncompressed Sizes |
1419 +=======================+=============+====================+
1421 Index serves several purposes. Using it, one can
1422 - verify that all Blocks in a Stream have been processed;
1423 - find out the Uncompressed Size of a Stream; and
1424 - quickly access the beginning of any Block (random access).
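The random-access use can be sketched with the per-Block Total Size
and Uncompressed Size values that the Index lists (see Sections 5.5.2
and 5.5.3). The names index_hit and index_locate are hypothetical.
The sketch assumes that the Data Blocks are stored back to back, so
the returned offset is relative to the start of the first Data Block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of a lookup in the Index. */
typedef struct {
    size_t block;          /* zero-based number of the Data Block */
    uint64_t block_offset; /* offset from the first Data Block */
    uint64_t skip;         /* uncompressed bytes to discard in it */
} index_hit;

/* Find the Data Block that contains the uncompressed offset
 * `target`. Returns false if target is past the end of the data. */
bool index_locate(const uint64_t *total_sizes,
        const uint64_t *uncompressed_sizes, size_t count,
        uint64_t target, index_hit *hit)
{
    assert(total_sizes != NULL && uncompressed_sizes != NULL);
    assert(hit != NULL);

    uint64_t stored = 0; /* sum of Total Sizes so far */
    uint64_t plain = 0;  /* sum of Uncompressed Sizes so far */

    for (size_t i = 0; i < count; ++i) {
        /* plain <= target holds here, so this cannot underflow. */
        if (target - plain < uncompressed_sizes[i]) {
            hit->block = i;
            hit->block_offset = stored;
            hit->skip = target - plain;
            return true;
        }
        stored += total_sizes[i];
        plain += uncompressed_sizes[i];
    }
    return false;
}
```

After a successful lookup, a decoder would seek to the Block, decode
it, and discard the first `skip` bytes of its output.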
1427 5.5.1. Number of Data Blocks
1429 This field contains the number of Data Blocks in the Stream.
1430 The value is stored using the encoding described in Section
1431 1.2. If the decoder has decoded all the Data Blocks of the
1432 Stream, and then notices that the Number of Data Blocks doesn't
1433 match the real number of Data Blocks, the decoder must
1434 indicate an error. The value of this field must be non-zero.
1439 +============+============+
1440 | Total Size | Total Size | ...
1441 +============+============+
1443 This field lists the Total Sizes of every Data Block in the
1444 Stream. There are as many Total Size fields as indicated by
1445 the Number of Data Blocks field.
1447 Total Size is the size of Block Header, Compressed Data, and
1448 Block Footer. It is stored using the encoding described in
1449 Section 1.2. If the Total Sizes do not match the real sizes
1450 of respective Blocks, the decoder should indicate an error.
1451 All the Total Size fields must have a non-zero value.
1454 5.5.3. Uncompressed Sizes
1456 +===================+===================+
1457 | Uncompressed Size | Uncompressed Size | ...
1458 +===================+===================+
1460 This field lists the Uncompressed Sizes of every Data Block
1461 in the Stream. There are as many Uncompressed Size fields as
1462 indicated by the Number of Data Blocks field.
1464 Uncompressed Sizes are stored using the encoding described
1465 in Section 1.2. If the Uncompressed Sizes do not match the
1466 real sizes of respective Blocks, the decoder should indicate
1472 This field is present only if the appropriate bit is set in the
1473 Metadata Flags field (see Section 5.1). Note that the bit does
1474 not indicate that there is any data in the Extra field; it only
1475 indicates that Extra may be non-empty.
1477 The Extra field contains only information that is not required
1478 to properly uncompress the Stream or to do random-access
1479 reading. Supporting the Extra field is optional. In case the
1480 decoder doesn't support the Extra field, it should silently
1483 Extra consists of zero or more Records:
1486 | Record | Record | ...
1489 Excluding Records with Record ID 0x00, each Record contains
1492 +===========+==============+======+
1493 | Record ID | Size of Data | Data |
1494 +===========+==============+======+
1496 The Record ID and Size of Data are stored using the encoding
1497 described in Section 1.2. Data can be binary or UTF-8
1498 [RFC-3629] strings. Non-UTF-8 strings should be avoided.
1499 Because the Size of Data is known, there is no need to
1500 terminate strings with a nul byte, although doing so should
1501 not be considered an error.
1503 The Record IDs are divided in two categories:
1504 - Safe-to-Copy Records may be preserved as is when the
1505 Stream is modified in ways that don't change the actual
1506 uncompressed data. Examples of such operations include
1507 recompressing and adding, modifying, or deleting unrelated
1509 - Unsafe-to-Copy Records should be removed (and possibly
1510 recreated) when any kind of change is made to the Stream.
1512 When the actual uncompressed data is modified, all Records
1513 should be removed (and possibly recreated), unless the
1514 application knows that the Data stored to the Record(s) is
1517 The following subsections describe the standard Record IDs and
1518 the format of their Data fields. Safe-to-Copy Records have an
1519 odd ID, while Unsafe-to-Copy Records have an even ID.
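The odd/even rule makes the category testable from the Record ID
alone. The sketch below models an operation that modifies the Stream
without changing the uncompressed data, and keeps only the IDs that
may be preserved as is. All function names are hypothetical, and
working on a bare array of IDs is a simplification made for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Safe-to-Copy Records have an odd ID. */
bool record_is_safe_to_copy(uint64_t id)
{
    return (id & 1) != 0;
}

/* Record ID 0x00 (Dummy/Padding) has no Size of Data or Data. */
bool record_has_data(uint64_t id)
{
    return id != 0x00;
}

/* Compact ids[] in place so that only Safe-to-Copy IDs remain.
 * Returns the number of IDs kept. */
size_t records_keep_safe(uint64_t *ids, size_t count)
{
    assert(ids != NULL || count == 0);

    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (record_is_safe_to_copy(ids[i]))
            ids[kept++] = ids[i];
    }
    return kept;
}
```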
1522 5.6.1. 0x00: Dummy/Padding
1524 This Record is special, because it doesn't have the Size of
1525 Data or Data fields.
1527 Dummy Records can be used, for example, to fill a Metadata Block
1528 when a few bytes of extra space have been reserved for it. There
1529 can be any number of Dummy Records.
1532 5.6.2. 0x01: OpenPGP Signature
1534 An OpenPGP signature is computed from the uncompressed data. The
1535 signature can be used to verify that the contents of a Stream
1536 have been created by a trustworthy source.
1538 If the decoder supports decoding concatenated Streams, it
1539 must indicate an error when verifying OpenPGP signatures if
1540 there is more than one Stream.
1542 OpenPGP format is documented in [RFC-2440].
1545 5.6.3. 0x02: Filter Information
1547 The Filter Information Record contains information about the
1548 filters used in the Stream. This field can be used to quickly
1549 - display which filters are used in each Block;
1550 - check if all the required filters are supported by the
1551 current decoder version; and
1552 - check how much memory is required to decode each Block.
1554 The format of the Filter Information field is as follows:
1556 +=================+=================+
1557 | Block 0 Filters | Block 1 Filters | ...
1558 +=================+=================+
1560 There can be at most as many Block Filters fields as
1561 there are Data Blocks in the Stream. The format of the Block
1562 Filters field is as follows:
1564 +------------------+======================+============+
1565 | Block Properties | List of Filter Flags | Subfilters |
1566 +------------------+======================+============+
1568 Block Properties is a bitfield:
1570 Bit(s) Mask Description
1571 0-2 0x07 Number of filters (0-7)
1572 3 0x08 End of Payload Marker is used.
1573 4 0x10 The Subfilters field is present.
1574 5-7 0xE0 Reserved for future use; must be zero for now.
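Decoding the Block Properties bitfield can be sketched as follows.
The type block_properties and the function name are hypothetical.
Rejecting a byte with reserved bits set is an assumption of this
sketch, based on the table's requirement that those bits be zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the Block Properties byte. */
typedef struct {
    uint8_t filter_count; /* bits 0-2, value 0-7 */
    bool uses_eopm;       /* bit 3: End of Payload Marker is used */
    bool has_subfilters;  /* bit 4: Subfilters field is present */
} block_properties;

/* Returns false if any reserved bit (mask 0xE0) is set. */
bool block_properties_parse(uint8_t byte, block_properties *out)
{
    assert(out != NULL);
    if (byte & 0xE0)
        return false;

    out->filter_count = byte & 0x07;
    out->uses_eopm = (byte & 0x08) != 0;
    out->has_subfilters = (byte & 0x10) != 0;
    return true;
}
```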
1576 The contents of the List of Filter Flags field must match the
1577 List of Filter Flags field in the respective Block Header.
1579 The Subfilters field may be present only if the List of Filter
1580 Flags contains a Filter Flags field for a Subblock filter. The
1581 format of the Subfilters field is as follows:
1583 +======================+=========================+
1584 | Number of Subfilters | List of Subfilter Flags |
1585 +======================+=========================+
1587 The Number of Subfilters field is stored using the encoding
1588 described in Section 1.2. The List of
1589 Subfilter Flags field contains as many Filter Flags fields
1590 as indicated by the Number of Subfilters field. These Filter
1591 Flags fields list some or all the Subfilters used via the
1592 Subblock filter. The order of the listed Subfilters is not
1595 Decoders supporting this Record should indicate a warning or
1596 error if this Record contains Filter Flags that are not
1597 actually used by the respective Blocks.
1600 5.6.4. 0x03: Comment
1602 Free-form comment is stored in UTF-8 [RFC-3629] encoding.
1604 The beginning of a new line should be indicated using the
1605 ASCII Line Feed character (0x0A). When the Line Feed character
1606 is not the native way to indicate new line in the underlying
1607 operating system, the encoder and decoder should convert the
1608 newline characters to and from Line Feeds.
1611 5.6.5. 0x04: List of Checks
1614 | Check | Check | ...
1617 There are as many Check fields as there are Blocks in the
1618 Stream. The size of Check fields depends on Stream Flags
1619 (see Section 2.2.2).
1621 Decoders supporting this Record should indicate a warning or
1622 error if the Checks don't match the respective Blocks.
1625 5.6.6. 0x05: Original Filename
1627 Original filename is stored in UTF-8 [RFC-3629] encoding.
1629 The filename must not include any path, only the filename
1630 itself. Special care must be taken to prevent directory
1631 traversal vulnerabilities.
1633 When files are moved between different operating systems, it
1634 is possible that a filename valid in the source system is not
1635 valid in the target system. It is implementation defined how
1636 the decoder handles this kind of situation.
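A conservative check for the "no path" rule might look like the
sketch below. The function name filename_is_plain is hypothetical,
and the set of rejected names is an assumption of this example: it
refuses both `/' and `\' as path separators regardless of the host
system, and it refuses the names `.' and `..'. A real decoder may
need stricter or system-specific rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check that the Data of an Original Filename Record
 * is a bare filename. `size` is the Size of Data value. */
bool filename_is_plain(const char *name, size_t size)
{
    assert(name != NULL || size == 0);

    /* A single terminating nul byte is tolerated. */
    if (size > 0 && name[size - 1] == '\0')
        --size;

    if (size == 0)
        return false;

    /* Reject the special directory names. */
    if (size == 1 && name[0] == '.')
        return false;
    if (size == 2 && name[0] == '.' && name[1] == '.')
        return false;

    /* Reject path separators and embedded nul bytes. */
    for (size_t i = 0; i < size; ++i) {
        const char c = name[i];
        if (c == '/' || c == '\\' || c == '\0')
            return false;
    }
    return true;
}
```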
1639 5.6.7. 0x07: Modification Time
1641 Modification time is stored as POSIX time, as an unsigned
1642 little endian integer. The number of bits depends on the
1643 Size of Data field. Note that the usage of unsigned integer
1644 limits the earliest representable time to 1970-01-01T00:00:00.
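Because the width of the integer follows from the Size of Data
field, a decoder can read the value byte by byte. The sketch below
uses the hypothetical function name le_uint_decode and, as an
assumption of this example, accepts only sizes of 1 to 8 bytes so
that the result fits in a 64-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode an unsigned little endian integer
 * whose width in bytes is given by `size`. */
bool le_uint_decode(const uint8_t *data, size_t size, uint64_t *value)
{
    assert(value != NULL);
    assert(data != NULL || size == 0);

    /* Assumption: wider values are not supported by this sketch. */
    if (size == 0 || size > 8)
        return false;

    uint64_t v = 0;
    for (size_t i = 0; i < size; ++i)
        v |= (uint64_t)data[i] << (8 * i);

    *value = v;
    return true;
}
```

For example, the four bytes 0x80 0x51 0x01 0x00 decode to 86400,
which is 1970-01-02T00:00:00 in POSIX time.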
1647 5.6.8. 0x09: High-Resolution Modification Time
1649 This Record extends the `0x07: Modification Time' Record with
1650 subsecond time information. There are two supported formats
1651 of this field, which can be distinguished by looking at the
1655 3 [0; 9,999,999] times 100 nanoseconds
1656 4 [0; 999,999,999] nanoseconds
1658 The value is stored as an unsigned 24-bit or 32-bit little
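Both formats can be normalized to nanoseconds, as in the sketch
below. The function name subsecond_decode is hypothetical. Rejecting
values outside the listed ranges is an assumption of this example,
as is reading the value in little endian byte order like the
Modification Time Record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert the Data of a High-Resolution
 * Modification Time Record to nanoseconds. */
bool subsecond_decode(const uint8_t *data, size_t size,
        uint32_t *nanoseconds)
{
    assert(data != NULL && nanoseconds != NULL);

    if (size != 3 && size != 4)
        return false;

    /* Assumed byte order: little endian. */
    uint32_t raw = 0;
    for (size_t i = 0; i < size; ++i)
        raw |= (uint32_t)data[i] << (8 * i);

    if (size == 3) {
        /* Units of 100 nanoseconds, range [0; 9,999,999]. */
        if (raw > UINT32_C(9999999))
            return false;
        *nanoseconds = raw * 100;
    } else {
        /* Plain nanoseconds, range [0; 999,999,999]. */
        if (raw > UINT32_C(999999999))
            return false;
        *nanoseconds = raw;
    }
    return true;
}
```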
1662 5.6.9. 0x0B: MIME Type
1664 MIME type of the uncompressed Stream. This can be used to
1665 detect the content type. [IANA-MIME]
1668 5.6.10. 0x0D: Homepage URL
1670 This field can be used, for example, when distributing software
1671 packages (sources or binaries). The field would indicate the
1672 homepage of the program.
1674 For details on how to encode URLs, see [RFC-1738].
1677 6. Custom Filter and Extra Record IDs
1679 If a developer wants to use custom Filter or Extra Record IDs,
1680 he has two choices. The first choice is to contact Lasse Collin
1681 and ask him to allocate a range of IDs for the developer.
1683 The second choice is to generate a 40-bit random integer,
1684 which the developer can use as his personal Developer ID.
1685 To minimize the risk of collisions, Developer ID has to be
1686 a randomly generated integer, not a manually selected "hex word".
1687 The following command, which works on many free operating
1688 systems, can be used to generate Developer ID:
1690 dd if=/dev/urandom bs=5 count=1 | hexdump
1692 The developer can then use his Developer ID to create unique
1693 (well, hopefully unique) Filter and Extra Record IDs.
1695 Bits Mask Description
1696 0-15 0x0000_0000_0000_FFFF Filter or Extra Record ID
1697 16-55 0x00FF_FFFF_FFFF_0000 Developer ID
1698 56-62 0x7F00_0000_0000_0000 Static prefix: 0x7F
1700 The resulting 63-bit integer will use 9 bytes of space when
1701 stored using the encoding described in Section 1.2. To get
1702 a shorter ID, see the beginning of this Section for how to
1703 request a custom ID range.
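Combining the three parts of the bit layout above into one ID can be
sketched as follows. The function name custom_id_make is
hypothetical. The Developer ID is passed in as an already generated
40-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a custom Filter or Extra Record ID.
 *   bits  0-15: Filter or Extra Record ID chosen by the developer
 *   bits 16-55: 40-bit Developer ID
 *   bits 56-62: static prefix 0x7F */
uint64_t custom_id_make(uint64_t developer_id, uint16_t local_id)
{
    assert(developer_id < (UINT64_C(1) << 40));

    return (UINT64_C(0x7F) << 56)
            | (developer_id << 16)
            | (uint64_t)local_id;
}
```

With Developer ID 0x0123456789 and local ID 0xABCD, the result is
0x7F01_2345_6789_ABCD.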
1705 Note that Filter and Extra Record IDs are in their own
1706 namespaces. That is, you can use the same ID value as Filter ID
1707 and Extra Record ID, and the meanings of the IDs do not need
1708 to be related to each other.
1711 6.1. Reserved Custom Filter ID Ranges
1714 0x0000_0000 - 0x0000_00DF IDs fitting into the Misc field
1715 0x0002_0000 - 0x0007_FFFF Reserved to ease .7z compatibility
1716 0x0200_0000 - 0x07FF_FFFF Reserved to ease .7z compatibility
1719 7. Cyclic Redundancy Checks
1721 There are several incompatible variations to calculate CRC32
1722 and CRC64. For simplicity and clarity, complete examples are
1723 provided to calculate the checks as they are used in this file
1724 format. Implementations may use different code as long as it
1725 gives identical results.
1727 The program below reads data from standard input, calculates
1728 the CRC32 and CRC64 values, and prints the calculated values
1729 as big endian hexadecimal strings to standard output.
1731 #include <sys/types.h>
1732 #include <inttypes.h>
1735 uint32_t crc32_table[256];
1736 uint64_t crc64_table[256];
1741 static const uint32_t poly32 = UINT32_C(0xEDB88320);
1742 static const uint64_t poly64
1743 = UINT64_C(0xC96C5795D7870F42);
1745 for (size_t i = 0; i < 256; ++i) {
1749 for (size_t j = 0; j < 8; ++j) {
1751 crc32 = (crc32 >> 1) ^ poly32;
1756 crc64 = (crc64 >> 1) ^ poly64;
1761 crc32_table[i] = crc32;
1762 crc64_table[i] = crc64;
1767 crc32(const uint8_t *buf, size_t size, uint32_t crc)
1770 for (size_t i = 0; i < size; ++i)
1771 crc = crc32_table[buf[i] ^ (crc & 0xFF)]
1777 crc64(const uint8_t *buf, size_t size, uint64_t crc)
1780 for (size_t i = 0; i < size; ++i)
1781 crc = crc64_table[buf[i] ^ (crc & 0xFF)]
1791 uint32_t value32 = 0;
1792 uint64_t value64 = 0;
1793 uint64_t total_size = 0;
1797 const size_t buf_size = fread(buf, 1, 8192, stdin);
1801 total_size += buf_size;
1802 value32 = crc32(buf, buf_size, value32);
1803 value64 = crc64(buf, buf_size, value64);
1806 printf("Bytes: %" PRIu64 "\n", total_size);
1807 printf("CRC-32: 0x%08" PRIX32 "\n", value32);
1808 printf("CRC-64: 0x%016" PRIX64 "\n", value64);
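As a compact variant of the routines above that works on a memory
buffer instead of standard input, the following sketch may be useful
for testing an implementation. The function names crc_tables_init,
sketch_crc32, and sketch_crc64 are hypothetical. The polynomials are
the ones used above. The inversion of the value before and after the
loop is an assumption of this sketch, chosen to match the widely used
CRC-32 convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_tab[256];
static uint64_t crc64_tab[256];
static int crc_tables_ready = 0;

/* Fill both lookup tables; call once before the other functions. */
void crc_tables_init(void)
{
    const uint32_t poly32 = UINT32_C(0xEDB88320);
    const uint64_t poly64 = UINT64_C(0xC96C5795D7870F42);

    for (size_t i = 0; i < 256; ++i) {
        uint32_t c32 = (uint32_t)i;
        uint64_t c64 = (uint64_t)i;

        /* Process the eight bits of the table index. */
        for (size_t j = 0; j < 8; ++j) {
            c32 = (c32 & 1) ? (c32 >> 1) ^ poly32 : c32 >> 1;
            c64 = (c64 & 1) ? (c64 >> 1) ^ poly64 : c64 >> 1;
        }

        crc32_tab[i] = c32;
        crc64_tab[i] = c64;
    }
    crc_tables_ready = 1;
}

/* Pass 0 as crc for the first chunk, then the previous result. */
uint32_t sketch_crc32(const uint8_t *buf, size_t size, uint32_t crc)
{
    assert(crc_tables_ready);
    crc = ~crc;
    for (size_t i = 0; i < size; ++i)
        crc = crc32_tab[buf[i] ^ (crc & 0xFF)] ^ (crc >> 8);
    return ~crc;
}

uint64_t sketch_crc64(const uint8_t *buf, size_t size, uint64_t crc)
{
    assert(crc_tables_ready);
    crc = ~crc;
    for (size_t i = 0; i < size; ++i)
        crc = crc64_tab[buf[i] ^ (crc & 0xFF)] ^ (crc >> 8);
    return ~crc;
}
```

Because each call takes the previous value as input, data can be fed
in chunks of any size and the final result does not depend on how the
input was split.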
1816 8.1. Normative References
1819 Uniform Resource Locators (URL)
1820 http://www.ietf.org/rfc/rfc1738.txt
1823 Key words for use in RFCs to Indicate Requirement Levels
1824 http://www.ietf.org/rfc/rfc2119.txt
1827 OpenPGP Message Format
1828 http://www.ietf.org/rfc/rfc2440.txt
1831 UTF-8, a transformation format of ISO 10646
1832 http://www.ietf.org/rfc/rfc3629.txt
1836 http://www.iana.org/assignments/media-types/
1839 8.2. Informative References
1841 LZMA SDK - The original LZMA implementation
1842 http://7-zip.org/sdk.html
1844 LZMA Utils - LZMA adapted to POSIX-like systems
1845 http://tukaani.org/lzma/
1848 GZIP file format specification version 4.3
1849 http://www.ietf.org/rfc/rfc1952.txt
1850 - Notation of byte boxes in section `2.1. Overall conventions'
1853 GNU tar 1.16.1 manual
1854 http://www.gnu.org/software/tar/manual/html_node/Blocking-Factor.html
1855 - Node 9.4.2 `Blocking Factor', paragraph that begins
1856 `gzip will complain about trailing garbage'
1857 - Note that this URL points to the latest version of the
1858 manual, and may some day not contain the note which is in
1859 1.16.1. For the exact version of the manual, download GNU
1860 tar 1.16.1: ftp://ftp.gnu.org/pub/gnu/tar/tar-1.16.1.tar.gz