9 1.1. Byte and Its Representation
10 1.2. Multibyte Integers
11 2. Overall Structure of .xz File
14 2.1.1.1. Header Magic Bytes
19 2.1.2.2. Backward Size
21 2.1.2.4. Footer Magic Bytes
25 3.1.1. Block Header Size
27 3.1.3. Compressed Size
28 3.1.4. Uncompressed Size
29 3.1.5. List of Filter Flags
36 4.2. Number of Records
39 4.3.2. Uncompressed Size
47 5.3.2. Branch/Call/Jump Filters for Executables
49 5.3.3.1. Format of the Encoded Output
50 5.4. Custom Filter IDs
51 5.4.1. Reserved Custom Filter ID Ranges
52 6. Cyclic Redundancy Checks
58 This document describes the .xz file format (filename suffix
59 `.xz', MIME type `application/x-xz'). It is intended that this
60 this format replace the old .lzma format used by LZMA SDK and
63 IMPORTANT: The version described in this document is a
64 draft, NOT a final, official version. Changes
68 0.1. Copyright Notices
70 Copyright (C) 2006-2008 Lasse Collin <lasse.collin@tukaani.org>
71 Copyright (C) 2006 Ville Koskinen <w-ber@iki.fi>
73 Copying and distribution of this file, with or without
74 modification, are permitted in any medium without royalty
75 provided the copyright notice and this notice are preserved.
76 Modified versions must be marked as such.
78 All source code examples given in this document are put into
79 the public domain by the authors of this document.
81 Special thanks for helping with this document goes to
82 Igor Pavlov. Thanks for helping with this document goes to
83 Mark Adler, H. Peter Anvin, and Mikko Pouru.
88 Last modified: 2008-09-24 21:05+0300
90 (A changelog will be kept once the first official version
96 The keywords `must', `must not', `required', `should',
97 `should not', `recommended', `may', and `optional' in this
98 document are to be interpreted as described in [RFC-2119].
99 These words are not capitalized in this document.
101 Indicating a warning means displaying a message, returning
102 appropriate exit status, or something else to let the user
103 know that something worth warning occurred. The operation
104 should still finish if a warning is indicated.
106 Indicating an error means displaying a message, returning
107 appropriate exit status, or something else to let the user
108 know that something prevented successfully finishing the
109 operation. The operation must be aborted once an error has
113 1.1. Byte and Its Representation
115 In this document, byte is always 8 bits.
117 A `nul byte' has all bits unset. That is, the value of a nul
120 To represent byte blocks, this document uses notation that
121 is similar to the notation used in [RFC-1952]:
128 | Foo | Two bytes; that is, some of the vertical bars
129 +---+---+ can be missing.
132 | Foo | Zero or more bytes.
135 In this document, a boxed byte or a byte sequence declared
136 using this notation is called `a field'. The example field
137 above would be called `the Foo field' or plain `Foo'.
140 1.2. Multibyte Integers
142 Multibyte integers of static length, such as CRC values,
143 are stored in little endian byte order (least significant
146 When smaller values are more likely than bigger values (for
147 example file sizes), multibyte integers are encoded in a
148 variable-length representation:
149 - Numbers in the range [0, 127] are copied as is, and take
151 - Bigger numbers will occupy two or more bytes. All but the
152 last byte of the multibyte representation have the highest
155 For now, the value of the variable-length integers is limited
156 to 63 bits, which limits the encoded size of the integer to
157 nine bytes. These limits may be increased in future if needed.
159 The following C code illustrates encoding and decoding of
160 variable-length integers. The functions return the number of
161 bytes occupied by the integer (1-9), or zero on error.
163 #include <sys/types.h>
164 #include <inttypes.h>
167 encode(uint8_t buf[static 9], uint64_t num)
169 if (num >= UINT64_MAX / 2)
174 while (num >= 0x80) {
175 buf[i++] = (uint8_t)(num) | 0x80;
179 buf[i++] = (uint8_t)(num);
185 decode(const uint8_t buf[], size_t size_max, uint64_t *num)
193 *num = buf[0] & 0x7F;
196 while (buf[i++] & 0x80) {
197 if (i > size_max || buf[i] == 0x00)
200 *num |= (uint64_t)(buf[i] & 0x7F) << (i * 7);
207 2. Overall Structure of .xz File
209 +========+================+========+================+
210 | Stream | Stream Padding | Stream | Stream Padding | ...
211 +========+================+========+================+
213 A file contains usually only one Stream. However, it is
214 possible to concatenate multiple Streams together with no
215 additional processing. It is up to the implementation to
216 decide if the decoder will continue decoding from the next
217 Stream once the end of the first Stream has been reached.
222 +-+-+-+-+-+-+-+-+-+-+-+-+=======+=======+ +=======+
223 | Stream Header | Block | Block | ... | Block |
224 +-+-+-+-+-+-+-+-+-+-+-+-+=======+=======+ +=======+
226 +=======+-+-+-+-+-+-+-+-+-+-+-+-+
227 ---> | Index | Stream Footer |
228 +=======+-+-+-+-+-+-+-+-+-+-+-+-+
230 All the above fields have a size that is a multiple of four. If
231 Stream is used as an internal part of another file format, it
232 is recommended to make the Stream start at an offset that is
233 a multiple of four bytes.
235 Stream Header, Index, and Stream Footer are always present in
236 a Stream. The maximum size of the Index field is 16 GiB (2^34).
238 There are zero or more Blocks. The maximum number of Blocks is
239 limited only by the maximum size of the Index field.
241 Total size of a Stream must be less than 8 EiB (2^63 bytes).
242 The same limit applies to the total amount of uncompressed
243 data stored in a Stream.
245 If an implementation supports handling .xz files with multiple
246 concatenated Streams, it may apply the above limits to the file
247 as a whole instead of limiting per Stream basis.
252 +---+---+---+---+---+---+-------+------+--+--+--+--+
253 | Header Magic Bytes | Stream Flags | CRC32 |
254 +---+---+---+---+---+---+-------+------+--+--+--+--+
257 2.1.1.1. Header Magic Bytes
259 The first six (6) bytes of the Stream are so called Header
260 Magic Bytes. They can be used to identify the file type.
262 Using a C array and ASCII:
263 const uint8_t HEADER_MAGIC[6]
264 = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };
266 In plain hexadecimal:
270 - The first byte (0xFD) was chosen so that the files cannot
271 be erroneously detected as being in .lzma format, in which
272 the first byte is in the range [0x00, 0xE0].
273 - The sixth byte (0x00) was chosen to prevent applications
274 from misdetecting the file as a text file.
276 If the Header Magic Bytes don't match, the decoder must
280 2.1.1.2. Stream Flags
282 The first byte of Stream Flags is always a nul byte. In future
283 this byte may be used to indicate new Stream version or other
286 The second byte of Stream Flags is a bit field:
288 Bit(s) Mask Description
289 0-3 0x0F Type of Check (see Section 3.3):
293 0x02 4 bytes (Reserved)
294 0x03 4 bytes (Reserved)
296 0x05 8 bytes (Reserved)
297 0x06 8 bytes (Reserved)
298 0x07 16 bytes (Reserved)
299 0x08 16 bytes (Reserved)
300 0x09 16 bytes (Reserved)
301 0x0A 32 bytes SHA-256
302 0x0B 32 bytes (Reserved)
303 0x0C 32 bytes (Reserved)
304 0x0D 64 bytes (Reserved)
305 0x0E 64 bytes (Reserved)
306 0x0F 64 bytes (Reserved)
307 4-7 0xF0 Reserved for future use; must be zero for now.
309 Implementations must support at least the Check IDs 0x00 (None)
310 and 0x01 (CRC32). Supporting other Check IDs is optional. If
311 an unsupported Check is used, the decoder should indicate a
314 If any reserved bit is set, the decoder must indicate an error.
315 It is possible that there is a new field present which the
316 decoder is not aware of, and can thus parse the Stream Header
322 The CRC32 is calculated from the Stream Flags field. It is
323 stored as an unsigned 32-bit little endian integer. If the
324 calculated value does not match the stored one, the decoder
325 must indicate an error.
327 The idea is that Stream Flags would always be two bytes, even
328 if new features are needed. This way old decoders will be able
329 to verify the CRC32 calculated from Stream Flags, and thus
330 distinguish between corrupt files (CRC32 doesn't match) and
331 files that the decoder doesn't support (CRC32 matches but
332 Stream Flags has reserved bits set).
337 +-+-+-+-+---+---+---+---+-------+------+----------+---------+
338 | CRC32 | Backward Size | Stream Flags | Footer Magic Bytes |
339 +-+-+-+-+---+---+---+---+-------+------+----------+---------+
344 The CRC32 is calculated from the Backward Size and Stream Flags
345 fields. It is stored as an unsigned 32-bit little endian
346 integer. If the calculated value does not match the stored one,
347 the decoder must indicate an error.
349 The reason to have the CRC32 field before the Backward Size and
350 Stream Flags fields is to keep the four-byte fields aligned to
351 a multiple of four bytes.
354 2.1.2.2. Backward Size
356 Backward Size is stored as a 32-bit little endian integer,
357 which indicates the size of the Index field as multiple of
358 four bytes, minimum value being four bytes:
360 real_backward_size = (stored_backward_size + 1) * 4;
362 Using a fixed-size integer to store this value makes it
363 slightly simpler to parse the Stream Footer when the
364 application needs to parse the Stream backwards.
367 2.1.2.3. Stream Flags
369 This is a copy of the Stream Flags field from the Stream
370 Header. The information stored to Stream Flags is needed
371 when parsing the Stream backwards. The decoder must compare
372 the Stream Flags fields in both Stream Header and Stream
373 Footer, and indicate an error if they are not identical.
376 2.1.2.4. Footer Magic Bytes
378 As the last step of the decoding process, the decoder must
379 verify the existence of Footer Magic Bytes. If they don't
380 match, an error must be indicated.
382 Using a C array and ASCII:
383 const uint8_t FOOTER_MAGIC[2] = { 'Y', 'Z' };
388 The primary reason to have Footer Magic Bytes is to make
389 it easier to detect incomplete files quickly, without
390 uncompressing. If the file does not end with Footer Magic Bytes
391 (excluding Stream Padding described in Section 2.2), it cannot
392 be undamaged, unless someone has intentionally appended garbage
393 after the end of the Stream.
398 Only the decoders that support decoding of concatenated Streams
399 must support Stream Padding.
401 Stream Padding must contain only nul bytes. Any non-nul byte
402 should be considered as the beginning of a new Stream. To
403 preserve the four-byte alignment of consecutive Streams, the
404 size of Stream Padding must be a multiple of four bytes. Empty
405 Stream Padding is allowed.
407 Note that non-empty Stream Padding is allowed at the end of the
408 file; there doesn't need to be a new Stream after non-empty
409 Stream Padding. This can be convenient in certain situations
412 The possibility of Padding should be taken into account when
413 designing an application that parses the Stream backwards.
418 +==============+=================+=======+
419 | Block Header | Compressed Data | Check |
420 +==============+=================+=======+
425 +-------------------+-------------+=================+
426 | Block Header Size | Block Flags | Compressed Size |
427 +-------------------+-------------+=================+
429 +===================+======================+
430 ---> | Uncompressed Size | List of Filter Flags |
431 +===================+======================+
433 +================+--+--+--+--+
434 ---> | Header Padding | CRC32 |
435 +================+--+--+--+--+
438 3.1.1. Block Header Size
440 This field overlaps with the Index Indicator field (see
443 This field contains the size of the Block Header field,
444 including the Block Header Size field itself. Valid values are
445 in the range [0x01, 0xFF], which indicate the size of the Block
446 Header as multiples of four bytes, minimum size being eight
449 real_header_size = (encoded_header_size + 1) * 4;
451 If bigger Block Header is needed in future, a new field can be
452 added between the current Block Header and Compressed Data
453 fields. The presence of this new field would be indicated in
459 The first byte of the Block Flags field is a bit field:
461 Bit(s) Mask Description
462 0-1 0x03 Number of filters (1-4)
463 2-5 0x3C Reserved for future use; must be zero for now.
464 6 0x40 The Compressed Size field is present.
465 7 0x80 The Uncompressed Size field is present.
467 If any reserved bit is set, the decoder must indicate an error.
468 It is possible that there is a new field present which the
469 decoder is not aware of, and can thus parse the Block Header
473 3.1.3. Compressed Size
475 This field is present only if the appropriate bit is set in
476 the Block Flags field (see Section 3.1.2).
478 This field contains the size of the Compressed Data field as
479 multiple of four bytes, minimum value being four bytes:
481 real_compressed_size = (stored_compressed_size + 1) * 4;
483 The size is stored using the encoding described in Section 1.2.
484 If the Compressed Size does not match the real size of the
485 Compressed Data field, the decoder must indicate an error.
488 3.1.4. Uncompressed Size
490 This field is present only if the appropriate bit is set in
491 the Block Flags field (see Section 3.1.2).
493 The Uncompressed Size field contains the size of the Block
494 after uncompressing. Uncompressed Size is stored using the
495 encoding described in Section 1.2. If the Uncompressed Size
496 does not match the real uncompressed size, the decoder must
499 Storing the Compressed Size and Uncompressed Size fields serves
501 - The decoder knows how much memory it needs to allocate
502 for a temporary buffer in multithreaded mode.
503 - Simple error detection: wrong size indicates a broken file.
504 - Seeking forwards to a specific location in streamed mode.
506 It should be noted that the only reliable way to determine
507 the real uncompressed size is to uncompress the Block,
508 because the Block Header and Index fields may contain
509 (intentionally or unintentionally) invalid information.
512 3.1.5. List of Filter Flags
514 +================+================+ +================+
515 | Filter 0 Flags | Filter 1 Flags | ... | Filter n Flags |
516 +================+================+ +================+
518 The number of Filter Flags fields is stored in the Block Flags
519 field (see Section 3.1.2).
521 The format of each Filter Flags field is as follows:
523 +===========+====================+===================+
524 | Filter ID | Size of Properties | Filter Properties |
525 +===========+====================+===================+
527 Both Filter ID and Size of Properties are stored using the
528 encoding described in Section 1.2. Size of Properties indicates
529 the size of the Filter Properties field as bytes. The list of
530 officially defined Filter IDs and the formats of their Filter
531 Properties are described in Section 5.3.
533 Filter IDs greater than or equal to 0x4000_0000_0000_0000
534 (2^62) are reserved for implementation-specific internal use.
535 These Filter IDs must never be used in List of Filter Flags.
538 3.1.6. Header Padding
540 This field contains as many nul byte as it is needed to make
541 the Block Header have the size specified in Block Header Size.
542 If any of the bytes are not nul bytes, the decoder must
543 indicate an error. It is possible that there is a new field
544 present which the decoder is not aware of, and can thus parse
545 the Block Header incorrectly.
550 The CRC32 is calculated over everything in the Block Header
551 field except the CRC32 field itself. It is stored as an
552 unsigned 32-bit little endian integer. If the calculated
553 value does not match the stored one, the decoder must indicate
556 By verifying the CRC32 of the Block Header before parsing the
557 actual contents allows the decoder to distinguish between
558 corrupt and unsupported files.
563 The format of Compressed Data depends on Block Flags and List
564 of Filter Flags. Excluding the descriptions of the simplest
565 filters in Section 5.3, the format of the filter-specific
566 encoded data is out of scope of this document.
568 If the natural size of Compressed Data is not a multiple of
569 four bytes, it must be padded with 1-3 nul bytes to make it
570 a multiple of four bytes.
575 The type and size of the Check field depends on which bits
576 are set in the Stream Flags field (see Section 2.1.1.2).
578 The Check, when used, is calculated from the original
579 uncompressed data. If the calculated Check does not match the
580 stored one, the decoder must indicate an error. If the selected
581 type of Check is not supported by the decoder, it must indicate
587 +-----------------+=========================+
588 | Index Indicator | Number of Index Records |
589 +-----------------+=========================+
591 +=================+=========+-+-+-+-+
592 ---> | List of Records | Padding | CRC32 |
593 +=================+=========+-+-+-+-+
595 Index serves several purporses. Using it, one can
596 - verify that all Blocks in a Stream have been processed;
597 - find out the uncompressed size of a Stream; and
598 - quickly access the beginning of any Block (random access).
603 This field overlaps with the Block Header Size field (see
604 Section 3.1.1). The value of Index Indicator is always 0x00.
607 4.2. Number of Records
609 This field indicates how many Records there are in the List
610 of Records field, and thus how many Blocks there are in the
611 Stream. The value is stored using the encoding described in
612 Section 1.2. If the decoder has decoded all the Blocks of the
613 Stream, and then notices that the Number of Records doesn't
614 match the real number of Blocks, the decoder must indicate an
620 List of Records consists of as many Records as indicated by the
621 Number of Records field:
624 | Record | Record | ...
627 Each Record contains two fields:
629 +============+===================+
630 | Total Size | Uncompressed Size |
631 +============+===================+
633 If the decoder has decoded all the Blocks of the Stream, it
634 must verify that the contents of the Records match the real
635 Total Size and Uncompressed Size of the respective Blocks.
637 Implementation hint: It is possible to verify the Index with
638 constant memory usage by calculating for example SHA256 of both
639 the real size values and the List of Records, then comparing
640 the check values. Implementing this using non-cryptographic
641 check like CRC32 should be avoided unless small code size is
644 If the decoder supports random-access reading, it must verify
645 that Total Size and Uncompressed Size of every completely
646 decoded Block match the sizes stored in the Index. If only
647 partial Block is decoded, the decoder must verify that the
648 processed sizes don't exceed the sizes stored in the Index.
653 This field indicates the encoded size of the respective Block
654 as multiples of four bytes, minimum value being four bytes:
656 real_total_size = (stored_total_size + 1) * 4;
658 The value is stored using the encoding described in Section
662 4.3.2. Uncompressed Size
664 This field indicates the Uncompressed Size of the respective
665 Block as bytes. The value is stored using the encoding
666 described in Section 1.2.
671 This field must contain 0-3 nul bytes to pad the Index to
672 a multiple of four bytes.
677 The CRC32 is calculated over everything in the Index field
678 except the CRC32 field itself. The CRC32 is stored as an
679 unsigned 32-bit little endian integer. If the calculated
680 value does not match the stored one, the decoder must indicate
686 The Block Flags field defines how many filters are used. When
687 more than one filter is used, the filters are chained; that is,
688 the output of one filter is the input of another filter. The
689 following figure illustrates the direction of data flow.
691 v Uncompressed Data ^
693 Encoder | Filter 1 | Decoder
700 Alignment of uncompressed input data is usually the job of
701 the application producing the data. For example, to get the
702 best results, an archiver tool should make sure that all
703 PowerPC executable files in the archive stream start at
704 offsets that are multiples of four bytes.
706 Some filters, for example LZMA2, can be configured to take
707 advantage of specified alignment of input data. Note that
708 taking advantage of aligned input can be benefical also when
709 a filter is not the first filter in the chain. For example,
710 if you compress PowerPC executables, you may want to use the
711 PowerPC filter and chain that with the LZMA2 filter. Because
712 not only the input but also the output alignment of the PowerPC
713 filter is four bytes, it is now benefical to set LZMA2 settings
714 so that the LZMA2 encoder can take advantage of its
715 four-byte-aligned input data.
717 The output of the last filter in the chain is stored to the
718 Compressed Data field, which is is guaranteed to be aligned
719 to a multiple of four bytes relative to the beginning of the
720 Stream. This can increase
721 - speed, if the filtered data is handled multiple bytes at
722 a time by the filter-specific encoder and decoder,
723 because accessing aligned data in computer memory is
725 - compression ratio, if the output data is later compressed
726 with an external compression tool.
731 If filters would be allowed to be chained freely, it would be
732 possible to create malicious files, that would be very slow to
733 decode. Such files could be used to create denial of service
736 Slow files could occur when multiple filters are chained:
738 v Compressed input data
739 | Filter 1 decoder (last filter)
740 | Filter 0 decoder (non-last filter)
741 v Uncompressed output data
743 The decoder of the last filter in the chain produces a lot of
744 output from little input. Another filter in the chain takes the
745 output of the last filter, and produces very little output
746 while consuming a lot of input. As a result, a lot of data is
747 moved inside the filter chain, but the filter chain as a whole
748 gets very little work done.
750 To prevent this kind of slow files, there are restrictions on
751 how the filters can be chained. These restrictions must be
752 taken into account when designing new filters.
754 The maximum number of filters in the chain has been limited to
755 four, thus there can be at maximum of three non-last filters.
756 Of these three non-last filters, only two are allowed to change
757 the size of the data.
759 The non-last filters, that change the size of the data, must
760 have a limit how much the decoder can compress the data: the
761 decoder should produce at least n bytes of output when the
762 filter is given 2n bytes of input. This limit is not
763 absolute, but significant deviations must be avoided.
765 The above limitations guarantee that if the last filter in the
766 chain produces 4n bytes of output, the chain as a whole will
767 produce at least n bytes of output.
774 LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse
775 compression algorithm with high compression ratio and fast
776 decompression. LZMA is based on LZ77 and range coding
779 LZMA2 is an extensions on top of the original LZMA. LZMA2 uses
780 LZMA internally, but adds support for flushing the encoder,
781 uncompressed chunks, eases stateful decoder implementations,
782 and improves support for multithreading. Thus, the plain LZMA
783 will not be supported in this file format.
786 Size of Filter Properties: 1 byte
787 Changes size of data: Yes
788 Allow as a non-last filter: No
789 Allow as the last filter: Yes
792 Input data: Adjustable to 1/2/4/8/16 byte(s)
795 The format of the one-byte Filter Properties field is as
798 Bits Mask Description
799 0-5 0x3F Dictionary Size
800 6-7 0xC0 Reserved for future use; must be zero for now.
802 Dictionary Size is encoded with one-bit mantissa and five-bit
803 exponent. The smallest dictionary size is 4 KiB and the biggest
806 Raw value Mantissa Exponent Dictionary size
820 40 2 31 4096 MiB - 1 B
822 Instead of having a table in the decoder, the dictionary size
823 can be decoded using the following C code:
825 const uint8_t bits = get_dictionary_flags() & 0x3F;
827 return DICTIONARY_TOO_BIG; // Bigger than 4 GiB
829 uint32_t dictionary_size;
831 dictionary_size = UINT32_MAX;
833 dictionary_size = 2 | (bits & 1);
834 dictionary_size <<= bits / 2 + 11;
838 5.3.2. Branch/Call/Jump Filters for Executables
840 These filters convert relative branch, call, and jump
841 instructions to their absolute counterparts in executable
842 files. This conversion increases redundancy and thus
845 Size of Filter Properties: 0 or 4 bytes
846 Changes size of data: No
847 Allow as a non-last filter: Yes
848 Allow as the last filter: No
850 Detecting when all of the data has been decoded:
851 Uncompressed size: Yes
852 End of Payload Marker: No
855 Below is the list of filters in this category. The alignment
856 is the same for both input and output data.
858 Filter ID Alignment Description
859 0x04 1 byte x86 filter (BCJ)
860 0x05 4 bytes PowerPC (big endian) filter
861 0x06 16 bytes IA64 filter
862 0x07 4 bytes ARM (little endian) filter
863 0x08 2 bytes ARM Thumb (little endian) filter
864 0x09 4 bytes SPARC filter
866 If the size of Filter Properties is four bytes, the Filter
867 Properties field contains the start offset used for address
868 conversions. It is stored as an unsigned 32-bit little endian
869 integer. If the size of Filter Properties is zero, the start
872 Setting the start offset may be useful if an executable has
873 multiple sections, and there are many cross-section calls.
874 Taking advantage of this feature usually requires usage of
880 The Delta filter may increase compression ratio when the value
881 of the next byte correlates with the value of an earlier byte
882 at specified distance.
885 Size of Filter Properties: 1 byte
886 Changes size of data: No
887 Allow as a non-last filter: Yes
888 Allow as the last filter: No
892 Output data: Same as the original input data
894 The Properties byte indicates the delta distance, which can be
895 1-256 bytes backwards from the current byte: 0x00 indicates
896 distance of 1 byte and 0xFF distance of 256 bytes.
899 5.3.3.1. Format of the Encoded Output
901 The code below illustrates both encoding and decoding with
904 // Distance is in the range [1, 256].
905 const unsigned int distance = get_properties_byte() + 1;
909 memset(delta, 0, sizeof(delta));
912 const int byte = read_byte();
916 uint8_t tmp = delta[(uint8_t)(distance + pos)];
918 tmp = (uint8_t)(byte) - tmp;
919 delta[pos] = (uint8_t)(byte);
921 tmp = (uint8_t)(byte) + tmp;
930 5.4. Custom Filter IDs
932 If a developer wants to use custom Filter IDs, he has two
933 choices. The first choice is to contact Lasse Collin and ask
934 him to allocate a range of IDs for the developer.
936 The second choice is to generate a 40-bit random integer,
937 which the developer can use as his personal Developer ID.
938 To minimalize the risk of collisions, Developer ID has to be
939 a randomly generated integer, not manually selected "hex word".
940 The following command, which works on many free operating
941 systems, can be used to generate Developer ID:
943 dd if=/dev/urandom bs=5 count=1 | hexdump
945 The developer can then use his Developer ID to create unique
946 (well, hopefully unique) Filter IDs.
948 Bits Mask Description
949 0-15 0x0000_0000_0000_FFFF Filter ID
950 16-55 0x00FF_FFFF_FFFF_0000 Developer ID
951 56-62 0x3F00_0000_0000_0000 Static prefix: 0x3F
953 The resulting 63-bit integer will use 9 bytes of space when
954 stored using the encoding described in Section 1.2. To get
955 a shorter ID, see the beginning of this Section how to
956 request a custom ID range.
959 5.4.1. Reserved Custom Filter ID Ranges
962 0x0002_0000 - 0x0007_FFFF Reserved to ease .7z compatibility
963 0x0200_0000 - 0x07FF_FFFF Reserved to ease .7z compatibility
966 6. Cyclic Redundancy Checks
968 There are several incompatible variations to calculate CRC32
969 and CRC64. For simplicity and clarity, complete examples are
970 provided to calculate the checks as they are used in this file
971 format. Implementations may use different code as long as it
972 gives identical results.
974 The program below reads data from standard input, calculates
975 the CRC32 and CRC64 values, and prints the calculated values
976 as big endian hexadecimal strings to standard output.
978 #include <sys/types.h>
979 #include <inttypes.h>
982 uint32_t crc32_table[256];
983 uint64_t crc64_table[256];
988 static const uint32_t poly32 = UINT32_C(0xEDB88320);
989 static const uint64_t poly64
990 = UINT64_C(0xC96C5795D7870F42);
992 for (size_t i = 0; i < 256; ++i) {
996 for (size_t j = 0; j < 8; ++j) {
998 crc32 = (crc32 >> 1) ^ poly32;
1003 crc64 = (crc64 >> 1) ^ poly64;
1008 crc32_table[i] = crc32;
1009 crc64_table[i] = crc64;
1014 crc32(const uint8_t *buf, size_t size, uint32_t crc)
1017 for (size_t i = 0; i < size; ++i)
1018 crc = crc32_table[buf[i] ^ (crc & 0xFF)]
1024 crc64(const uint8_t *buf, size_t size, uint64_t crc)
1027 for (size_t i = 0; i < size; ++i)
1028 crc = crc64_table[buf[i] ^ (crc & 0xFF)]
1038 uint32_t value32 = 0;
1039 uint64_t value64 = 0;
1040 uint64_t total_size = 0;
1044 const size_t buf_size = fread(buf, 1, 8192, stdin);
1048 total_size += buf_size;
1049 value32 = crc32(buf, buf_size, value32);
1050 value64 = crc64(buf, buf_size, value64);
1053 printf("Bytes: %" PRIu64 "\n", total_size);
1054 printf("CRC-32: 0x%08" PRIX32 "\n", value32);
1055 printf("CRC-64: 0x%016" PRIX64 "\n", value64);
1063 LZMA SDK - The original LZMA implementation
1064 http://7-zip.org/sdk.html
1066 LZMA Utils - LZMA adapted to POSIX-like systems
1067 http://tukaani.org/lzma/
1070 GZIP file format specification version 4.3
1071 http://www.ietf.org/rfc/rfc1952.txt
1072 - Notation of byte boxes in section `2.1. Overall conventions'
1075 Key words for use in RFCs to Indicate Requirement Levels
1076 http://www.ietf.org/rfc/rfc2119.txt
1079 GNU tar 1.16.1 manual
1080 http://www.gnu.org/software/tar/manual/html_node/Blocking-Factor.html
1081 - Node 9.4.2 `Blocking Factor', paragraph that begins
1082 `gzip will complain about trailing garbage'
1083 - Note that this URL points to the latest version of the
1084 manual, and may some day not contain the note which is in
1085 1.16.1. For the exact version of the manual, download GNU
1086 tar 1.16.1: ftp://ftp.gnu.org/pub/gnu/tar/tar-1.16.1.tar.gz