From cec0ddc8ec4ce81685a51998b978e22167e461f9 Mon Sep 17 00:00:00 2001
From: Lasse Collin
Date: Mon, 27 Sep 2010 23:29:34 +0300
Subject: [PATCH] Major man page updates.

Lots of content was updated on the xz man page.

Technical improvements:
  - Start a new sentence on a new line.
  - Use fairly short lines.
  - Use constant-width font for examples (where supported).
  - Some minor cleanups.

Thanks to Jonathan Nieder for some language fixes.
---
 src/lzmainfo/lzmainfo.1 |   25 +-
 src/scripts/xzdiff.1    |   15 +-
 src/scripts/xzgrep.1    |   11 +-
 src/scripts/xzless.1    |   13 +-
 src/scripts/xzmore.1    |    9 +-
 src/xz/xz.1             | 1964 +++++++++++++++++++++++++++------------
 src/xzdec/xzdec.1       |   39 +-
 7 files changed, 1435 insertions(+), 641 deletions(-)

diff --git a/src/lzmainfo/lzmainfo.1 b/src/lzmainfo/lzmainfo.1
index 235a6b5..f2b93b4 100644
--- a/src/lzmainfo/lzmainfo.1
+++ b/src/lzmainfo/lzmainfo.1
@@ -4,7 +4,7 @@
 .\" This file has been put into the public domain.
 .\" You can do whatever you want with this file.
 .\"
-.TH LZMAINFO 1 "2010-07-28" "Tukaani" "XZ Utils"
+.TH LZMAINFO 1 "2010-09-27" "Tukaani" "XZ Utils"
 .SH NAME
 lzmainfo \- show information stored in the .lzma file header
 .SH SYNOPSIS
@@ -16,10 +16,12 @@ lzmainfo \- show information stored in the .lzma file header
 .B lzmainfo
 shows information stored in the
 .B .lzma
-file header. It reads the first 13 bytes from the specified
+file header.
+It reads the first 13 bytes from the specified
 .IR file ,
 decodes the header, and prints it to standard output in human
-readable format. If no
+readable format.
+If no
 .I files
 are given or
 .I file
@@ -27,16 +29,19 @@ is
 .BR \- ,
 standard input is read.
 .PP
-Usually the most interesting information is the uncompressed size and
-the dictionary size. Uncompressed size can be shown only if the file is
-in the non-streamed
+Usually the most interesting information is
+the uncompressed size and the dictionary size.
+Uncompressed size can be shown only if
+the file is in the non-streamed
 .B .lzma
-format variant. The amount of memory required to decompress the file is
+format variant.
+The amount of memory required to decompress the file is
 a few dozen kilobytes plus the dictionary size.
 .PP
 .B lzmainfo
-is included in XZ Utils primarily for backward compatibility with LZMA Utils.
-.SH EXIT STATUS
+is included in XZ Utils primarily for
+backward compatibility with LZMA Utils.
+.SH "EXIT STATUS"
 .TP
 .B 0
 All is good.
@@ -51,5 +56,5 @@ while the correct suffix would be
 .B MiB
 (2^20 bytes).
 This is to keep the output compatible with LZMA Utils.
-.SH SEE ALSO
+.SH "SEE ALSO"
 .BR xz (1)
diff --git a/src/scripts/xzdiff.1 b/src/scripts/xzdiff.1
index 318d06f..d97f3cb 100644
--- a/src/scripts/xzdiff.1
+++ b/src/scripts/xzdiff.1
@@ -6,7 +6,7 @@
 .\"
 .\" License: GNU GPLv2+
 .\"
-.TH XZDIFF 1 "2009-07-05" "Tukaani" "XZ Utils"
+.TH XZDIFF 1 "2010-09-27" "Tukaani" "XZ Utils"
 .SH NAME
 xzcmp, xzdiff, lzcmp, lzdiff \- compare compressed files
 .SH SYNOPSIS
@@ -22,7 +22,7 @@ xzcmp, xzdiff, lzcmp, lzdiff \- compare compressed files
 .B lzdiff
 .RI [ diff_options "] " file1 " [" file2 ]
 .SH DESCRIPTION
-.B xzcmp
+.B xzcmp
 and
 .B xzdiff
 invoke
@@ -36,22 +36,23 @@ on files compressed with
 or
 .BR bzip2 (1).
 All options specified are passed directly to
-.B cmp
+.BR cmp (1)
 or
-.BR diff .
+.BR diff (1).
 If only one file is specified, then the files compared are
 .I file1
 (which must have a suffix of a supported compression format) and
 .I file1
 from which the compression format suffix has been stripped.
-If two files are specified, then they are uncompressed if necessary and fed to
+If two files are specified,
+then they are uncompressed if necessary and fed to
 .BR cmp (1)
 or
 .BR diff (1).
 The exit status from
-.B cmp
+.BR cmp (1)
 or
-.B diff
+.BR diff (1)
 is preserved.
 .PP
 The names
diff --git a/src/scripts/xzgrep.1 b/src/scripts/xzgrep.1
index 996d64a..a96f1b8 100644
--- a/src/scripts/xzgrep.1
+++ b/src/scripts/xzgrep.1
@@ -6,15 +6,15 @@
 .\"
 .\" License: GNU GPLv2+
 .\"
-.TH XZGREP 1 "2009-07-05" "Tukaani" "XZ Utils"
+.TH XZGREP 1 "2010-09-27" "Tukaani" "XZ Utils"
 .SH NAME
 xzgrep \- search compressed files for a regular expression
 .SH SYNOPSIS
 .B xzgrep
 .RI [ grep_options ]
-.RB [ \-e ]
+.RB [ \-e ]
 .I pattern
-.IR file ".\|.\|."
+.IR file "..."
 .br
 .B xzegrep
 .RB ...
@@ -31,7 +31,7 @@ xzgrep \- search compressed files for a regular expression
 .B lzfgrep
 .RB ...
 .SH DESCRIPTION
-.B xzgrep
+.B xzgrep
 invokes
 .BR grep (1)
 on
@@ -47,7 +47,8 @@ All options specified are passed directly to
 .PP
 If no
 .I file
-is specified, then the standard input is decompressed if necessary and fed to
+is specified, then standard input is decompressed if necessary
+and fed to
 .BR grep (1).
 When reading from standard input,
 .BR gzip (1)
diff --git a/src/scripts/xzless.1 b/src/scripts/xzless.1
index 299806f..2d05459 100644
--- a/src/scripts/xzless.1
+++ b/src/scripts/xzless.1
@@ -7,7 +7,7 @@
 .\"
 .\" (Note that this file is not based on gzip's zless.1.)
 .\"
-.TH XZLESS 1 "2009-07-05" "Tukaani" "XZ Utils"
+.TH XZLESS 1 "2010-09-27" "Tukaani" "XZ Utils"
 .SH NAME
 xzless, lzless \- view xz or lzma compressed (text) files
 .SH SYNOPSIS
@@ -17,7 +17,7 @@ xzless, lzless \- view xz or lzma compressed (text) files
 .B lzless
 .RI [ file ...]
 .SH DESCRIPTION
-.B xzless
+.B xzless
 is a filter that displays text from compressed files to a terminal.
 It works on files compressed with
 .BR xz (1)
@@ -32,9 +32,11 @@ reads from standard input.
 .B xzless
 uses
 .BR less (1)
-to present its output. Unlike
+to present its output.
+Unlike
 .BR xzmore ,
-its choice of pager cannot be altered by setting an environment variable.
+its choice of pager cannot be altered by
+setting an environment variable.
 Commands are based on both
 .BR more (1)
 and
@@ -50,7 +52,8 @@ is provided for backward compatibility with LZMA Utils.
 .SH ENVIRONMENT
 .TP
 .B LESSMETACHARS
-A list of characters special to the shell. Set by
+A list of characters special to the shell.
+Set by
 .B xzless
 unless it is already set in the environment.
 .TP
diff --git a/src/scripts/xzmore.1 b/src/scripts/xzmore.1
index 42542bf..30dad68 100644
--- a/src/scripts/xzmore.1
+++ b/src/scripts/xzmore.1
@@ -4,22 +4,23 @@
 .\"
 .\" License: GNU GPLv2+
 .\"
-.TH XZMORE 1 "2009-07-05" "Tukaani" "XZ Utils"
+.TH XZMORE 1 "2010-09-27" "Tukaani" "XZ Utils"
 .SH NAME
 xzmore, lzmore \- view xz or lzma compressed (text) files
 .SH SYNOPSIS
 .B xzmore
-.RI [ "filename ..." ]
+.RI [ "filename ..." ]
 .br
 .B lzmore
-.RI [ "filename ..." ]
+.RI [ "filename ..." ]
 .SH DESCRIPTION
 .B xzmore
 is a filter which allows examination of
 .BR xz (1)
 or
 .BR lzma (1)
-compressed text files one screenful at a time on a soft-copy terminal.
+compressed text files one screenful at a time
+on a soft-copy terminal.
 .PP
 To use a pager other than the default
 .B more,
diff --git a/src/xz/xz.1 b/src/xz/xz.1
index a2eabd7..44dc1a4 100644
--- a/src/xz/xz.1
+++ b/src/xz/xz.1
@@ -5,9 +5,11 @@
 .\" This file has been put into the public domain.
 .\" You can do whatever you want with this file.
 .\"
-.TH XZ 1 "2010-08-07" "Tukaani" "XZ Utils"
+.TH XZ 1 "2010-09-27" "Tukaani" "XZ Utils"
+.
 .SH NAME
 xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
+.
 .SH SYNOPSIS
 .B xz
 .RI [ option ]...
@@ -33,8 +35,8 @@ is equivalent to
 is equivalent to
 .BR "xz \-\-format=lzma \-\-decompress \-\-stdout" .
 .PP
-When writing scripts that need to decompress files, it is recommended to
-always use the name
+When writing scripts that need to decompress files,
+it is recommended to always use the name
 .B xz
 with appropriate arguments
 .RB ( "xz \-d"
@@ -43,19 +45,22 @@ or
 instead of the names
 .B unxz
 and
-.BR xzcat.
+.BR xzcat .
+.
 .SH DESCRIPTION
 .B xz
-is a general-purpose data compression tool with command line syntax similar to
+is a general-purpose data compression tool with
+command line syntax similar to
 .BR gzip (1)
 and
 .BR bzip2 (1).
 The native file format is the
 .B .xz
-format, but also the legacy
+format, but the legacy
 .B .lzma
-format and raw compressed streams with no container format headers
-are supported.
+format used by LZMA Utils and
+raw compressed streams with no container format headers
+are also supported.
 .PP
 .B xz
 compresses or decompresses each
@@ -68,13 +73,16 @@ are given or
 is
 .BR \- ,
 .B xz
-reads from standard input and writes the processed data to standard output.
+reads from standard input and writes the processed data
+to standard output.
 .B xz
 will refuse (display an error and skip the
 .IR file )
-to write compressed data to standard output if it is a terminal. Similarly,
+to write compressed data to standard output if it is a terminal.
+Similarly,
 .B xz
-will refuse to read compressed data from standard input if it is a terminal.
+will refuse to read compressed data
+from standard input if it is a terminal.
 .PP
 Unless
 .B \-\-stdout
@@ -117,8 +125,9 @@ will display a warning and skip the
 if any of the following applies:
 .IP \(bu 3
 .I File
-is not a regular file. Symbolic links are not followed, thus they
-are not considered to be regular files.
+is not a regular file.
+Symbolic links are not followed,
+thus they are not considered to be regular files.
 .IP \(bu 3
 .I File
 has more than one hard link.
@@ -154,12 +163,13 @@ or
 After successfully compressing or decompressing the
 .IR file ,
 .B xz
-copies the owner, group, permissions, access time, and modification time
-from the source
+copies the owner, group, permissions, access time,
+and modification time from the source
 .I file
-to the target file. If copying the group fails, the permissions are modified
-so that the target file doesn't become accessible to users who didn't have
-permission to access the source
+to the target file.
+If copying the group fails, the permissions are modified
+so that the target file doesn't become accessible to users
+who didn't have permission to access the source
 .IR file .
 .B xz
 doesn't support copying other metadata like access control lists
@@ -169,7 +179,8 @@ Once the target file has been successfully closed, the source
 .I file
 is removed unless
 .B \-\-keep
-was specified. The source
+was specified.
+The source
 .I file
 is never removed if the output is written to standard output.
 .PP
@@ -180,42 +191,51 @@ or
 to the
 .B xz
 process makes it print progress information to standard error.
-This has only limited use since when standard error is a terminal, using
+This has only limited use since when standard error
+is a terminal, using
 .B \-\-verbose
 will display an automatically updating progress indicator.
+.
 .SS "Memory usage"
 The memory usage of
 .B xz
-varies from a few hundred kilobytes to several gigabytes depending on
-the compression settings. The settings used when compressing a file
-determine the memory requirements of the decompressor. Typically the
-decompressor needs only 5\ % to 20\ % of the amount of memory that the
-compressor needed when creating the file. For example, decompressing a
-file created with
+varies from a few hundred kilobytes to several gigabytes
+depending on the compression settings.
+The settings used when compressing a file determine
+the memory requirements of the decompressor.
+Typically the decompressor needs 5\ % to 20\ % of
+the amount of memory that the compressor needed when
+creating the file.
+For example, decompressing a file created with
 .B xz \-9
-currently requires 65 MiB of memory. Still, it is possible to have
+currently requires 65\ MiB of memory.
+Still, it is possible to have
 .B .xz
-files that need several gigabytes of memory to decompress.
+files that require several gigabytes of memory to decompress.
 .PP
-Especially users of older systems may find the possibility of very large
-memory usage annoying. To prevent uncomfortable surprises,
+Especially users of older systems may find
+the possibility of very large memory usage annoying.
+To prevent uncomfortable surprises,
 .B xz
 has a built-in memory usage limiter, which is disabled by default.
-While some operating systems provide ways to limit the memory usage of
-processes, relying on it wasn't deemed to be flexible enough (e.g. using
+While some operating systems provide ways to limit
+the memory usage of processes, relying on it
+wasn't deemed to be flexible enough (e.g. using
 .BR ulimit (1)
 to limit virtual memory tends to cripple
 .BR mmap (2)).
 .PP
-The memory usage limiter can be enabled with the command line option
-\fB\-\-memlimit=\fIlimit\fR, but often it is more convenient to enable
-the limiter by default by setting the environment variable
+The memory usage limiter can be enabled with
+the command line option \fB\-\-memlimit=\fIlimit\fR.
+Often it is more convenient to enable the limiter
+by default by setting the environment variable
 .BR XZ_DEFAULTS ,
-e.g.
+e.g.\&
 .BR XZ_DEFAULTS=\-\-memlimit=150MiB .
-It is possible to set the limits separately for compression and decompression
+It is possible to set the limits separately
+for compression and decompression
 by using \fB\-\-memlimit\-compress=\fIlimit\fR and
-\fB\-\-memlimit\-decompress=\fIlimit\fR, respectively.
+\fB\-\-memlimit\-decompress=\fIlimit\fR.
 Using these two options outside
 .B XZ_DEFAULTS
 is rarely useful, because a single run of
@@ -230,15 +250,19 @@ If the specified memory usage limit is exceeded when decompressing,
 will display an error and decompressing the file will fail.
 If the limit is exceeded when compressing,
 .B xz
-will try to scale the settings down so that the limit is no longer exceeded
-(except when using \fB\-\-format=raw\fR or \fB\-\-no\-adjust\fR).
-This way the operation won't fail unless the limit is very small. The scaling
-of the settings is done in steps that don't match the compression level
-presets, e.g. if the limit is only slightly less than the amount required for
+will try to scale the settings down so that the limit
+is no longer exceeded (except when using \fB\-\-format=raw\fR
+or \fB\-\-no\-adjust\fR).
+This way the operation won't fail unless the limit is very small.
+The scaling of the settings is done in steps that don't
+match the compression level presets, e.g. if the limit is
+only slightly less than the amount required for
 .BR "xz \-9" ,
-the settings will be scaled down only a little, not all the way down to
+the settings will be scaled down only a little,
+not all the way down to
 .BR "xz \-8" .
-.SS Concatenation and padding with .xz files
+.
+.SS "Concatenation and padding with .xz files"
 It is possible to concatenate
 .B .xz
 files as is.
@@ -248,22 +272,27 @@ will decompress such files as if they were a single
 file.
 .PP
 It is possible to insert padding between the concatenated parts
-or after the last part. The padding must be null bytes and the size
-of the padding must be a multiple of four bytes. This can be useful
-if the .xz file is stored on a medium that stores file sizes
-e.g. as 512-byte blocks.
+or after the last part.
+The padding must consist of null bytes and the size
+of the padding must be a multiple of four bytes.
+This can be useful e.g. if the
+.B .xz
+file is stored on a medium that measures file sizes
+in 512-byte blocks.
 .PP
 Concatenation and padding are not allowed with
 .B .lzma
 files or raw streams.
+.
 .SH OPTIONS
+.
 .SS "Integer suffixes and special values"
-In most places where an integer argument is expected, an optional suffix
-is supported to easily indicate large integers. There must be no space
-between the integer and the suffix.
+In most places where an integer argument is expected,
+an optional suffix is supported to easily indicate large integers.
+There must be no space between the integer and the suffix.
 .TP
 .B KiB
-The integer is multiplied by 1,024 (2^10). Also
+Multiply the integer by 1,024 (2^10).
 .BR Ki ,
 .BR k ,
 .BR kB ,
@@ -274,7 +303,7 @@ are accepted as synonyms for
 .BR KiB .
 .TP
 .B MiB
-The integer is multiplied by 1,048,576 (2^20). Also
+Multiply the integer by 1,048,576 (2^20).
 .BR Mi ,
 .BR m ,
 .BR M ,
@@ -284,7 +313,7 @@ are accepted as synonyms for
 .BR MiB .
 .TP
 .B GiB
-The integer is multiplied by 1,073,741,824 (2^30). Also
+Multiply the integer by 1,073,741,824 (2^30).
 .BR Gi ,
 .BR g ,
 .BR G ,
@@ -293,16 +322,20 @@ and
 are accepted as synonyms for
 .BR GiB .
 .PP
-A special value
+The special value
 .B max
-can be used to indicate the maximum integer value supported by the option.
+can be used to indicate the maximum integer value
+supported by the option.
+.
 .SS "Operation mode"
-If multiple operation mode options are given, the last one takes effect.
+If multiple operation mode options are given,
+the last one takes effect.
 .TP
 .BR \-z ", " \-\-compress
-Compress. This is the default operation mode when no operation mode option
-is specified, and no other operation mode is implied from the command name
-(for example,
+Compress.
+This is the default operation mode when no operation mode option
+is specified, and no other operation mode is implied from
+the command name (for example,
 .B unxz
 implies
 .BR \-\-decompress ).
@@ -313,62 +346,73 @@ Decompress.
 .BR \-t ", " \-\-test
 Test the integrity of compressed
 .IR files .
-No files are created or removed. This option is equivalent to
+This option is equivalent to
 .B "\-\-decompress \-\-stdout"
 except that the decompressed data is discarded instead of being
 written to standard output.
+No files are created or removed.
 .TP
 .BR \-l ", " \-\-list
-List information about compressed
+Print information about compressed
 .IR files .
-No uncompressed output is produced, and no files are created or removed.
-In list mode, the program cannot read the compressed data from standard
+No uncompressed output is produced,
+and no files are created or removed.
+In list mode, the program cannot read
+the compressed data from standard
 input or from other unseekable sources.
-.IP
+.IP ""
 The default listing shows basic information about
 .IR files ,
-one file per line. To get more detailed information, use also the
+one file per line.
+To get more detailed information, use also the
 .B \-\-verbose
-option. For even more information, use
+option.
+For even more information, use
 .B \-\-verbose
 twice, but note that it may be slow, because getting all the extra
-information requires many seeks. The width of verbose output exceeds
-80 characters, so piping the output to e.g.
+information requires many seeks.
+The width of verbose output exceeds
+80 characters, so piping the output to e.g.\&
 .B "less\ \-S"
 may be convenient if the terminal isn't wide enough.
-.IP
+.IP ""
 The exact output may vary between
 .B xz
-versions and different locales. To get machine-readable output,
+versions and different locales.
+For machine-readable output,
 .B \-\-robot \-\-list
 should be used.
+.
 .SS "Operation modifiers"
 .TP
 .BR \-k ", " \-\-keep
-Keep (don't delete) the input files.
+Don't delete the input files.
 .TP
 .BR \-f ", " \-\-force
 This option has several effects:
 .RS
 .IP \(bu 3
-If the target file already exists, delete it before compressing or
-decompressing.
+If the target file already exists,
+delete it before compressing or decompressing.
 .IP \(bu 3
-Compress or decompress even if the input is a symbolic link to a regular file,
-has more than one hard link, or has setuid, setgid, or sticky bit set.
-The setuid, setgid, and sticky bits are not copied to the target file.
+Compress or decompress even if the input is
+a symbolic link to a regular file,
+has more than one hard link,
+or has the setuid, setgid, or sticky bit set.
+The setuid, setgid, and sticky bits are not copied
+to the target file.
 .IP \(bu 3
-If combined with
+When used with
 .B \-\-decompress
 .BR \-\-stdout
 and
 .B xz
-doesn't recognize the type of the source file,
-.B xz
-will copy the source file as is to standard output. This allows using
+cannot recognize the type of the source file,
+copy the source file as is to standard output.
+This allows
 .B xzcat
 .B \-\-force
-like
+to be used like
 .BR cat (1)
 for files that have not been compressed with
 .BR xz .
@@ -385,20 +429,22 @@ to decompress only a single file format.
 .RE
 .TP
 .BR \-c ", " \-\-stdout ", " \-\-to\-stdout
-Write the compressed or decompressed data to standard output instead of
-a file. This implies
+Write the compressed or decompressed data to
+standard output instead of a file.
+This implies
 .BR \-\-keep .
 .TP
 .B \-\-no\-sparse
-Disable creation of sparse files. By default, if decompressing into
-a regular file,
+Disable creation of sparse files.
+By default, if decompressing into a regular file,
 .B xz
-tries to make the file sparse if the decompressed data contains long
-sequences of binary zeros. It works also when writing to standard output
-as long as standard output is connected to a regular file, and certain
-additional conditions are met to make it safe. Creating sparse files may
-save disk space and speed up the decompression by reducing the amount of
-disk I/O.
+tries to make the file sparse if the decompressed data contains
+long sequences of binary zeros.
+It works also when writing to standard output
+as long as standard output is connected to a regular file,
+and certain additional conditions are met to make it safe.
+Creating sparse files may save disk space and speed up
+the decompression by reducing the amount of disk I/O.
 .TP
 \fB\-S\fR \fI.suf\fR, \fB\-\-suffix=\fI.suf
 When compressing, use
@@ -407,11 +453,12 @@ as the suffix for the target file instead of
 .B .xz
 or
 .BR .lzma .
-If not writing to standard output and the source file already has the suffix
+If not writing to standard output and
+the source file already has the suffix
 .IR .suf ,
 a warning is displayed and the file is skipped.
-.IP
-When decompressing, recognize also files with the suffix
+.IP ""
+When decompressing, recognize files with the suffix
 .I .suf
 in addition to files with the
 .BR .xz ,
@@ -419,13 +466,15 @@ in addition to files with the
 .BR .lzma ,
 or
 .B .tlz
-suffix. If the source file has the suffix
+suffix.
+If the source file has the suffix
 .IR .suf ,
 the suffix is removed to get the target filename.
-.IP
+.IP ""
 When compressing or decompressing raw streams
 .RB ( \-\-format=raw ),
-the suffix must always be specified unless writing to standard output,
+the suffix must always be specified unless
+writing to standard output,
 because there is no default suffix for raw streams.
 .TP
 \fB\-\-files\fR[\fB=\fIfile\fR]
@@ -433,8 +482,9 @@ Read the filenames to process from
 .IR file ;
 if
 .I file
-is omitted, filenames are read from standard input. Filenames must be
-terminated with the newline character. A dash
+is omitted, filenames are read from standard input.
+Filenames must be terminated with the newline character.
+A dash
 .RB ( \- )
 is taken as a regular filename; it doesn't mean standard input.
 If filenames are given also as command line arguments, they are
@@ -442,53 +492,59 @@ processed before the filenames read from
 .IR file .
 .TP
 \fB\-\-files0\fR[\fB=\fIfile\fR]
-This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except that the
-filenames must be terminated with the null character.
+This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except
+that each filename must be terminated with the null character.
+.
 .SS "Basic file format and compression options"
 .TP
 \fB\-F\fR \fIformat\fR, \fB\-\-format=\fIformat
-Specify the file format to compress or decompress:
+Specify the file
+.I format
+to compress or decompress:
 .RS
-.IP \(bu 3
-.BR auto :
-This is the default. When compressing,
+.TP
+.B auto
+This is the default.
+When compressing,
 .B auto
 is equivalent to
 .BR xz .
-When decompressing, the format of the input file is automatically detected.
+When decompressing,
+the format of the input file is automatically detected.
 Note that raw streams (created with
 .BR \-\-format=raw )
 cannot be auto-detected.
-.IP \(bu 3
-.BR xz :
+.TP
+.B xz
 Compress to the
 .B .xz
 file format, or accept only
 .B .xz
 files when decompressing.
-.IP \(bu 3
-.B lzma
-or
-.BR alone :
+.TP
+.BR lzma ", " alone
 Compress to the legacy
 .B .lzma
 file format, or accept only
 .B .lzma
-files when decompressing. The alternative name
+files when decompressing.
+The alternative name
 .B alone
 is provided for backwards compatibility with LZMA Utils.
-.IP \(bu 3
-.BR raw :
-Compress or uncompress a raw stream (no headers). This is meant for advanced
-users only. To decode raw streams, you need to set not only
+.TP
+.B raw
+Compress or uncompress a raw stream (no headers).
+This is meant for advanced users only.
+To decode raw streams, you need to use
 .B \-\-format=raw
-but also specify the filter chain, which would normally be stored in the
-container format headers.
+and explicitly specify the filter chain,
+which normally would have been stored in the container headers.
 .RE
 .TP
 \fB\-C\fR \fIcheck\fR, \fB\-\-check=\fIcheck
-Specify the type of the integrity check, which is calculated from the
-uncompressed data. This option has an effect only when compressing into the
+Specify the type of the integrity check, which is calculated
+from the uncompressed data.
+This option has an effect only when compressing into the
 .B .xz
 format; the
 .B .lzma
@@ -496,141 +552,248 @@ format doesn't support integrity checks.
 The integrity check (if any) is verified when the
 .B .xz
 file is decompressed.
-.IP
+.IP ""
 Supported
 .I check
 types:
 .RS
-.IP \(bu 3
-.BR none :
-Don't calculate an integrity check at all. This is usually a bad idea. This
-can be useful when integrity of the data is verified by other means anyway.
-.IP \(bu 3
-.BR crc32 :
+.TP
+.B none
+Don't calculate an integrity check at all.
+This is usually a bad idea.
+This can be useful when integrity of the data is verified
+by other means anyway.
+.TP
+.B crc32
 Calculate CRC32 using the polynomial from IEEE-802.3 (Ethernet).
-.IP \(bu 3
-.BR crc64 :
-Calculate CRC64 using the polynomial from ECMA-182. This is the default, since
-it is slightly better than CRC32 at detecting damaged files and the speed
-difference is negligible.
-.IP \(bu 3
-.BR sha256 :
-Calculate SHA-256. This is somewhat slower than CRC32 and CRC64.
+.TP
+.B crc64
+Calculate CRC64 using the polynomial from ECMA-182.
+This is the default, since it is slightly better than CRC32
+at detecting damaged files and the speed difference is negligible.
+.TP
+.B sha256
+Calculate SHA-256.
+This is somewhat slower than CRC32 and CRC64.
 .RE
-.IP
+.IP ""
 Integrity of the
 .B .xz
-headers is always verified with CRC32. It is not possible to change or
-disable it.
+headers is always verified with CRC32.
+It is not possible to change or disable it.
 .TP
 .BR \-0 " ... " \-9
-Select compression preset. If a preset level is specified multiple times,
+Select a compression preset level.
+The default is
+.BR \-6 .
+If multiple preset levels are specified,
 the last one takes effect.
-.IP
-The compression preset levels can be categorised roughly into three
-categories:
-.RS
-.IP "\fB\-0\fR ... \fB\-2"
-Fast presets with relatively low memory usage.
-.B \-1
+If a custom filter chain was already specified, setting
+a compression preset level clears the custom filter chain.
+.IP ""
+The differences between the presets are more significant than with
+.BR gzip (1)
 and
-.B \-2
-should give compression speed and ratios comparable to
-.B "bzip2 \-1"
+.BR bzip2 (1).
+The selected compression settings determine
+the memory requirements of the decompressor,
+thus using a too high preset level might make it painful
+to decompress the file on an old system with little RAM.
+Specifically,
+.B "it's not a good idea to blindly use \-9 for everything"
+like it often is with
+.BR gzip (1)
 and
-.BR "bzip2 \-9" ,
-respectively.
-Currently
-.B \-0
-is not very good (not much faster than
-.B \-1
-but much worse compression). In future,
+.BR bzip2 (1).
+.RS
+.TP
+.BR "\-0" " ... " "\-3"
+These are somewhat fast presets.
 .B \-0
-may be indicate some fast algorithm instead of LZMA2.
-.IP "\fB\-3\fR ... \fB\-5"
-Good compression ratio with low to medium memory usage.
-These are significantly slower than levels 0\-2.
-.IP "\fB\-6\fR ... \fB\-9"
-Excellent compression with medium to high memory usage. These are also
-slower than the lower preset levels. The default is
-.BR \-6 .
-Unless you want to maximize the compression ratio, you probably don't want
-a higher preset level than
-.B \-7
-due to speed and memory usage.
+is sometimes faster than
+.B "gzip \-9"
+while compressing much better.
+The higher ones often have speed comparable to
+.BR bzip2 (1)
+with comparable or better compression ratio,
+although the results
+depend a lot on the type of data being compressed.
+.TP
+.BR "\-4" " ... " "\-6"
+Good to very good compression while keeping
+decompressor memory usage reasonable even for old systems.
+.B \-6
+is the default, which is usually a good choice
+e.g. for distributing files that need to be decompressible
+even on systems with only 16\ MiB RAM.
+.RB ( \-5e
+or
+.B \-6e
+may be worth considering too.
+See
+.BR \-\-extreme .)
+.TP
+.B "\-7 ... \-9"
+These are like
+.B \-6
+but with higher compressor and decompressor memory requirements.
+These are useful only when compressing files bigger than
+8\ MiB, 16\ MiB, and 32\ MiB, respectively.
+.RE
+.IP ""
+On the same hardware, the decompression speed is approximately
+a constant number of bytes of compressed data per second.
+In other words, the better the compression,
+the faster the decompression will usually be.
+This also means that the amount of uncompressed output
+produced per second can vary a lot.
+.IP ""
+The following table summarises the features of the presets:
+.RS
+.RS
+.PP
+.TS
+tab(;);
+c c c c c
+n n n n n.
+Preset;DictSize;CompCPU;CompMem;DecMem
+\-0;256 KiB;0;3 MiB;1 MiB
+\-1;1 MiB;1;9 MiB;2 MiB
+\-2;2 MiB;2;17 MiB;3 MiB
+\-3;4 MiB;3;32 MiB;5 MiB
+\-4;4 MiB;4;48 MiB;5 MiB
+\-5;8 MiB;5;94 MiB;9 MiB
+\-6;8 MiB;6;94 MiB;9 MiB
+\-7;16 MiB;6;186 MiB;17 MiB
+\-8;32 MiB;6;370 MiB;33 MiB
+\-9;64 MiB;6;674 MiB;65 MiB
+.TE
+.RE
 .RE
-.IP
-The exact compression settings (filter chain) used by each preset may
-vary between
+.IP ""
+Column descriptions:
+.RS
+.IP \(bu 3
+DictSize is the LZMA2 dictionary size.
+It is a waste of memory to use a dictionary bigger than
+the size of the uncompressed file.
+This is why it is good to avoid using the presets
+.BR \-7 " ... " \-9
+when there's no real need for them.
+At
+.B \-6
+and lower, the amount of memory wasted is
+usually low enough to not matter.
+.IP \(bu 3
+CompCPU is a simplified representation of the LZMA2 settings
+that affect compression speed.
+The dictionary size affects speed too,
+so while CompCPU is the same for levels
+.BR \-6 " ... " \-9 ,
+higher levels still tend to be a little slower.
+To get even slower and thus possibly better compression, see
+.BR \-\-extreme .
+.IP \(bu 3
+CompMem contains the compressor memory requirements
+in the single-threaded mode.
+It may vary slightly between
 .B xz
-versions. Because the settings may vary, the memory usage may vary
-slightly too. FIXME The following
-table lists the maximum memory usage of each preset level, which won't be
-exceeded even in future versions of
-.BR xz .
-.IP
-.B "FIXME: The table below is just a rough idea."
+versions.
+Memory requirements of some of the future multithreaded modes may
+be dramatically higher than that of the single-threaded mode.
+.IP \(bu 3
+DecMem contains the decompressor memory requirements.
+That is, the compression settings determine
+the memory requirements of the decompressor.
+The exact decompressor memory usage is slightly more than
+the LZMA2 dictionary size, but the values in the table
+have been rounded up to the next full MiB.
+.RE
+.TP
+.BR \-e ", " \-\-extreme
+Use a slower variant of the selected compression preset level
+.RB ( \-0 " ... " \-9 )
+to hopefully get a little bit better compression ratio,
+but with bad luck this can also make it worse.
+Decompressor memory usage is not affected,
+but compressor memory usage increases a little at preset levels
+.BR \-0 " ... " \-3 .
+.IP ""
+Since there are two presets with dictionary sizes
+4\ MiB and 8\ MiB, the presets
+.B \-3e
+and
+.B \-5e
+use slightly faster settings (lower CompCPU) than
+.B \-4e
+and
+.BR \-6e ,
+respectively.
+That way no two presets are identical.
 .RS
 .RS
+.PP
 .TS
 tab(;);
-c c c
-n n n.
-Preset;Compression;Decompression
-\-0;6 MiB;1 MiB
-\-1;6 MiB;1 MiB
-\-2;10 MiB;1 MiB
-\-3;20 MiB;2 MiB
-\-4;30 MiB;3 MiB
-\-5;60 MiB;6 MiB
-\-6;100 MiB;10 MiB
-\-7;200 MiB;20 MiB
-\-8;400 MiB;40 MiB
-\-9;800 MiB;80 MiB
+c c c c c
+n n n n n.
+Preset;DictSize;CompCPU;CompMem;DecMem +\-0e;256 KiB;8;4 MiB;1 MiB +\-1e;1 MiB;8;13 MiB;2 MiB +\-2e;2 MiB;8;25 MiB;3 MiB +\-3e;4 MiB;7;48 MiB;5 MiB +\-4e;4 MiB;8;48 MiB;5 MiB +\-5e;8 MiB;7;94 MiB;9 MiB +\-6e;8 MiB;8;94 MiB;9 MiB +\-7e;16 MiB;8;186 MiB;17 MiB +\-8e;32 MiB;8;370 MiB;33 MiB +\-9e;64 MiB;8;674 MiB;65 MiB .TE .RE .RE +.IP "" +For example, there are a total of four presets that use +8\ MiB dictionary, whose order from the fastest to the slowest is +.BR \-5 , +.BR \-6 , +.BR \-5e , +and +.BR \-6e . .TP -.BR \-\-fast " and " \-\-best +.B \-\-fast +.PD 0 +.TP +.B \-\-best +.PD These are somewhat misleading aliases for .B \-0 and .BR \-9 , respectively. -These are provided only for backwards compatibility with LZMA Utils. +These are provided only for backwards compatibility +with LZMA Utils. Avoid using these options. -.IP -Especially the name of -.B \-\-best -is misleading, because the definition of best depends on the input data, -and that usually people don't want the very best compression ratio anyway, -because it would be very slow. -.TP -.BR \-e ", " \-\-extreme -Modify the compression preset (\fB\-0\fR ... \fB\-9\fR) so that a little bit -better compression ratio can be achieved without increasing memory usage -of the compressor or decompressor (exception: compressor memory usage may -increase a little with presets \fB\-0\fR ... \fB\-2\fR). The downside is that -the compression time will increase dramatically (it can easily double). .TP .BI \-\-memlimit\-compress= limit -Set a memory usage limit for compression. If this option is specified -multiple times, the last one takes effect. -.IP +Set a memory usage limit for compression. +If this option is specified multiple times, +the last one takes effect. +.IP "" If the compression settings exceed the .IR limit , .B xz -will adjust the settings downwards so that the limit is no longer exceeded -and display a notice that automatic adjustment was done. 
Adjustment is never -done when compressing with +will adjust the settings downwards so that +the limit is no longer exceeded and display a notice that +automatic adjustment was done. +Adjustment is never done when compressing with .B \-\-format=raw or if .B \-\-no\-adjust -has been specified. In those cases, an error is displayed and +has been specified. +In those cases, an error is displayed and .B xz -will exit with exit status -.BR 1 . -.IP +will exit with exit status 1. +.IP "" The .I limit can be specified in multiple ways: @@ -638,9 +801,11 @@ can be specified in multiple ways: .IP \(bu 3 The .I limit -can be an absolute value in bytes. Using an integer suffix like +can be an absolute value in bytes. +Using an integer suffix like .B MiB -can be useful. Example: +can be useful. +Example: .B "\-\-memlimit\-compress=80MiB" .IP \(bu 3 The @@ -648,9 +813,11 @@ The can be specified as a percentage of total physical memory (RAM). This can be useful especially when setting the .B XZ_DEFAULTS -environment variable in a shell initialization script that is shared -between different computers. That way the limit is automatically bigger -on systems with more memory. Example: +environment variable in a shell initialization script +that is shared between different computers. +That way the limit is automatically bigger +on systems with more memory. +Example: .B "\-\-memlimit\-compress=70%" .IP \(bu 3 The @@ -661,7 +828,8 @@ This is currently equivalent to setting the .I limit to .B max -i.e. no memory usage limit. Once multithreading support has been implemented, +i.e. no memory usage limit. +Once multithreading support has been implemented, there may be a difference between .B 0 and @@ -670,19 +838,22 @@ for the multithreaded case, so it is recommended to use .B 0 instead of .B max -at least until the details have been decided. +until the details have been decided. .RE -.IP +.IP "" See also the section .BR "Memory usage" . 
.TP .BI \-\-memlimit\-decompress= limit -Set a memory usage limit for decompression. This affects also the +Set a memory usage limit for decompression. +This affects also the .B \-\-list -mode. If the operation is not possible without exceeding the +mode. +If the operation is not possible without exceeding the .IR limit , .B xz -will display an error and decompressing the file will fail. See +will display an error and decompressing the file will fail. +See .BI \-\-memlimit\-compress= limit for possible ways to specify the .IR limit . @@ -693,72 +864,94 @@ This is equivalent to specifying \fB\-\-memlimit\-compress=\fIlimit .TP .B \-\-no\-adjust Display an error and exit if the compression settings exceed the -the memory usage limit. The default is to adjust the settings downwards so -that the memory usage limit is not exceeded. Automatic adjusting is -always disabled when creating raw streams +memory usage limit. +The default is to adjust the settings downwards so +that the memory usage limit is not exceeded. +Automatic adjusting is always disabled when creating raw streams .RB ( \-\-format=raw ). .TP \fB\-T\fR \fIthreads\fR, \fB\-\-threads=\fIthreads -Specify the number of worker threads to use. The actual number of threads -can be less than +Specify the number of worker threads to use. +The actual number of threads can be less than .I threads if using more threads would exceed the memory usage limit. -.IP -.B "Multithreaded compression and decompression are not implemented yet," -.B "so this option has no effect for now." -.IP -.B "As of writing (2010-08-07), it hasn't been decided if threads will be" -.B "used by default on multicore systems once support for threading has" -.B "been implemented. Comments are welcome." -The complicating factor is that using many threads will increase the memory -usage dramatically.
Note that if multithreading will be the default, -it will be done so that single-threaded and multithreaded modes produce -the same output, so compression ratio won't be significantly affected if -threading will be enabled by default. -.SS Custom compressor filter chains -A custom filter chain allows specifying the compression settings in detail -instead of relying on the settings associated to the preset levels. -When a custom filter chain is specified, the compression preset level options -(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are silently ignored. -.PP -A filter chain is comparable to piping on the UN*X command line. -When compressing, the uncompressed input goes to the first filter, whose -output goes to the next filter (if any). The output of the last filter -gets written to the compressed file. The maximum number of filters in -the chain is four, but typically a filter chain has only one or two filters. -.PP -Many filters have limitations where they can be in the filter chain: -some filters can work only as the last filter in the chain, some only -as a non-last filter, and some work in any position in the chain. Depending -on the filter, this limitation is either inherent to the filter design or -exists to prevent security issues. -.PP -A custom filter chain is specified by using one or more filter options in -the order they are wanted in the filter chain. That is, the order of filter -options is significant! When decoding raw streams +.IP "" +.B "Multithreaded compression and decompression are not" +.B "implemented yet, so this option has no effect for now." +.IP "" +.B "As of writing (2010-09-27), it hasn't been decided" +.B "if threads will be used by default on multicore systems" +.B "once support for threading has been implemented." +.B "Comments are welcome." +The complicating factor is that using many threads +will increase the memory usage dramatically. 
+Note that if multithreading will be the default, +it will probably be done so that single-threaded and +multithreaded modes produce the same output, +so compression ratio won't be significantly affected +if threading will be enabled by default. +. +.SS "Custom compressor filter chains" +A custom filter chain allows specifying +the compression settings in detail instead of relying on +the settings associated to the preset levels. +When a custom filter chain is specified, +the compression preset level options +(\fB\-0\fR ... \fB\-9\fR and \fB\-\-extreme\fR) are +silently ignored. +.PP +A filter chain is comparable to piping on the command line. +When compressing, the uncompressed input goes to the first filter, +whose output goes to the next filter (if any). +The output of the last filter gets written to the compressed file. +The maximum number of filters in the chain is four, +but typically a filter chain has only one or two filters. +.PP +Many filters have limitations where they can be +in the filter chain: +some filters can work only as the last filter in the chain, +some only as a non-last filter, and some work in any position +in the chain. +Depending on the filter, this limitation is either inherent to +the filter design or exists to prevent security issues. +.PP +A custom filter chain is specified by using one or more +filter options in the order they are wanted in the filter chain. +That is, the order of filter options is significant! +When decoding raw streams .RB ( \-\-format=raw ), -the filter chain is specified in the same order as it was specified when -compressing. +the filter chain is specified in the same order as +it was specified when compressing. .PP Filters take filter-specific .I options -as a comma-separated list. Extra commas in +as a comma-separated list. +Extra commas in .I options -are ignored. Every option has a default value, so you need to +are ignored. 
+Every option has a default value, so you need to specify only those you want to change. .TP -\fB\-\-lzma1\fR[\fB=\fIoptions\fR], \fB\-\-lzma2\fR[\fB=\fIoptions\fR] -Add LZMA1 or LZMA2 filter to the filter chain. These filter can be used -only as the last filter in the chain. -.IP -LZMA1 is a legacy filter, which is supported almost solely due to the legacy +\fB\-\-lzma1\fR[\fB=\fIoptions\fR] +.PD 0 +.TP +\fB\-\-lzma2\fR[\fB=\fIoptions\fR] +.PD +Add LZMA1 or LZMA2 filter to the filter chain. +These filters can be used only as the last filter in the chain. +.IP "" +LZMA1 is a legacy filter, +which is supported almost solely due to the legacy .B .lzma -file format, which supports only LZMA1. LZMA2 is an updated -version of LZMA1 to fix some practical issues of LZMA1. The +file format, which supports only LZMA1. +LZMA2 is an updated +version of LZMA1 to fix some practical issues of LZMA1. +The .B .xz -format uses LZMA2, and doesn't support LZMA1 at all. Compression speed and -ratios of LZMA1 and LZMA2 are practically the same. -.IP +format uses LZMA2 and doesn't support LZMA1 at all. +Compression speed and ratios of LZMA1 and LZMA2 +are practically the same. +.IP "" LZMA1 and LZMA2 share the same set of .IR options : .RS @@ -769,8 +962,9 @@ Reset all LZMA1 or LZMA2 to .IR preset . .I Preset -consist of an integer, which may be followed by single-letter preset -modifiers. The integer can be from +consist of an integer, which may be followed by single-letter +preset modifiers. +The integer can be from .B 0 to .BR 9 , @@ -779,7 +973,6 @@ The only supported modifier is currently .BR e , which matches .BR \-\-extreme . -.IP The default .I preset is @@ -789,84 +982,155 @@ from which the default values for the rest of the LZMA1 or LZMA2 are taken. .TP .BI dict= size -Dictionary (history buffer) size indicates how many bytes of the recently -processed uncompressed data is kept in memory. 
One method to reduce size of -the uncompressed data is to store distance-length pairs, which -indicate what data to repeat from the dictionary buffer. The bigger -the dictionary, the better the compression ratio usually is, -but dictionaries bigger than the uncompressed data are waste of RAM. -.IP -Typical dictionary size is from 64 KiB to 64 MiB. The minimum is 4 KiB. -The maximum for compression is currently 1.5 GiB. The decompressor already -supports dictionaries up to one byte less than 4 GiB, which is the -maximum for LZMA1 and LZMA2 stream formats. -.IP -Dictionary size has the biggest effect on compression ratio. -Dictionary size and match finder together determine the memory usage of -the LZMA1 or LZMA2 encoder. The same dictionary size is required -for decompressing that was used when compressing, thus the memory usage of -the decoder is determined by the dictionary size used when compressing. +Dictionary (history buffer) +.I size +indicates how many bytes of the recently processed +uncompressed data is kept in memory. +The algorithm tries to find repeating byte sequences (matches) in +the uncompressed data, and replace them with references +to the data currently in the dictionary. +The bigger the dictionary, the higher is the chance +to find a match. +Thus, increasing dictionary +.I size +usually improves compression ratio, but +a dictionary bigger than the uncompressed file is waste of memory. +.IP "" +Typical dictionary +.I size +is from 64\ KiB to 64\ MiB. +The minimum is 4\ KiB. +The maximum for compression is currently 1.5\ GiB (1536\ MiB). +The decompressor already supports dictionaries up to +one byte less than 4\ GiB, which is the maximum for +the LZMA1 and LZMA2 stream formats. +.IP "" +Dictionary +.I size +and match finder +.RI ( mf ) +together determine the memory usage of the LZMA1 or LZMA2 encoder. 
+The same (or bigger) dictionary +.I size +is required for decompressing that was used when compressing, +thus the memory usage of the decoder is determined +by the dictionary size used when compressing. +The +.B .xz +headers store the dictionary +.I size +either as +.RI "2^" n +or +.RI "2^" n " + 2^(" n "\-1)," +so these +.I sizes +are somewhat preferred for compression. +Other +.I sizes +will get rounded up when stored in the +.B .xz +headers. .TP .BI lc= lc -Specify the number of literal context bits. The minimum is -.B 0 -and the maximum is -.BR 4 ; -the default is -.BR 3 . +Specify the number of literal context bits. +The minimum is 0 and the maximum is 4; the default is 3. In addition, the sum of .I lc and .I lp -must not exceed -.BR 4 . +must not exceed 4. +.IP "" +All bytes that cannot be encoded as matches +are encoded as literals. +That is, literals are simply 8-bit bytes +that are encoded one at a time. +.IP "" +The literal coding makes an assumption that the highest +.I lc +bits of the previous uncompressed byte correlate +with the next byte. +E.g. in typical English text, an upper-case letter is +often followed by a lower-case letter, and a lower-case +letter is usually followed by another lower-case letter. +In the US-ASCII character set, the highest three bits are 010 +for upper-case letters and 011 for lower-case letters. +When +.I lc +is at least 3, the literal coding can take advantage of +this property in the uncompressed data. +.IP "" +The default value (3) is usually good. +If you want maximum compression, test +.BR lc=4 . +Sometimes it helps a little, and +sometimes it makes compression worse. +If it makes it worse, test e.g.\& +.B lc=2 +too. .TP .BI lp= lp -Specify the number of literal position bits. The minimum is -.B 0 -and the maximum is -.BR 4 ; -the default is -.BR 0 . +Specify the number of literal position bits. +The minimum is 0 and the maximum is 4; the default is 0. 
+.IP "" +.I Lp +affects what kind of alignment in the uncompressed data is +assumed when encoding literals. +See +.I pb +below for more information about alignment. .TP .BI pb= pb -Specify the number of position bits. The minimum is -.B 0 -and the maximum is -.BR 4 ; -the default is -.BR 2 . -.TP -.BI mode= mode -Compression -.I mode -specifies the function used to analyze the data produced by the match finder. -Supported -.I modes -are -.B fast -and -.BR normal . -The default is -.B fast -for -.I presets -.BR 0 \- 2 +Specify the number of position bits. +The minimum is 0 and the maximum is 4; the default is 2. +.IP "" +.I Pb +affects what kind of alignment in the uncompressed data is +assumed in general. +The default means four-byte alignment +.RI (2^ pb =2^2=4), +which is often a good choice when there's no better guess. +.IP "" +When the alignment is known, setting +.I pb +accordingly may reduce the file size a little. +E.g. with text files having one-byte +alignment (US-ASCII, ISO-8859-*, UTF-8), setting +.B pb=0 +can improve compression slightly. +For UTF-16 text, +.B pb=1 +is a good choice. +If the alignment is an odd number like 3 bytes, +.B pb=0 +might be the best choice. +.IP "" +Even though the assumed alignment can be adjusted with +.I pb and -.B normal -for -.I presets -.BR 3 \- 9 . +.IR lp , +LZMA1 and LZMA2 still slightly favor 16-byte alignment. +It might be worth taking into account when designing file formats +that are likely to be often compressed with LZMA1 or LZMA2. .TP .BI mf= mf -Match finder has a major effect on encoder speed, memory usage, and -compression ratio. Usually Hash Chain match finders are faster than -Binary Tree match finders. Hash Chains are usually used together with -.B mode=fast -and Binary Trees with -.BR mode=normal . -The memory usage formulas are only rough estimates, -which are closest to reality when +Match finder has a major effect on encoder speed, +memory usage, and compression ratio.
+Usually Hash Chain match finders are faster than Binary Tree +match finders. +The default depends on the +.IR preset : +0 uses +.BR hc3 , +1\-3 +use +.BR hc4 , +and the rest use +.BR bt4 . +.IP "" +The following match finders are supported. +The memory usage formulas below are rough approximations, +which are closest to the reality when .I dict is a power of two. .RS @@ -879,6 +1143,7 @@ Minimum value for 3 .br Memory usage: +.br .I dict * 7.5 (if .I dict @@ -897,8 +1162,16 @@ Minimum value for 4 .br Memory usage: +.br +.I dict +* 7.5 (if +.I dict +<= 32 MiB); +.br +.I dict +* 6.5 (if .I dict -* 7.5 +> 32 MiB) .TP .B bt2 Binary Tree with 2-byte hashing @@ -919,6 +1192,7 @@ Minimum value for 3 .br Memory usage: +.br .I dict * 11.5 (if .I dict @@ -937,53 +1211,96 @@ Minimum value for 4 .br Memory usage: +.br +.I dict +* 11.5 (if .I dict -* 11.5 +<= 32 MiB); +.br +.I dict +* 10.5 (if +.I dict +> 32 MiB) .RE .TP +.BI mode= mode +Compression +.I mode +specifies the method to analyze +the data produced by the match finder. +Supported +.I modes +are +.B fast +and +.BR normal . +The default is +.B fast +for +.I presets +0\-3 and +.B normal +for +.I presets +4\-9. +.IP "" +Usually +.B fast +is used with Hash Chain match finders and +.B normal +with Binary Tree match finders. +This is also what the +.I presets +do. +.TP .BI nice= nice -Specify what is considered to be a nice length for a match. Once a match -of at least +Specify what is considered to be a nice length for a match. +Once a match of at least .I nice -bytes is found, the algorithm stops looking for possibly better matches. -.IP -.I nice -can be 2\-273 bytes. Higher values tend to give better compression ratio -at expense of speed. The default depends on the -.I preset -level. +bytes is found, the algorithm stops +looking for possibly better matches. +.IP "" +.I Nice +can be 2\-273 bytes. +Higher values tend to give better compression ratio +at the expense of speed. +The default depends on the +.IR preset . 
.TP .BI depth= depth -Specify the maximum search depth in the match finder. The default is the -special value -.BR 0 , +Specify the maximum search depth in the match finder. +The default is the special value of 0, which makes the compressor determine a reasonable .I depth from .I mf and .IR nice . -.IP +.IP "" +Reasonable +.I depth +for Hash Chains is 4\-100 and 16\-1000 for Binary Trees. Using very high values for .I depth -can make the encoder extremely slow with carefully crafted files. +can make the encoder extremely slow with some files. Avoid setting the .I depth -over 1000 unless you are prepared to interrupt the compression in case it -is taking too long. +over 1000 unless you are prepared to interrupt +the compression in case it is taking far too long. .RE -.IP +.IP "" When decoding raw streams .RB ( \-\-format=raw ), -LZMA2 needs only the value of -.BR dict . +LZMA2 needs only the dictionary +.IR size . LZMA1 needs also -.BR lc , -.BR lp , +.IR lc , +.IR lp , and -.BR pb. +.IR pb . .TP \fB\-\-x86\fR[\fB=\fIoptions\fR] +.PD 0 .TP \fB\-\-powerpc\fR[\fB=\fIoptions\fR] .TP @@ -994,28 +1311,72 @@ and \fB\-\-armthumb\fR[\fB=\fIoptions\fR] .TP \fB\-\-sparc\fR[\fB=\fIoptions\fR] -Add a branch/call/jump (BCJ) filter to the filter chain. These filters -can be used only as non-last filter in the filter chain. -.IP -A BCJ filter converts relative addresses in the machine code to their -absolute counterparts. This doesn't change the size of the data, but -it increases redundancy, which allows e.g. LZMA2 to get better -compression ratio. -.IP -The BCJ filters are always reversible, so using a BCJ filter for wrong -type of data doesn't cause any data loss. However, applying a BCJ filter -for wrong type of data is a bad idea, because it tends to make the -compression ratio worse. -.IP +.PD +Add a branch/call/jump (BCJ) filter to the filter chain. +These filters can be used only as a non-last filter +in the filter chain. 
+.IP "" +A BCJ filter converts relative addresses in +the machine code to their absolute counterparts. +This doesn't change the size of the data, +but it increases redundancy, +which can help LZMA2 to produce 0\-15\ % smaller +.B .xz +file. +The BCJ filters are always reversible, +so using a BCJ filter for wrong type of data +doesn't cause any data loss, although it may make +the compression ratio slightly worse. +.IP "" +It is fine to apply a BCJ filter on a whole executable; +there's no need to apply it only on the executable section. +Applying a BCJ filter on an archive that contains both executable +and non-executable files may or may not give good results, +so it generally isn't good to blindly apply a BCJ filter when +compressing binary packages for distribution. +.IP "" +These BCJ filters are very fast and +use insignificant amount of memory. +If a BCJ filter improves compression ratio of a file, +it can improve decompression speed at the same time. +This is because, on the same hardware, +the decompression speed of LZMA2 is roughly +a fixed number of bytes of compressed data per second. +.IP "" +These BCJ filters have known problems related to +the compression ratio: +.RS +.IP \(bu 3 +Some types of files containing executable code +(e.g. object files, static libraries, and Linux kernel modules) +have the addresses in the instructions filled with filler values. +These BCJ filters will still do the address conversion, +which will make the compression worse with these files. +.IP \(bu 3 +Applying a BCJ filter on an archive containing multiple similar +executables can make the compression ratio worse than not using +a BCJ filter. +This is because the BCJ filter doesn't detect the boundaries +of the executable files, and doesn't reset +the address conversion counter for each executable. +.RE +.IP "" +Both of the above problems will be fixed +in the future in a new filter. 
+The old BCJ filters will still be useful in embedded systems, +because the decoder of the new filter will be bigger +and use more memory. +.IP "" Different instruction sets have have different alignment: .RS .RS +.PP .TS tab(;); l n l l n l. Filter;Alignment;Notes -x86;1;32-bit and 64-bit x86 +x86;1;32-bit or 64-bit x86 PowerPC;4;Big endian only ARM;4;Little endian only ARM-Thumb;2;Little endian only @@ -1024,15 +1385,18 @@ SPARC;4;Big or little endian .TE .RE .RE -.IP -Since the BCJ-filtered data is usually compressed with LZMA2, the compression -ratio may be improved slightly if the LZMA2 options are set to match the -alignment of the selected BCJ filter. For example, with the IA-64 filter, -it's good to set +.IP "" +Since the BCJ-filtered data is usually compressed with LZMA2, +the compression ratio may be improved slightly if +the LZMA2 options are set to match the +alignment of the selected BCJ filter. +For example, with the IA-64 filter, it's good to set .B pb=4 -with LZMA2 (2^4=16). The x86 filter is an exception; it's usually good to -stick to LZMA2's default four-byte alignment when compressing x86 executables. -.IP +with LZMA2 (2^4=16). +The x86 filter is an exception; +it's usually good to stick to LZMA2's default +four-byte alignment when compressing x86 executables. +.IP "" All BCJ filters support the same .IR options : .RS @@ -1040,37 +1404,32 @@ All BCJ filters support the same .BI start= offset Specify the start .I offset -that is used when converting between relative and absolute addresses. +that is used when converting between relative +and absolute addresses. The .I offset -must be a multiple of the alignment of the filter (see the table above). -The default is zero. In practice, the default is good; specifying -a custom +must be a multiple of the alignment of the filter +(see the table above). +The default is zero. +In practice, the default is good; specifying a custom .I offset is almost never useful. 
-.IP -Specifying a non-zero start -.I offset -is probably useful only if the executable has multiple sections, and there -are many cross-section jumps or calls. Applying a BCJ filter separately for -each section with proper start offset and then compressing the result as -a single chunk may give some improvement in compression ratio compared -to applying the BCJ filter with the default -.I offset -for the whole executable. .RE .TP \fB\-\-delta\fR[\fB=\fIoptions\fR] -Add Delta filter to the filter chain. The Delta filter -can be used only as non-last filter in the filter chain. -.IP -Currently only simple byte-wise delta calculation is supported. It can -be useful when compressing e.g. uncompressed bitmap images or uncompressed -PCM audio. However, special purpose algorithms may give significantly better -results than Delta + LZMA2. This is true especially with audio, which -compresses faster and better e.g. with +Add Delta filter to the filter chain. +The Delta filter can be used only as non-last filter +in the filter chain. +.IP "" +Currently only simple byte-wise delta calculation is supported. +It can be useful when compressing e.g. uncompressed bitmap images +or uncompressed PCM audio. +However, special purpose algorithms may give significantly better +results than Delta + LZMA2. +This is true especially with audio, +which compresses faster and better e.g. with .BR flac (1). -.IP +.IP "" Supported .IR options : .RS @@ -1078,89 +1437,103 @@ Supported .BI dist= distance Specify the .I distance -of the delta calculation as bytes. +of the delta calculation in bytes. .I distance -must be 1\-256. The default is 1. -.IP +must be 1\-256. +The default is 1. +.IP "" For example, with .B dist=2 and eight-byte input A1 B1 A2 B3 A3 B5 A4 B7, the output will be A1 B1 01 02 01 02 01 02. .RE +. .SS "Other options" .TP .BR \-q ", " \-\-quiet -Suppress warnings and notices. Specify this twice to suppress errors too. -This option has no effect on the exit status. 
That is, even if a warning -was suppressed, the exit status to indicate a warning is still used. +Suppress warnings and notices. +Specify this twice to suppress errors too. +This option has no effect on the exit status. +That is, even if a warning was suppressed, +the exit status to indicate a warning is still used. .TP .BR \-v ", " \-\-verbose -Be verbose. If standard error is connected to a terminal, +Be verbose. +If standard error is connected to a terminal, .B xz will display a progress indicator. Specifying .B \-\-verbose -twice will give even more verbose output (useful mostly for debugging). -.IP +twice will give even more verbose output. +.IP "" The progress indicator shows the following information: .RS .IP \(bu 3 -Completion percentage is shown if the size of the input file is known. +Completion percentage is shown +if the size of the input file is known. That is, percentage cannot be shown in pipes. .IP \(bu 3 -Amount of compressed data produced (compressing) or consumed (decompressing). +Amount of compressed data produced (compressing) +or consumed (decompressing). .IP \(bu 3 -Amount of uncompressed data consumed (compressing) or produced -(decompressing). +Amount of uncompressed data consumed (compressing) +or produced (decompressing). .IP \(bu 3 -Compression ratio, which is calculated by dividing the amount of -compressed data processed so far by the amount of uncompressed data -processed so far. +Compression ratio, which is calculated by dividing +the amount of compressed data processed so far by +the amount of uncompressed data processed so far. .IP \(bu 3 -Compression or decompression speed. This is measured as the amount of -uncompressed data consumed (compression) or produced (decompression) -per second. It is shown after a few seconds have passed since +Compression or decompression speed. +This is measured as the amount of uncompressed data consumed +(compression) or produced (decompression) per second. 
+It is shown after a few seconds have passed since .B xz started processing the file. .IP \(bu 3 Elapsed time in the format M:SS or H:MM:SS. .IP \(bu 3 -Estimated remaining time is shown only when the size of the input file is +Estimated remaining time is shown +only when the size of the input file is known and a couple of seconds have already passed since .B xz -started processing the file. The time is shown in a less precise format which +started processing the file. +The time is shown in a less precise format which never has any colons, e.g. 2 min 30 s. .RE -.IP +.IP "" When standard error is not a terminal, .B \-\-verbose will make .B xz -print the filename, compressed size, uncompressed size, compression ratio, -and possibly also the speed and elapsed time on a single line to standard -error after compressing or decompressing the file. The speed and elapsed -time are included only when the operation took at least a few seconds. -If the operation didn't finish, for example due to user interruption, also -the completion percentage is printed if the size of the input file is known. +print the filename, compressed size, uncompressed size, +compression ratio, and possibly also the speed and elapsed time +on a single line to standard error after compressing or +decompressing the file. +The speed and elapsed time are included only when +the operation took at least a few seconds. +If the operation didn't finish, e.g. due to user interruption, +also the completion percentage is printed +if the size of the input file is known. .TP .BR \-Q ", " \-\-no\-warn -Don't set the exit status to -.B 2 -even if a condition worth a warning was detected. This option doesn't affect -the verbosity level, thus both +Don't set the exit status to 2 +even if a condition worth a warning was detected. +This option doesn't affect the verbosity level, thus both .B \-\-quiet and .B \-\-no\-warn -have to be used to not display warnings and to not alter the exit status. 
+have to be used to not display warnings and +to not alter the exit status. .TP .B \-\-robot -Print messages in a machine-parsable format. This is intended to ease -writing frontends that want to use +Print messages in a machine-parsable format. +This is intended to ease writing frontends that want to use .B xz -instead of liblzma, which may be the case with various scripts. The output -with this option enabled is meant to be stable across +instead of liblzma, which may be the case with various scripts. +The output with this option enabled is meant to be stable across .B xz -releases. See the section +releases. +See the section .B "ROBOT MODE" for details. .TP @@ -1182,24 +1555,29 @@ and exit successfully .BR \-V ", " \-\-version Display the version number of .B xz -and liblzma in human readable format. To get machine-parsable output, specify +and liblzma in human readable format. +To get machine-parsable output, specify .B \-\-robot before .BR \-\-version . -.SH ROBOT MODE +. +.SH "ROBOT MODE" The robot mode is activated with the .B \-\-robot -option. It makes the output of +option. +It makes the output of .B xz -easier to parse by other programs. Currently +easier to parse by other programs. +Currently .B \-\-robot is supported only together with .BR \-\-version , .BR \-\-info\-memory , and .BR \-\-list . -It will be supported for normal compression and decompression in the future. -.PP +It will be supported for normal compression and +decompression in the future. +. .SS Version .B "xz \-\-robot \-\-version" will print the version number of @@ -1214,24 +1592,19 @@ and liblzma in the following format: Major version. .TP .I YYY -Minor version. Even numbers are stable. +Minor version. +Even numbers are stable. Odd numbers are alpha or beta versions. .TP .I ZZZ -Patch level for stable releases or just a counter for development releases. +Patch level for stable releases or +just a counter for development releases. .TP .I S Stability. 
-.B 0 -is alpha, -.B 1 -is beta, and -.B 2 -is stable. +0 is alpha, 1 is beta, and 2 is stable. .I S -should be always -.B 2 -when +should be always 2 when .I YYY is even. .PP @@ -1245,45 +1618,48 @@ Examples: 4.999.9beta is and 5.0.0 is .BR 50000002 . -.SS Memory limit information +. +.SS "Memory limit information" .B "xz \-\-robot \-\-info\-memory" prints a single line with three tab-separated columns: -.RS .IP 1. 4 -Total amount of physical memory (RAM) as bytes +Total amount of physical memory (RAM) in bytes .IP 2. 4 -Memory usage limit for compression as bytes. +Memory usage limit for compression in bytes. A special value of zero indicates the default setting, which for single-threaded mode is the same as no limit. .IP 3. 4 -Memory usage limit for decompression as bytes. +Memory usage limit for decompression in bytes. A special value of zero indicates the default setting, which for single-threaded mode is the same as no limit. -.RE .PP In the future, the output of .B "xz \-\-robot \-\-info\-memory" may have more columns, but never more than a single line. -.SS List mode +. +.SS "List mode" .B "xz \-\-robot \-\-list" -uses tab-separated output. The first column of every line has a string +uses tab-separated output. +The first column of every line has a string that indicates the type of the information found on that line: .TP .B name -This is always the first line when starting to list a file. The second -column on the line is the filename. +This is always the first line when starting to list a file. +The second column on the line is the filename. .TP .B file This line contains overall information about the .B .xz -file. This line is always printed after the +file. +This line is always printed after the .B name line. .TP .B stream This line type is used only when .B \-\-verbose -was specified. There are as many +was specified. +There are as many .B stream lines as there are streams in the .B .xz @@ -1292,11 +1668,13 @@ file. 
.B block This line type is used only when .B \-\-verbose -was specified. There are as many +was specified. +There are as many .B block lines as there are blocks in the .B .xz -file. The +file. +The .B block lines are shown after all the .B stream @@ -1305,9 +1683,11 @@ lines; different line types are not interleaved. .B summary This line type is used only when .B \-\-verbose -was specified twice. This line is printed after all +was specified twice. +This line is printed after all .B block -lines. Like the +lines. +Like the .B file line, the .B summary @@ -1316,12 +1696,13 @@ line contains overall information about the file. .TP .B totals -This line is always the very last line of the list output. It shows -the total counts and sizes. +This line is always the very last line of the list output. +It shows the total counts and sizes. .PP The columns of the .B file lines: +.PD 0 .RS .IP 2. 4 Number of streams in the file @@ -1338,8 +1719,8 @@ If ratio is over 9.999, three dashes .RB ( \-\-\- ) are displayed instead of the ratio. .IP 7. 4 -Comma-separated list of integrity check names. The following strings are -used for the known check types: +Comma-separated list of integrity check names. +The following strings are used for the known check types: .BR None , .BR CRC32 , .BR CRC64 , @@ -1353,10 +1734,12 @@ is the Check ID as a decimal number (one or two digits). .IP 8. 4 Total size of stream padding in the file .RE +.PD .PP The columns of the .B stream lines: +.PD 0 .RS .IP 2. 4 Stream number (the first stream is 1) @@ -1377,15 +1760,18 @@ Name of the integrity check .IP 10. 4 Size of stream padding .RE +.PD .PP The columns of the .B block lines: +.PD 0 .RS .IP 2. 4 Number of the stream containing this block .IP 3. 4 -Block number relative to the beginning of the stream (the first block is 1) +Block number relative to the beginning of the stream +(the first block is 1) .IP 4. 4 Block number relative to the beginning of the file .IP 5. 
4 @@ -1401,14 +1787,18 @@ Compression ratio .IP 10. 4 Name of the integrity check .RE +.PD .PP If .B \-\-verbose was specified twice, additional columns are included on the .B block -lines. These are not displayed with a single +lines. +These are not displayed with a single .BR \-\-verbose , -because getting this information requires many seeks and can thus be slow: +because getting this information requires many seeks +and can thus be slow: +.PD 0 .RS .IP 11. 4 Value of the integrity check in hexadecimal @@ -1422,26 +1812,30 @@ indicates that compressed size is present, and indicates that uncompressed size is present. If the flag is not set, a dash .RB ( \- ) -is shown instead to keep the string length fixed. New flags may be added -to the end of the string in the future. +is shown instead to keep the string length fixed. +New flags may be added to the end of the string in the future. .IP 14. 4 Size of the actual compressed data in the block (this excludes the block header, block padding, and check fields) .IP 15. 4 -Amount of memory (as bytes) required to decompress this block with this +Amount of memory (in bytes) required to decompress +this block with this .B xz version .IP 16. 4 -Filter chain. Note that most of the options used at compression time cannot -be known, because only the options that are needed for decompression are -stored in the +Filter chain. +Note that most of the options used at compression time +cannot be known, because only the options +that are needed for decompression are stored in the .B .xz headers. .RE +.PD .PP The columns of the .B totals line: +.PD 0 .RS .IP 2. 4 Number of streams @@ -1454,14 +1848,17 @@ Uncompressed size .IP 6. 4 Average compression ratio .IP 7. 4 -Comma-separated list of integrity check names that were present in the files +Comma-separated list of integrity check names +that were present in the files .IP 8. 4 Stream padding size .IP 9. 4 -Number of files. 
This is here to keep the order of the earlier columns -the same as on +Number of files. +This is here to +keep the order of the earlier columns the same as on .B file lines. +.PD .RE .PP If @@ -1469,10 +1866,11 @@ If was specified twice, additional columns are included on the .B totals line: +.PD 0 .RS .IP 10. 4 -Maximum amount of memory (as bytes) required to decompress the files -with this +Maximum amount of memory (in bytes) required to decompress +the files with this .B xz version .IP 11. 4 @@ -1482,9 +1880,12 @@ or indicating if all block headers have both compressed size and uncompressed size stored in them .RE +.PD .PP -Future versions may add new line types and new columns can be added to -the existing line types, but the existing columns won't be changed. +Future versions may add new line types and +new columns can be added to the existing line types, +but the existing columns won't be changed. +. .SH "EXIT STATUS" .TP .B 0 @@ -1494,19 +1895,23 @@ All is good. An error occurred. .TP .B 2 -Something worth a warning occurred, but no actual errors occurred. +Something worth a warning occurred, +but no actual errors occurred. .PP -Notices (not warnings or errors) printed on standard error don't affect -the exit status. +Notices (not warnings or errors) printed on standard error +don't affect the exit status. +. .SH ENVIRONMENT .B xz -parses space-separated lists of options from the environment variables +parses space-separated lists of options +from the environment variables .B XZ_DEFAULTS and .BR XZ_OPT , -in this order, before parsing the options from the command line. Note that -only options are parsed from the environment variables; all non-options -are silently ignored. Parsing is done with +in this order, before parsing the options from the command line. +Note that only options are parsed from the environment variables; +all non-options are silently ignored. +Parsing is done with .BR getopt_long (3) which is used also for the command line arguments. 
.TP @@ -1514,7 +1919,8 @@ which is used also for the command line arguments. User-specific or system-wide default options. Typically this is set in a shell initialization script to enable .BR xz 's -memory usage limiter by default. Excluding shell initialization scripts +memory usage limiter by default. +Excluding shell initialization scripts and similar special cases, scripts must never set or unset .BR XZ_DEFAULTS . .TP @@ -1523,15 +1929,22 @@ This is for passing options to .B xz when it is not possible to set the options directly on the .B xz -command line. This is the case e.g. when +command line. +This is the case e.g. when .B xz is run by a script or tool, e.g. GNU .BR tar (1): .RS -.IP -\fBXZ_OPT=\-2v tar caf foo.tar.xz foo +.RS +.PP +.nf +.ft CW +XZ_OPT=\-2v tar caf foo.tar.xz foo +.ft R +.fi +.RE .RE -.IP +.IP "" Scripts may use .B XZ_OPT e.g. to set script-specific default compression options. @@ -1541,10 +1954,17 @@ if that is reasonable, e.g. in .BR sh (1) scripts one may use something like this: .RS -.IP -\fBXZ_OPT=${XZ_OPT\-"\-7e"}; export XZ_OPT +.RS +.PP +.nf +.ft CW +XZ_OPT=${XZ_OPT\-"\-7e"} +export XZ_OPT +.ft R +.fi .RE -.IP +.RE +. .SH "LZMA UTILS COMPATIBILITY" The command line syntax of .B xz @@ -1553,26 +1973,32 @@ is practically a superset of .BR unlzma , and .BR lzcat -as found from LZMA Utils 4.32.x. In most cases, it is possible to replace -LZMA Utils with XZ Utils without breaking existing scripts. There are some -incompatibilities though, which may sometimes cause problems. +as found from LZMA Utils 4.32.x. +In most cases, it is possible to replace +LZMA Utils with XZ Utils without breaking existing scripts. +There are some incompatibilities though, +which may sometimes cause problems. +. .SS "Compression preset levels" The numbering of the compression level presets is not identical in .B xz and LZMA Utils. -The most important difference is how dictionary sizes are mapped to different -presets. 
Dictionary size is roughly equal to the decompressor memory usage. +The most important difference is how dictionary sizes +are mapped to different presets. +Dictionary size is roughly equal to the decompressor memory usage. .RS +.PP .TS tab(;); c c c c n n. Level;xz;LZMA Utils -\-1;64 KiB;64 KiB -\-2;512 KiB;1 MiB -\-3;1 MiB;512 KiB -\-4;2 MiB;1 MiB -\-5;4 MiB;2 MiB +\-0;256 KiB;N/A +\-1;1 MiB;64 KiB +\-2;2 MiB;1 MiB +\-3;4 MiB;512 KiB +\-4;4 MiB;1 MiB +\-5;8 MiB;2 MiB \-6;8 MiB;4 MiB \-7;16 MiB;8 MiB \-8;32 MiB;16 MiB @@ -1580,20 +2006,24 @@ Level;xz;LZMA Utils .TE .RE .PP -The dictionary size differences affect the compressor memory usage too, -but there are some other differences between LZMA Utils and XZ Utils, which +The dictionary size differences affect +the compressor memory usage too, +but there are some other differences between +LZMA Utils and XZ Utils, which make the difference even bigger: .RS +.PP .TS tab(;); c c c c n n. Level;xz;LZMA Utils 4.32.x -\-1;2 MiB;2 MiB -\-2;5 MiB;12 MiB -\-3;13 MiB;12 MiB -\-4;25 MiB;16 MiB -\-5;48 MiB;26 MiB +\-0;3 MiB;N/A +\-1;9 MiB;2 MiB +\-2;17 MiB;12 MiB +\-3;32 MiB;12 MiB +\-4;48 MiB;16 MiB +\-5;94 MiB;26 MiB \-6;94 MiB;45 MiB \-7;186 MiB;83 MiB \-8;370 MiB;159 MiB @@ -1605,15 +2035,18 @@ The default preset level in LZMA Utils is .B \-7 while in XZ Utils it is .BR \-6 , -so both use 8 MiB dictionary by default. +so both use an 8 MiB dictionary by default. +. .SS "Streamed vs. non-streamed .lzma files" -Uncompressed size of the file can be stored in the +The uncompressed size of the file can be stored in the .B .lzma -header. LZMA Utils does that when compressing regular files. -The alternative is to mark that uncompressed size is unknown and -use end of payload marker to indicate where the decompressor should stop. -LZMA Utils uses this method when uncompressed size isn't known, which is -the case for example in pipes. +header. +LZMA Utils does that when compressing regular files. 
+The alternative is to mark that uncompressed size is unknown +and use end of payload marker to indicate +where the decompressor should stop. +LZMA Utils uses this method when uncompressed size isn't known, +which is the case for example in pipes. .PP .B xz supports decompressing @@ -1622,16 +2055,20 @@ files with or without end of payload marker, but all .B .lzma files created by .B xz -will use end of payload marker and have uncompressed size marked as unknown -in the +will use end of payload marker and have uncompressed size +marked as unknown in the .B .lzma -header. This may be a problem in some (uncommon) situations. For example, a +header. +This may be a problem in some uncommon situations. +For example, a .B .lzma -decompressor in an embedded device might work only with files that have known -uncompressed size. If you hit this problem, you need to use LZMA Utils or -LZMA SDK to create +decompressor in an embedded device might work +only with files that have known uncompressed size. +If you hit this problem, you need to use LZMA Utils +or LZMA SDK to create .B .lzma files with known uncompressed size. +. .SS "Unsupported .lzma files" The .B .lzma @@ -1639,7 +2076,8 @@ format allows .I lc values up to 8, and .I lp -values up to 4. LZMA Utils can decompress files with any +values up to 4. +LZMA Utils can decompress files with any .I lc and .IR lp , @@ -1655,24 +2093,25 @@ is possible with .B xz and with LZMA SDK. .PP -The implementation of the LZMA1 filter in liblzma requires -that the sum of +The implementation of the LZMA1 filter in liblzma +requires that the sum of .I lc and .I lp -must not exceed 4. Thus, +must not exceed 4. +Thus, .B .lzma -files which exceed this limitation, cannot be decompressed with +files that exceed this limitation cannot be decompressed with .BR xz .
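The lc + lp limit described above can be checked before a file is handed to xz. The following is only a sketch under stated assumptions: it relies on the .lzma header storing the LZMA properties in its first byte as (pb * 5 + lp) * 9 + lc, and the function name lzma_props is made up for this illustration and is not part of XZ Utils.

```shell
# Hypothetical helper, not part of XZ Utils: decode lc, lp and pb from
# the first byte of a .lzma file and report whether lc + lp <= 4.
lzma_props() {
    # od prints the first byte as an unsigned decimal number;
    # tr strips the padding spaces around it.
    props=$(od -An -tu1 -N1 < "$1" | tr -d ' ')
    [ -n "$props" ] || { echo "cannot read $1" >&2; return 1; }

    # Undo the packing props = (pb * 5 + lp) * 9 + lc.
    lc=$((props % 9))
    lp=$((props / 9 % 5))
    pb=$((props / 45))
    printf 'lc=%d lp=%d pb=%d\n' "$lc" "$lp" "$pb"

    # The exit status tells whether the lc + lp <= 4 limit is met.
    [ $((lc + lp)) -le 4 ]
}
```

A possible use is `lzma_props foo.lzma || echo "lc + lp is too big for xz"`, keeping in mind that the function also returns non-zero when the file cannot be read. A file written with the usual defaults (lc=3, lp=0, pb=2) starts with the byte 0x5D.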
.PP LZMA Utils creates only .B .lzma -files which have dictionary size of +files which have a dictionary size of .RI "2^" n -(a power of 2), but accepts files with any dictionary size. +(a power of 2) but accepts files with any dictionary size. liblzma accepts only .B .lzma -files which have dictionary size of +files which have a dictionary size of .RI "2^" n or .RI "2^" n " + 2^(" n "\-1)." @@ -1680,13 +2119,18 @@ This is to decrease false positives when detecting .B .lzma files. .PP -These limitations shouldn't be a problem in practice, since practically all +These limitations shouldn't be a problem in practice, +since practically all .B .lzma files have been compressed with settings that liblzma will accept. +. .SS "Trailing garbage" -When decompressing, LZMA Utils silently ignore everything after the first +When decompressing, +LZMA Utils silently ignore everything after the first .B .lzma -stream. In most situations, this is a bug. This also means that LZMA Utils +stream. +In most situations, this is a bug. +This also means that LZMA Utils don't support decompressing concatenated .B .lzma files. @@ -1695,34 +2139,46 @@ If there is data left after the first .B .lzma stream, .B xz -considers the file to be corrupt. This may break obscure scripts which have +considers the file to be corrupt. +This may break obscure scripts which have assumed that trailing garbage is ignored. +. .SH NOTES -.SS Compressed output may vary -The exact compressed output produced from the same uncompressed input file -may vary between XZ Utils versions even if compression options are identical. -This is because the encoder can be improved (faster or better compression) -without affecting the file format. The output can vary even between different -builds of the same XZ Utils version, if different build options are used. +. 
+.SS "Compressed output may vary" +The exact compressed output produced from +the same uncompressed input file +may vary between XZ Utils versions even if +compression options are identical. +This is because the encoder can be improved +(faster or better compression) +without affecting the file format. +The output can vary even between different +builds of the same XZ Utils version, +if different build options are used. .PP The above means that implementing .B \-\-rsyncable to create rsyncable .B .xz -files is not going to happen without freezing a part of the encoder +files is not going to happen without +freezing a part of the encoder implementation, which can then be used with .BR \-\-rsyncable . -.SS Embedded .xz decompressors +. +.SS "Embedded .xz decompressors" Embedded .B .xz -decompressor implementations like XZ Embedded don't necessarily support files -created with +decompressor implementations like XZ Embedded don't necessarily +support files created with integrity .I check types other than .B none and .BR crc32 . -Since the default is \fB\-\-check=\fIcrc64\fR, you must use +Since the default is +.BR \-\-check=crc64 , +you must use .B \-\-check=none or .B \-\-check=crc32 @@ -1732,41 +2188,111 @@ Outside embedded systems, all .B .xz format decompressors support all the .I check -types, or at least are able to decompress the file without verifying the +types, or at least are able to decompress +the file without verifying the integrity check if the particular .I check is not supported. .PP -XZ Embedded supports BCJ filters, but only with the default start offset. +XZ Embedded supports BCJ filters, +but only with the default start offset. +. .SH EXAMPLES +. 
.SS Basics +Compress the file +.I foo +into +.I foo.xz +using the default compression level +.RB ( \-6 ), +and remove +.I foo +if compression is successful: +.RS +.PP +.nf +.ft CW +xz foo +.ft R +.fi +.RE +.PP +Decompress +.I bar.xz +into +.I bar +and don't remove +.I bar.xz +even if decompression is successful: +.RS +.PP +.nf +.ft CW +xz \-dk bar.xz +.ft R +.fi +.RE +.PP +Create +.I baz.tar.xz +with the preset +.B \-4e +.RB ( "\-4 \-\-extreme" ), +which is slower than e.g. the default +.BR \-6 , +but needs less memory for compression and decompression (48\ MiB +and 5\ MiB, respectively): +.RS +.PP +.nf +.ft CW +tar cf \- baz | xz \-4e > baz.tar.xz +.ft R +.fi +.RE +.PP A mix of compressed and uncompressed files can be decompressed to standard output with a single command: -.IP -.B "xz \-dcf a.txt b.txt.xz c.txt d.txt.xz > abcd.txt" -.SS Parallel compression of many files +.RS +.PP +.nf +.ft CW +xz \-dcf a.txt b.txt.xz c.txt d.txt.lzma > abcd.txt +.ft R +.fi +.RE +. +.SS "Parallel compression of many files" On GNU and *BSD, .BR find (1) and .BR xargs (1) can be used to parallelize compression of many files: +.RS .PP -.IP -.B "find . \-type f \e! \-name '*.xz' \-print0 |" -.B "xargs \-0r \-P4 \-n16 xz \-T1" +.nf +.ft CW +find . \-type f \e! \-name '*.xz' \-print0 \e + | xargs \-0r \-P4 \-n16 xz \-T1 +.ft R +.fi +.RE .PP The .B \-P -option sets the number of parallel +option to +.BR xargs (1) +sets the number of parallel .B xz -processes. The best value for the +processes. +The best value for the .B \-n option depends on how many files there are to be compressed. 
-If there are only a couple of files, the value should probably be -.BR 1 ; +If there are only a couple of files, +the value should probably be 1; with tens of thousands of files, -.B 100 -or even more may be appropriate to reduce the number of +100 or even more may be appropriate to reduce the number of .B xz processes that .BR xargs (1) @@ -1779,15 +2305,257 @@ for is there to force it to single-threaded mode, because .BR xargs (1) is used to control the amount of parallelization. -.SS Robot mode examples -Calculating how many bytes have been saved in total after compressing -multiple files: -.IP -.B "xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}'" +. +.SS "Robot mode" +Calculate how many bytes have been saved in total +after compressing multiple files: +.RS +.PP +.nf +.ft CW +xz \-\-robot \-\-list *.xz | awk '/^totals/{print $5\-$4}' +.ft R +.fi +.RE +.PP +A script may want to know that it is using a new enough +.BR xz . +The following +.BR sh (1) +script checks that the version number of the +.B xz +tool is at least 5.0.0. +This method is compatible with old beta versions, +which didn't support the +.B \-\-robot +option: +.RS +.PP +.nf +.ft CW +if ! eval "$(xz \-\-robot \-\-version 2> /dev/null)" || + [ "$XZ_VERSION" \-lt 50000002 ]; then + echo "Your xz is too old." +fi +unset XZ_VERSION LIBLZMA_VERSION +.ft R +.fi +.RE +.PP +Set a memory usage limit for decompression using +.BR XZ_OPT , +but if a limit has already been set, don't increase it: +.RS +.PP +.nf +.ft CW +NEWLIM=$((123 << 20)) # 123 MiB +OLDLIM=$(xz \-\-robot \-\-info\-memory | cut \-f3) +if [ $OLDLIM \-eq 0 \-o $OLDLIM \-gt $NEWLIM ]; then + XZ_OPT="$XZ_OPT \-\-memlimit\-decompress=$NEWLIM" + export XZ_OPT +fi +.ft R +.fi +.RE +. +.SS "Custom compressor filter chains" +The simplest use for custom filter chains is +customizing an LZMA2 preset. +This can be useful, +because the presets cover only a subset of the +potentially useful combinations of compression settings.
+.PP +The CompCPU columns of the tables +from the descriptions of the options +.BR "\-0" " ... " "\-9" +and +.B \-\-extreme +are useful when customizing LZMA2 presets. +Here are the relevant parts collected from those two tables: +.RS +.PP +.TS +tab(;); +c c +n n. +Preset;CompCPU +\-0;0 +\-1;1 +\-2;2 +\-3;3 +\-4;4 +\-5;5 +\-6;6 +\-5e;7 +\-6e;8 +.TE +.RE +.PP +If you know that a file requires +a somewhat big dictionary (e.g. 32 MiB) to compress well, +but you want to compress it quicker than +.B "xz \-8" +would do, a preset with a low CompCPU value (e.g. 1) +can be modified to use a bigger dictionary: +.RS +.PP +.nf +.ft CW +xz \-\-lzma2=preset=1,dict=32MiB foo.tar +.ft R +.fi +.RE +.PP +With certain files, the above command may be faster than +.B "xz \-6" +while compressing significantly better. +However, it must be emphasized that only some files benefit from +a big dictionary while keeping the CompCPU value low. +The most obvious situation, +where a big dictionary can help a lot, +is an archive containing very similar files +of at least a few megabytes each. +The dictionary size has to be significantly bigger +than any individual file to allow LZMA2 to take +full advantage of the similarities between consecutive files. +.PP +If very high compressor and decompressor memory usage is fine, +and the file being compressed is +at least several hundred megabytes, it may be useful +to use an even bigger dictionary than the 64 MiB that +.B "xz \-9" +would use: +.RS +.PP +.nf +.ft CW +xz \-vv \-\-lzma2=dict=192MiB big_foo.tar +.ft R +.fi +.RE +.PP +Using +.B \-vv +.RB ( "\-\-verbose \-\-verbose" ) +like in the above example can be useful +to see the memory requirements +of the compressor and decompressor. +Remember that using a dictionary bigger than +the size of the uncompressed file is a waste of memory, +so the above command isn't useful for small files. +.PP +Sometimes the compression time doesn't matter, +but the decompressor memory usage has to be kept low +e.g.
to make it possible to decompress the file on +an embedded system. +The following command uses +.B \-6e +.RB ( "\-6 \-\-extreme" ) +as a base and sets the dictionary to only 64\ KiB. +The resulting file can be decompressed with XZ Embedded +(that's why there is +.BR \-\-check=crc32 ) +using about 100\ KiB of memory. +.RS +.PP +.nf +.ft CW +xz \-\-check=crc32 \-\-lzma2=preset=6e,dict=64KiB foo +.ft R +.fi +.RE +.PP +If you want to squeeze out as many bytes as possible, +adjusting the number of literal context bits +.RI ( lc ) +and the number of position bits +.RI ( pb ) +can sometimes help. +Adjusting the number of literal position bits +.RI ( lp ) +might help too, but usually +.I lc +and +.I pb +are more important. +E.g. a source code archive contains mostly US-ASCII text, +so something like the following might give +a slightly (like 0.1\ %) smaller file than +.B "xz \-6e" +(try also without +.BR lc=4 ): +.RS +.PP +.nf +.ft CW +xz \-\-lzma2=preset=6e,pb=0,lc=4 source_code.tar +.ft R +.fi +.RE +.PP +Using another filter together with LZMA2 can improve +compression with certain file types. +E.g. to compress an x86-32 or x86-64 shared library +using the x86 BCJ filter: +.RS +.PP +.nf +.ft CW +xz \-\-x86 \-\-lzma2 libfoo.so +.ft R +.fi +.RE +.PP +Note that the order of the filter options is significant. +If +.B \-\-x86 +is specified after +.BR \-\-lzma2 , +.B xz +will give an error, +because there cannot be any filter after LZMA2, +and also because the x86 BCJ filter cannot be used +as the last filter in the chain. +.PP +The Delta filter together with LZMA2 +can give good results with bitmap images. +It should usually beat PNG, +which has a few more advanced filters than simple +delta but uses Deflate for the actual compression. +.PP +The image has to be saved in uncompressed format, +e.g. as uncompressed TIFF. +The distance parameter of the Delta filter is set +to match the number of bytes per pixel in the image. +E.g.
a 24-bit RGB bitmap needs +.BR dist=3 , +and it is also good to pass +.B pb=0 +to LZMA2 to accommodate the three-byte alignment: +.RS +.PP +.nf +.ft CW +xz \-\-delta=dist=3 \-\-lzma2=pb=0 foo.tiff +.ft R +.fi +.RE +.PP +If multiple images have been put into a single archive (e.g.\& +.BR .tar ), +the Delta filter will work on that too as long as all images +have the same number of bytes per pixel. +. .SH "SEE ALSO" .BR xzdec (1), +.BR xzdiff (1), +.BR xzgrep (1), +.BR xzless (1), +.BR xzmore (1), .BR gzip (1), -.BR bzip2 (1) +.BR bzip2 (1), +.BR 7z (1) .PP XZ Utils: .br diff --git a/src/xzdec/xzdec.1 b/src/xzdec/xzdec.1 index ed14a03..7cc9be5 100644 --- a/src/xzdec/xzdec.1 +++ b/src/xzdec/xzdec.1 @@ -4,7 +4,7 @@ .\" This file has been put into the public domain. .\" You can do whatever you want with this file. .\" -.TH XZDEC 1 "2010-08-07" "Tukaani" "XZ Utils" +.TH XZDEC 1 "2010-09-27" "Tukaani" "XZ Utils" .SH NAME xzdec, lzmadec \- Small .xz and .lzma decompressors .SH SYNOPSIS @@ -25,7 +25,8 @@ files. .B xzdec is intended to work as a drop-in replacement for .BR xz (1) -in the most common situations where a script has been written to use +in the most common situations where a script +has been written to use .B "xz \-\-decompress \-\-stdout" (and possibly a few other commonly used options) to decompress .B .xz @@ -43,7 +44,8 @@ files. .PP To reduce the size of the executable, .B xzdec -doesn't support multithreading or localization, and doesn't read options from +doesn't support multithreading or localization, +and doesn't read options from .B XZ_DEFAULTS and .B XZ_OPT @@ -90,8 +92,7 @@ Ignored for .BR xz (1) compatibility. .B xzdec -never uses the exit status -.BR "2" . +never uses the exit status 2. .TP .BR \-h ", " \-\-help Display a help message and exit successfully. @@ -111,18 +112,32 @@ An error occurred.
.B xzdec doesn't have any warning messages like .BR xz (1) -has, thus the exit status -.B 2 -is not used by +has, thus the exit status 2 is not used by .BR xzdec . .SH NOTES +Use +.BR xz (1) +instead of +.B xzdec +or +.B lzmadec +for normal everyday use. +.B xzdec +and +.B lzmadec +are meant only for situations where it is important to have +a smaller decompressor than the full-featured +.BR xz (1). +.PP .B xzdec and .B lzmadec -are not really that small. The size can be reduced further by dropping -features from liblzma at compile time, but that shouldn't usually be done -for executables distributed in typical non-embedded operating system -distributions. If you need a truly small +are not really that small. +The size can be reduced further by dropping +features from liblzma at compile time, +but that shouldn't usually be done for executables distributed +in typical non-embedded operating system distributions. +If you need a truly small .B .xz decompressor, consider using XZ Embedded. .SH "SEE ALSO" -- 2.39.2
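The XYYYZZZS version number that the ROBOT MODE section of xz.1 describes (X major, YYY minor, ZZZ patch level, S stability) can be unpacked with plain shell arithmetic. This is a minimal sketch rather than anything shipped with XZ Utils: the function name xz_version_decode and its output layout are assumptions made for this illustration.

```shell
# Hypothetical helper: split an XYYYZZZS version number, as stored in
# XZ_VERSION by "xz --robot --version", into its components.
xz_version_decode() {
    v=$1
    major=$((v / 10000000))        # X
    minor=$((v / 10000 % 1000))    # YYY
    patch=$((v / 10 % 1000))       # ZZZ
    # S: 0 is alpha, 1 is beta, and 2 is stable.
    case $((v % 10)) in
        0) stability=alpha ;;
        1) stability=beta ;;
        2) stability=stable ;;
        *) stability=unknown ;;
    esac
    printf '%d.%d.%d %s\n' "$major" "$minor" "$patch" "$stability"
}
```

With an xz that supports the option, something like `eval "$(xz --robot --version)" && xz_version_decode "$XZ_VERSION"` would print the decoded version. For the value 50000002 used in the version check example in xz.1, the function prints `5.0.0 stable`.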