How do I decompress a MSZIP block?

Question

I have a CAB file from which I want to extract a file on Linux. Since there aren't any native CAB extractors on Linux, I figured I'd try my hand at writing one. While I've seen the MSZIP documentation[0] as well as [1] and [2], I'm having difficulty decompressing it, even given that each

Accepted Answer

No, the “basic concept” you thought you got from RFC 1951 is completely wrong. First off, the bits in each byte are read from least significant to most significant. Second, you do not reverse the bytes in the stream. The first eight bits are read from the first byte, the next eight bits from the second byte, and so on. (Lengths in stored blocks are stored little-endian, but that is neither reversed nor not reversed. It is simply how a 16-bit length is serialized in the byte stream.)

Once you read the bits correctly, the HLIT, HDIST, and HCLEN values are stored exactly as stated in the RFC. The first five bits after the three-bit block header (BFINAL and BTYPE) are HLIT, the number of literal/length codes minus 257. So you take the value of the five bits, which gives a number in 0..31, and add 257 to that. That gives a number in the range 257..288. The allowed range is actually 257..286, as noted on that same line of the RFC, so the last two possible values of the five bits, 30 and 31, should not appear in a valid deflate stream.

RFC 1951 is not confusing at all. It is a clear and complete description of the format. However, you need to have sufficient background in compression, in particular Huffman codes, to understand it. The RFC was not intended to be a textbook on compression, nor a textbook on how integers are coded in bits.

It is clear it would take you some time to figure this all out. Fortunately, you do not need to write your own inflator. You can use zlib instead. Read the documentation in zlib.h for all of the inflate functions.

In CAB files, MSZIP CFDATA blocks use the history from previous CFDATA blocks until a folder boundary is reached. Even though each block is a properly terminated deflate stream, the next block can refer to uncompressed data from the previous block.
To process CFDATA blocks after the first, you will need to use the inflateResetKeep() function of zlib to restart the inflate process while retaining the dictionary from the previous inflate operation.

For reference, here is a decoding of the initial bytes of the deflate stream you provided, using infgen:

    ! infgen 2.5 output
    !
    last            ! 1
    dynamic         ! 10
    count 286 30 16 ! 1100 11101 11101
    code 16 4       ! 100
    code 17 7       ! 111
    code 0 4        ! 100 000
    code 8 3        ! 011
    code 7 4        ! 100
    code 9 3        ! 011
    code 6 4        ! 100
    code 10 3       ! 011
    code 5 4        ! 100
    code 11 3       ! 011
    code 4 5        ! 101
    code 12 3       ! 011
    code 3 7        ! 111
    code 2 6        ! 110 000
    lens 6          ! 1100
    lens 8          ! 000
    lens 8          ! 000
    lens 9          ! 001
    repeat 6        ! 11 1110
    lens 9          ! 001
    lens 8          ! 000
    lens 10         ! 010
    lens 9          ! 001
    lens 9          ! 001
    lens 9          ! 001
    lens 8          ! 000
    repeat 6        ! 11 1110
    repeat 4        ! 01 1110
    lens 9          ! 001
    lens 9          ! 001
    lens 10         ! 010
    lens 10         ! 010
    lens 10         ! 010
    lens 8          ! 000
    lens 9          ! 001
    repeat 3        ! 00 1110
    lens 10         ! 010
    lens 9          ! 001
    lens 10         ! 010
    lens 10         ! 010
    lens 9          ! 001
    lens 10         ! 010
    lens 9          ! 001
    lens 10         ! 010
    lens 9          ! 001
    lens 8          ! 000
    lens 9          ! 001
    lens 8          ! 000
    lens 8          ! 000
    lens 7          ! 1101
    lens 8          ! 000
    lens 8          ! 000
    lens 9          ! 001
    lens 8          ! 000
    lens 10         ! 010
    lens 7          ! 1101
    lens 9          ! 001
    lens 8          ! 000
    lens 12         ! 100
    lens 9          ! 001
    lens 10         ! 010
    lens 10         ! 010
    lens 10         ! 010
    lens 8          ! 000
    lens 7          ! 1101
    repeat 4        ! 01 1110
    lens 8          ! 000
    lens 8          ! 000
    lens 8          ! 000
    lens 7          ! 1101
    lens 9          ! 001
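To make the bit order from RFC 1951 concrete, here is a small C sketch (not zlib's code) that reads bits least significant first, byte by byte, and decodes the start of a dynamic block header. BitReader, br_init, br_bits, read_le16, BlockHeader and read_block_header are hypothetical names made up for this illustration. Fed the bytes 0xED 0x9D 0x01, which are reconstructed from the infgen lines "last", "dynamic" and "count 286 30 16" with every bit after HCLEN set to zero (an assumption, not bytes taken from the question's file), it should report HLIT 286, HDIST 30 and HCLEN 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader for a deflate stream (RFC 1951): bytes are taken in
   order, and within each byte the least significant bit comes first. */
typedef struct {
    const uint8_t *data;   /* compressed bytes */
    size_t len;            /* number of bytes in data */
    size_t pos;            /* index of the next byte to load */
    uint32_t bitbuf;       /* pending bits; bit 0 is the next bit in the stream */
    int bitcnt;            /* number of valid bits in bitbuf */
} BitReader;

static void br_init(BitReader *br, const uint8_t *data, size_t len) {
    br->data = data;
    br->len = len;
    br->pos = 0;
    br->bitbuf = 0;
    br->bitcnt = 0;
}

/* Return the next n bits (0 <= n <= 24) as an integer whose bit 0 is the
   first bit read, or -1 if the input runs out. */
static int32_t br_bits(BitReader *br, int n) {
    while (br->bitcnt < n) {
        if (br->pos == br->len)
            return -1;
        /* A newly loaded byte goes above the bits already pending. */
        br->bitbuf |= (uint32_t)br->data[br->pos++] << br->bitcnt;
        br->bitcnt += 8;
    }
    int32_t val = (int32_t)(br->bitbuf & ((1u << n) - 1));
    br->bitbuf >>= n;
    br->bitcnt -= n;
    return val;
}

/* A 16-bit length such as LEN or NLEN in a stored block: low byte first. */
static unsigned read_le16(const uint8_t *p) {
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

typedef struct {
    int last;    /* BFINAL */
    int type;    /* BTYPE: 0 stored, 1 fixed, 2 dynamic */
    int hlit;    /* number of literal/length codes */
    int hdist;   /* number of distance codes */
    int hclen;   /* number of code length codes */
} BlockHeader;

/* Read BFINAL and BTYPE and, for a dynamic block, HLIT, HDIST and HCLEN.
   Returns 0 on success, -1 on truncated or invalid input. */
static int read_block_header(BitReader *br, BlockHeader *h) {
    int32_t last = br_bits(br, 1);
    int32_t type = br_bits(br, 2);
    if (last < 0 || type < 0 || type == 3)     /* BTYPE 3 is reserved */
        return -1;
    h->last = (int)last;
    h->type = (int)type;
    h->hlit = h->hdist = h->hclen = 0;
    if (type != 2)
        return 0;                              /* the counts exist only in dynamic blocks */
    int32_t hlit = br_bits(br, 5);
    int32_t hdist = br_bits(br, 5);
    int32_t hclen = br_bits(br, 4);
    if (hlit < 0 || hdist < 0 || hclen < 0)
        return -1;
    h->hlit = (int)hlit + 257;    /* 0..31 + 257 = 257..288 */
    h->hdist = (int)hdist + 1;    /* 1..32 */
    h->hclen = (int)hclen + 4;    /* 4..19 */
    if (h->hlit > 286)            /* raw values 30 and 31 are not valid */
        return -1;
    return 0;
}
```

For a stored block (BTYPE 0) you would first discard the rest of the current byte. Because this reader never holds more than seven unread bits between calls, that amounts to clearing bitbuf and bitcnt, after which LEN and NLEN can be read with read_le16(data + pos), with NLEN required to be the ones' complement of LEN.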

Answer