
Why does recompressing a file using gzip produce a different output?

I need to decompress, edit, and then recompress a Minecraft .dat file. However, after recompression the file changes significantly (without any editing on my side), which makes it unreadable for the game.

Here’s the snippet of code I use to decompress the file:

import gzip
import shutil
with gzip.open('file_1.dat', 'rb') as f_in:
    with open('file_1.txt', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

and here’s the code I use to compress the file:

with open('file_1.txt', 'rb') as f_in2:
    with gzip.open('file_1_recmp.dat', 'wb') as f_out2:
        shutil.copyfileobj(f_in2, f_out2)

Here are the before and after files.

So, what am I doing wrong?


Answer

You can never expect, or rely on, decompressing and recompressing to reproduce the original compressed file. Different compression code, different versions of the same code, or different compression settings can all give a different result. The only guarantee a lossless compressor offers runs in the opposite direction: if you compress and then decompress, you get back exactly what you started with.
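As a rough illustration of that point, the sketch below compresses the same bytes at two compression levels and compares the results. The payload is made up for the example, and whether the two streams actually differ depends on the Python and zlib versions in use; only the round trip is guaranteed.

```python
import gzip

# Made-up payload, used only for this illustration
data = b"example payload " * 1000

# Same input, two different compression levels; mtime=0 fixes the header timestamp
fast = gzip.compress(data, compresslevel=1, mtime=0)
best = gzip.compress(data, compresslevel=9, mtime=0)

# The compressed bytes are not guaranteed to match...
streams_differ = fast != best
# ...but decompressing either one must give back the original data
round_trip_ok = gzip.decompress(fast) == data and gzip.decompress(best) == data

print("compressed streams differ:", streams_differ)
print("round trip preserved data:", round_trip_ok)
```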

In your case, the question is what is making it “unreadable for the game”.

Update given binary files:

The before and after gzip files are both valid, and have the same uncompressed data.

The main difference between the two is that the header of the after file contains a file name and some other information that the before file's header does not. My first theory would be that the decompressor in the game is not compliant with standard gzip headers, and is doing something simplistic and wrong, like just skipping the first ten bytes and expecting what follows to be the deflate-compressed data.
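One way to check this kind of header difference yourself is to read the fixed ten-byte gzip header and look at the flag bits. The following is a minimal sketch based on the gzip format in RFC 1952; the function name describe_gzip_header is hypothetical, and it only inspects the fixed part of the header, not the optional fields that follow it.

```python
import struct

def describe_gzip_header(path):
    """Read the fixed 10-byte gzip header (RFC 1952) and report key fields."""
    with open(path, 'rb') as f:
        header = f.read(10)
    if len(header) < 10:
        raise ValueError('file too short to be gzip')
    # magic (2 bytes), method, flags, mtime (4 bytes, little-endian), xfl, os
    magic, method, flags, mtime, xfl, os_code = struct.unpack('<2sBBIBB', header)
    if magic != b'\x1f\x8b':
        raise ValueError('not a gzip file')
    return {
        'mtime': mtime,                     # 0 means "no timestamp"
        'has_extra': bool(flags & 0x04),    # FEXTRA: extra field follows
        'has_name': bool(flags & 0x08),     # FNAME: original file name follows
        'has_comment': bool(flags & 0x10),  # FCOMMENT: comment follows
    }
```

Calling it on the before and after files and comparing the two dictionaries should show whether a file name or timestamp was added during recompression.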

You can use gzip.GzipFile instead of gzip.open to control the contents of the header, leaving the file name blank. You can also set the modification time to zero, as it is in the original header. Simple example:

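import gzip

# An empty file name and mtime=0 keep the gzip header free of
# the optional name field and of a timestamp.
with open('out.gz', 'wb') as f:
    with gzip.GzipFile(filename='', mode='wb', compresslevel=9, fileobj=f, mtime=0) as gz:
        gz.write(b'this is a test')

(The bytes literal b'this is a test' is required on Python 3; on Python 2 a plain 'this is a test' string also works.)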
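Applied to the files from the question, the same idea might look like the sketch below. The helper name recompress_minimal_header is hypothetical, compression level 9 is simply carried over from the example above, and whether the game then accepts the result is untested.

```python
import gzip
import shutil

def recompress_minimal_header(src_path, dst_path):
    """Gzip src_path into dst_path with no file name and a zero timestamp in the header."""
    with open(src_path, 'rb') as f_in, open(dst_path, 'wb') as raw_out:
        # filename='' suppresses the FNAME field; mtime=0 writes a zero timestamp
        with gzip.GzipFile(filename='', mode='wb', compresslevel=9,
                           fileobj=raw_out, mtime=0) as gz_out:
            shutil.copyfileobj(f_in, gz_out)
```

For the question's files this would be called as recompress_minimal_header('file_1.txt', 'file_1_recmp.dat').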

User contributions licensed under: CC BY-SA