Opening zipfile of unsupported compression-type silently returns empty filestream, instead of throwing exception

I seem to be banging my head against a newbie error, and I am not a newbie. I have a 1.2 GB known-good zipfile ‘train.zip’ containing a 3.5 GB file ‘train.csv’. I can open the zipfile and the file inside it without any exception (no LargeZipFile), but the resulting filestream appears to be empty. (UNIX ‘unzip -c …’ confirms the archive is good.) The file objects returned by Python’s ZipFile.open() do not support seek() or tell(), so I can’t check it that way.

The Python distribution is 2.7.3, EPD-free 7.3-1 (32-bit), which should be OK for large zips. The OS is Mac OS X 10.6.6.

Answer

The cause is the combination of:

  • this file’s compression type is type 9: Deflate64/Enhanced Deflate (PKWare’s proprietary format, as opposed to the more common type 8: Deflated)
  • and a zipfile bug: it did not throw an exception for unsupported compression types. It used to just silently return a bad file object (the type codes are listed under Section 4.4.5, “compression method”, of the ZIP format specification). Aargh. How bogus. UPDATE: I filed bug 14313 and it was fixed back in 2012, so zipfile now raises NotImplementedError when the compression type is unknown.
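To confirm which compression type a given archive uses, the central directory can be read even when the member itself cannot be decompressed. The following is a minimal sketch for Python 3, not part of the original answer; the function names `compression_methods` and `deflate64_members` and the constant `DEFLATE64` are made up for illustration, since zipfile defines no constant for type 9.

```python
import zipfile

# Numeric method ID for Deflate64 from the ZIP format specification.
# This constant is our own; zipfile only defines names such as
# ZIP_STORED (0) and ZIP_DEFLATED (8).
DEFLATE64 = 9


def compression_methods(zip_path):
    """Map each member name to its numeric compression method."""
    with zipfile.ZipFile(zip_path) as zf:
        return {info.filename: info.compress_type for info in zf.infolist()}


def deflate64_members(zip_path):
    """List the members stored with method 9 (Deflate64)."""
    methods = compression_methods(zip_path)
    return [name for name, method in methods.items() if method == DEFLATE64]
```

Calling `deflate64_members("train.zip")` on the archive from the question would be expected to list ‘train.csv’, assuming the diagnosis above is right.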

A command-line workaround is to unzip, then rezip, to get a plain type 8: Deflated archive.
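The rezip half of that workaround can also be done from Python. This is a rough sketch for Python 3 which assumes the file has already been extracted with an external tool such as `unzip`, because zipfile itself cannot read type 9; the function name `rezip_deflated` and both paths are hypothetical.

```python
import os
import zipfile


def rezip_deflated(extracted_path, new_zip_path):
    """Write an already-extracted file into a new archive as type 8 (Deflated)."""
    # allowZip64=True permits members and archives larger than 4 GiB.
    with zipfile.ZipFile(new_zip_path, "w",
                         compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        # Store only the base name so the archive holds no directory prefix.
        zf.write(extracted_path, arcname=os.path.basename(extracted_path))
```

For example, `rezip_deflated("train.csv", "train_deflated.zip")` would produce an archive that `ZipFile.open()` can read.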

zipfile will throw an exception in 2.7 and 3.2+. I guess zipfile will never be able to actually handle type 9, for legal reasons. The Python docs make no mention whatsoever that zipfile cannot handle other compression types :(
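On a fixed Python, the NotImplementedError can be caught and turned into a message that names the offending compression type. A small sketch for Python 3, under the assumption that the caller passes an already-open `ZipFile`; the helper name `open_member` and the choice of `RuntimeError` are illustrative only.

```python
import zipfile


def open_member(zf, name):
    """Open one member, reporting its method ID if zipfile cannot decompress it."""
    info = zf.getinfo(name)
    try:
        return zf.open(info)
    except NotImplementedError as exc:
        # Raised by fixed versions of zipfile for unsupported methods
        # such as type 9 (Deflate64).
        raise RuntimeError(
            "%s uses compression method %d, which zipfile cannot decompress; "
            "extract it with an external tool and rezip it as Deflated"
            % (name, info.compress_type)
        ) from exc
```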

User contributions licensed under: CC BY-SA