unzip operation taking several hours

I am using the following shell script to loop over 90 zip files and unarchive them on a Linux box hosted with Hostinger (shared web hosting):

#!/bin/bash

SOURCE_DIR="<path_to_archives>"

# Quote the path and bail out if the directory is missing,
# so the loop never runs in the wrong place.
cd "${SOURCE_DIR}" || exit 1

# Launch one extraction per archive in the background, then wait for all.
for f in *.zip
do
#   unzip -oqq "$f" -d "${f%.zip}" &
    python3 scripts/extract_archives.py "${f}" &
done
wait
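
Backgrounding all 90 extractions at once may itself overwhelm whatever I/O the shared host allows. One variant I could try is to cap the number of concurrent jobs (a sketch, not tested; the worker count of 4 and the use of GNU xargs are assumptions):

# Run at most 4 unzip jobs at a time instead of all 90 at once.
# -P 4 requires GNU xargs; tune the number to the host's limits.
printf '%s\0' *.zip | xargs -0 -P 4 -I {} sh -c 'unzip -oqq "$1" -d "${1%.zip}"' _ {}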

The Python script called by the shell script above is this:

import os
import shutil
import sys

source_path = "<path to source dir>"

def extract_files(in_file):
    # os.path.splitext strips only the final extension, so archive names
    # containing extra dots (e.g. "backup.2021.zip") are handled correctly,
    # unlike in_file.split('.')[0].
    dest_dir = os.path.join(source_path, os.path.splitext(in_file)[0])
    shutil.unpack_archive(os.path.join(source_path, in_file), dest_dir)
    print('Extracted :', in_file)


extract_files(sys.argv[1].strip())

Whether I use the built-in unzip command or the Python script, it takes about 2.5 hours to unzip all the files. Unarchiving all the zip files produces 90 folders with about 170,000 files overall. I would have thought anywhere between 15 and 20 minutes was a reasonable timeframe.
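
To narrow down where the time goes, I can time a single extraction and extrapolate: if one archive alone is slow, the bottleneck is raw I/O rather than the looping strategy (sample.zip below is just a placeholder for any one of the archives):

# Time one archive in isolation; 90x this should roughly match the total.
time unzip -oqq sample.zip -d sample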

I have tried a few variations. I tarred the folders instead of zipping them, thinking that un-tarring might be faster than unzipping. I have also used the tar command on the source server to transfer the files over ssh and untar them on the fly, something like this:

time tar zcf - . | ssh -p <port> user@host "tar xzf - -C <dest dir>"
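
If the gzip step is what costs CPU on either end, an uncompressed variant of the same pipeline may help (a sketch under that assumption, reusing the same placeholders):

# Stream the tar uncompressed and let ssh carry the raw bytes.
time tar cf - . | ssh -p <port> user@host "tar xf - -C <dest dir>"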

Nothing is helping. I am open to using another programming language, such as Perl or Go, if that would speed things up.

Can someone please help me solve this performance problem?


Answer

Thank you, everyone, for your answers. As you indicated, this was due to throttling on the servers in the shared hosting environment.
