I am using the following shell script to loop over 90 zip files and unarchive them on a Linux box hosted with Hostinger (shared web hosting):
#!/bin/bash
SOURCE_DIR="<path_to_archives>"
cd "${SOURCE_DIR}" || exit 1
for f in *.zip
do
    # unzip -oqq "$f" -d "${f%.zip}" &
    python3 scripts/extract_archives.py "${f}" &
done
wait
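(For what it's worth, forking all 90 jobs at once can overload a shared host. A minimal sketch of the same parallel-unzip idea with bounded concurrency, using only the Python standard library; the path and the worker count of 4 are assumptions, not values from my setup:)

# Sketch: extract all zips in parallel, but cap concurrency instead of forking all 90 at once.
import concurrent.futures
import pathlib
import shutil

SOURCE_DIR = pathlib.Path("<path_to_archives>")  # placeholder, as in the shell script

def extract_one(zip_path):
    # Extract the archive into a sibling directory named after it (foo.zip -> foo/).
    dest = zip_path.with_suffix("")
    shutil.unpack_archive(str(zip_path), str(dest))
    return zip_path.name

if __name__ == "__main__":
    zips = sorted(SOURCE_DIR.glob("*.zip"))
    # max_workers bounds how many extractions run at the same time.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        for name in pool.map(extract_one, zips):
            print("Extracted:", name)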
The Python script called by the shell script above is below:
import shutil
import sys

source_path = "<path to source dir>"

def extract_files(in_file):
    # Unpack the archive into a directory named after the part of the filename before the first '.'
    shutil.unpack_archive(source_path + in_file, source_path + in_file.split('.')[0])
    print('Extracted : ', in_file)

extract_files(sys.argv[1].strip())
Irrespective of whether I use the built-in unzip command or Python, it takes about 2.5 hours to unzip all the files. Unarchiving all the zip files produces 90 folders with about 170,000 files overall. I would have thought anywhere between 15 and 20 minutes would be a reasonable timeframe.
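(As a rough sanity check, and this arithmetic is mine: 2.5 hours is about 9,000 seconds for 170,000 files, roughly 19 files/s, while 20 minutes would require about 140 files/s. A quick way to see what one archive costs in isolation is to time a single extraction; a sketch, where the sample archive path is a hypothetical placeholder:)

# Sketch: time one extraction to estimate per-archive throughput.
import shutil
import time

archive = "<path_to_archives>/sample.zip"  # hypothetical sample archive
start = time.perf_counter()
shutil.unpack_archive(archive, "/tmp/unzip_test")
elapsed = time.perf_counter() - start
print(f"Extracted one archive in {elapsed:.1f}s")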
I've tried a few different variations. I tried tarring the folders instead of zipping them, thinking that untarring might be faster than unzipping. I've also used the tar command on the source server to transfer the files over SSH and untar them on the fly, something like this:
time tar zcf - . | ssh -p <port> user@host "tar xzf - -C <dest dir>"
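(If it helps to experiment, the receiving end of that pipe can also be done in Python with the standard tarfile module; a minimal sketch, keeping the question's destination placeholder. The script name untar_stream.py is hypothetical; it would replace the remote "tar xzf - -C <dest dir>" in the pipeline above.)

# Sketch: read a gzipped tar stream from stdin and extract it,
# mirroring `tar xzf - -C <dest dir>` on the receiving host.
import sys
import tarfile

with tarfile.open(fileobj=sys.stdin.buffer, mode="r|gz") as tf:
    tf.extractall("<dest dir>")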
Nothing is helping. I am open to using another programming language such as Perl or Go if that would speed things up.
Can someone please help me solve this performance problem?
Answer
Thank you, everyone, for your answers. As you indicated, this was down to throttling on the servers in the shared hosting environment.
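(For anyone hitting the same wall: a crude way to confirm I/O throttling is to measure raw sequential write throughput on the host and compare it with an unthrottled machine. A sketch; the 256 MiB size and the file path are arbitrary choices:)

# Sketch: rough disk write-throughput probe. Writes 256 MiB and reports MiB/s.
import os
import time

path = "throughput_test.bin"   # temporary file in the current directory
chunk = b"\0" * (1 << 20)      # 1 MiB of zeros
start = time.perf_counter()
with open(path, "wb") as f:
    for _ in range(256):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())       # make sure the data actually hits the disk
elapsed = time.perf_counter() - start
os.remove(path)
print(f"{256 / elapsed:.1f} MiB/s sequential write")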