
What’s the easiest way to retrieve FTP files based on a list of filenames (in multiple FTP directories) – Python

In FTP, the structure looks like this:

main_folder / year / month / day / multiple csv files

For example:

main_folder / 2020 / 02 / 03 / '2020-02-03_01.csv', '2020-02-03_02.csv', '2020-02-03_03.csv', .....

main_folder / 2020 / 03 / 03 / '2020-03-03_01.csv', '2020-03-03_02.csv', '2020-03-03_03.csv', .....
main_folder / 2021 / 01 / 01 / '2021-01-01_01.csv', '2021-01-01_02.csv', '2021-01-01_03.csv', .....

So each year has 12 folders (one per month), each month contains one folder per day, and each day has multiple CSV files (each filename consists of the date followed by _xx.csv).

I have a list of filenames that I want to download, for example:

example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv', '2021-08-10_12.csv',
                '2021-08-10_03.csv']

My current code extracts the year, month, and day from the filename and then constructs the corresponding FTP directory. For example, for the file '2021-08-09_01.csv' it looks at all the files under main_folder/2021/08/09. But if I use the complete path to tell FTP to look only at the specific file, it gives me the error ftplib.error_perm: 550 No such directory.

This is the code:

file_dir = "main_folder/2021/08/09/2021-08-09_01.csv"

ftp_conn = open_ftp_connection(ftp_host, ftp_username, ftp_password, file_dir)
ftp = ftplib.FTP_TLS(host)
ftp.login(username, password)
ftp.cwd(file_dir)

I’m a bit confused here. How can I tell FTP to look for those files in the corresponding directories and read their data? (The end goal is to publish them to an S3 bucket.)


Answer

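The 550 error occurs because ftp.cwd() expects a directory, and you passed it the full path of a file. Change into the day's directory first, then retrieve the file by name. This is how I would do it: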

import ftplib, os

example_list = ['2021-08-09_01.csv', '2021-08-09_02.csv', '2021-08-10_12.csv', '2021-08-10_03.csv']

FTP_IP = "1.2.3.4"
FTP_LOGIN = "username"
FTP_PASSWD = "password"
CURRENT_DIR = os.getcwd()
MAIN_DIR = "/main_folder"

with ftplib.FTP(FTP_IP, FTP_LOGIN, FTP_PASSWD) as ftp:
    for entry in example_list:
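        # '2021-08-09_01.csv' -> year '2021', month '08', day '09'
        year, month, day = entry.split("_")[0].split("-")
        # cwd() accepts a directory only; an absolute path works from any current directory
        ftp.cwd(MAIN_DIR + "/" + year + "/" + month + "/" + day)
        with open(os.path.join(CURRENT_DIR, entry), 'wb') as f:
            # retrbinary() expects the full FTP command, not just the filename
            ftp.retrbinary("RETR " + entry, f.write)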

Each file is downloaded to the directory from which you run the Python script, under the same filename it has on the server.
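
Since the end goal in the question is to publish the data to S3 and not to keep local copies, here is a minimal sketch of a variant that reads each file into memory. The helper names remote_dir, group_by_dir, and fetch_to_memory are hypothetical (they are not part of ftplib), and the sketch assumes the /main_folder/year/month/day layout described in the question. It also groups the filenames by directory so that each directory is entered only once.

```python
import io
from collections import defaultdict

MAIN_DIR = "/main_folder"  # assumed absolute root, taken from the question's layout


def remote_dir(filename, main_dir=MAIN_DIR):
    """Map '2021-08-09_01.csv' to '/main_folder/2021/08/09' (hypothetical helper)."""
    date_part = filename.split("_")[0]  # e.g. '2021-08-09'
    year, month, day = date_part.split("-")
    return f"{main_dir}/{year}/{month}/{day}"


def group_by_dir(filenames):
    """Group filenames by their remote directory, keeping first-seen order."""
    groups = defaultdict(list)
    for name in filenames:
        groups[remote_dir(name)].append(name)
    return dict(groups)


def fetch_to_memory(ftp, filenames):
    """Return {filename: bytes}.

    `ftp` is assumed to be an already logged-in ftplib.FTP or ftplib.FTP_TLS object.
    """
    contents = {}
    for directory, names in group_by_dir(filenames).items():
        ftp.cwd(directory)  # cwd() takes a directory, never a file path
        for name in names:
            buffer = io.BytesIO()
            # retrbinary() needs the full "RETR <name>" command
            ftp.retrbinary("RETR " + name, buffer.write)
            contents[name] = buffer.getvalue()
    return contents
```

With a real connection this would be called as fetch_to_memory(ftp, example_list) inside the with block. The returned bytes could then be handed to whatever S3 client you use (for example as the Body argument of boto3's put_object); that upload step is not shown here.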

User contributions licensed under: CC BY-SA