
How to access multiple CSV files that share the same name from multiple folders from a zip file

I have a zip file (stored locally) with multiple folders in it. Each folder contains a few CSV files, and I only need one particular CSV from each folder. The CSVs I want all share the same name, but I cannot figure out how to pick that file out of each folder and then concatenate the results into a single pandas DataFrame.

I have tried the code below (initially just trying to read all of the CSVs):

path = r"C:\Users\...\Downloads\folder.zip"
all_files = glob.glob(os.path.join(path , "/*.csv"))

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

But I get: ValueError: No objects to concatenate. The CSVs are definitely present and not empty.

I am currently trying to do this in a SageMaker notebook, and I am not sure whether that is also causing problems. Any help would be great.


Answer

After some digging and advice from Umar.H and mad, I figured out a solution to my original question and to the code example I was originally working with.

My original code could not read from the zip file directly, because glob only matches paths on the filesystem and does not look inside archives. So I unzipped the file and tried it on a regular folder. The list of DataFrames, li, was still coming back empty until I changed the pattern in all_files from "/*file.csv" to "*/*file.csv", which matches the file inside each subfolder.
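
As a minimal sketch of that pattern change on an already-unzipped folder, the helper below builds the "*/*file.csv" pattern with os.path.join. The function name find_in_subfolders and the default suffix are illustrative assumptions, not part of the original code.

```python
import glob
import os


def find_in_subfolders(root, suffix="file.csv"):
    """Return paths exactly one folder below root whose name ends with suffix.

    Hypothetical helper for illustration; root is an extracted folder,
    not a .zip file.
    """
    # The first "*" matches each subfolder name; "*" + suffix matches the
    # target file inside that subfolder.
    pattern = os.path.join(root, "*", "*" + suffix)
    return sorted(glob.glob(pattern))
```

Each returned path can then be passed to pd.read_csv as in the original loop. If the folders are nested more than one level deep, this pattern will not reach them.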

To solve my main issue, which was accessing all the required CSVs without unzipping the file, I managed to get the following to work:

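import zipfile

import pandas as pd

PATH = "C:/Users/.../Downloads/folder.zip"

li = []
with zipfile.ZipFile(PATH, "r") as f:
    # namelist() returns every member path in the archive, including subfolders
    for name in f.namelist():
        if name.endswith("file.csv"):
            # open the member as a file-like object without extracting it to disk
            with f.open(name) as data:
                df = pd.read_csv(data, header=None, low_memory=False)
            li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)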
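
One caveat with endswith("file.csv") is that it would also match a member such as "profile.csv". A possible variation, sketched here under assumptions, compares the exact base name instead and records which folder each row came from. The function name read_named_csvs, the target default, and the source_folder column are all hypothetical names introduced for illustration.

```python
import posixpath
import zipfile

import pandas as pd


def read_named_csvs(zip_path, target="file.csv", **read_csv_kwargs):
    """Concatenate every member of the zip whose base name equals target.

    Hypothetical helper: extra keyword arguments are forwarded to
    pd.read_csv (for example header=None).
    """
    frames = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Member names inside a zip always use "/" as the separator,
            # so posixpath is used regardless of the operating system.
            if posixpath.basename(name) != target:
                continue
            with archive.open(name) as handle:
                df = pd.read_csv(handle, **read_csv_kwargs)
            # Keep track of which folder inside the archive the rows came from.
            df["source_folder"] = posixpath.dirname(name)
            frames.append(df)
    if not frames:
        # Fail with a clearer message than pandas' "No objects to concatenate".
        raise FileNotFoundError(f"no member named {target!r} in the archive")
    return pd.concat(frames, ignore_index=True)
```

Calling read_named_csvs(PATH, header=None, low_memory=False) would read the same way as the loop above, with the extra column added.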

Hope this can be helpful for anyone else with large zip files.

User contributions licensed under: CC BY-SA