I have a zip file (stored locally) with multiple folders in it. Each folder contains a few CSV files, and I need to access only one particular CSV from each folder. The CSVs I want all share the same name, but I cannot figure out how to read that particular file from each folder and then concatenate them into a pandas DataFrame.
I have tried the following (initially trying to read all the CSVs):
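```python
import glob
import os

import pandas as pd

path = r"C:\Users\...\Downloads\folder.zip"
# Collect every CSV that matches the pattern under `path`
all_files = glob.glob(os.path.join(path, "/*.csv"))

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
```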
But I get: ValueError: No objects to concatenate. The CSVs are definitely present and not empty.
I am doing this in a SageMaker notebook and am not sure whether that is also causing problems. Any help would be great.
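Two things combine to produce that error, and the short sketch below reproduces both (the paths are illustrative; the C:\Users\me folder is hypothetical). A second argument to os.path.join that starts with a slash is treated as a root, so the directory before it is dropped. Separately, glob only walks real directories and treats a .zip archive as an ordinary file, so nothing inside it can match. With no matches, li stays empty and pd.concat raises exactly this ValueError.

```python
import glob
import ntpath
import os
import posixpath
import tempfile
import zipfile

import pandas as pd

# 1. A component starting with "/" is treated as a root, so join() drops the
#    directory before it (on Windows, everything except the drive letter).
print(posixpath.join("/home/user/Downloads/folder.zip", "/*.csv"))  # /*.csv
print(ntpath.join(r"C:\Users\me\Downloads\folder.zip", "/*.csv"))   # C:/*.csv

# 2. Even with a sensible pattern, glob cannot look inside a zip archive:
#    the .zip is a single file, not a directory.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "folder.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("sub1/file.csv", "a,b\n1,2\n")
    matches = glob.glob(os.path.join(zip_path, "*", "*.csv"))
print(matches)  # []

# 3. No matches means an empty list of DataFrames, which concat rejects.
try:
    pd.concat([], axis=0, ignore_index=True)
except ValueError as exc:
    print(exc)  # No objects to concatenate
```

SageMaker itself is not needed to explain the error; the same behaviour occurs in any Python environment.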
Answer
After some digging and advice from Umar.H and mad, I figured out a solution to my original question and to the code example I was originally working with.
My original code could not read the zip file directly, so I unzipped the archive and ran it on a regular folder instead. The list of DataFrames, li, came back empty because of the glob pattern: changing "/*file.csv" to "*/*file.csv" in all_files fixed it. Without the leading slash, os.path.join keeps the folder path, and the "*/" part matches the one level of subfolders that holds each file.csv.
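If unzipping first is acceptable, the folder-based fix can be wrapped in a small helper. This is a sketch, not the answerer's exact code: read_matching_csvs is a hypothetical name, and "file.csv" stands in for whatever the real file is called. Note that "*/*file.csv" also matches names such as "myfile.csv"; a pattern of "*/file.csv" restricts it to the exact name.

```python
import glob
import os

import pandas as pd


def read_matching_csvs(root, pattern="*/*file.csv"):
    """Read every CSV under `root` that matches `pattern` and stack them.

    "*/" matches exactly one level of subfolders, so this picks up
    root/<folder>/...file.csv but not files at the top level or deeper down.
    """
    # Sort so the row order does not depend on filesystem listing order
    paths = sorted(glob.glob(os.path.join(root, pattern)))
    if not paths:
        # Fail with a clearer message than "No objects to concatenate"
        raise FileNotFoundError(f"no files match {pattern!r} under {root!r}")
    frames = [pd.read_csv(p, index_col=None) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)
```

Passing the extracted folder (not the .zip) as root, e.g. read_matching_csvs(r"C:\Users\...\Downloads\folder"), returns one DataFrame built from the matching file in each subfolder.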
To solve my main problem, which was reading all the required CSVs without unzipping the archive, I got the following to work:
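```python
import zipfile

import pandas as pd

PATH = "C:/Users/.../Downloads/folder.zip"

li = []
with zipfile.ZipFile(PATH, "r") as f:
    # namelist() returns every member path inside the archive
    for name in f.namelist():
        if name.endswith("file.csv"):
            # Read the member straight from the archive; nothing is extracted
            with f.open(name) as data:
                df = pd.read_csv(data, header=None, low_memory=False)
            li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
```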
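A slightly more defensive variant of the in-archive approach could look like the sketch below. It is an illustration rather than the answer's method: the function name read_csv_from_each_folder and the extra source_folder column are hypothetical additions. It compares the file name exactly instead of using endswith (which would also accept, say, "otherfile.csv"), skips directory entries, and passes any read_csv options such as header=None or low_memory=False through unchanged.

```python
import posixpath
import zipfile

import pandas as pd


def read_csv_from_each_folder(zip_path, target_name="file.csv", **read_csv_kwargs):
    """Concatenate every `target_name` found inside `zip_path`, without extracting."""
    frames = []
    with zipfile.ZipFile(zip_path, "r") as zf:
        for info in zf.infolist():
            # Zip member names use "/" as the separator, hence posixpath
            if info.is_dir() or posixpath.basename(info.filename) != target_name:
                continue
            with zf.open(info) as fh:
                df = pd.read_csv(fh, **read_csv_kwargs)
            # Hypothetical extra column recording which folder each row came from
            df["source_folder"] = posixpath.dirname(info.filename)
            frames.append(df)
    if not frames:
        raise FileNotFoundError(f"{target_name!r} not found in {zip_path!r}")
    return pd.concat(frames, axis=0, ignore_index=True)
```

For the layout in the question, read_csv_from_each_folder(PATH, "file.csv", header=None, low_memory=False) should give the same rows as frame, plus the folder column.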
Hope this can be helpful for anyone else with large zip files.