I have a root folder
/home/project/data
With multiple folders in it with and ultimate paths to csv files:
/home/project/data/2020-12-05/John_Smith/data.csv /home/project/data/2020-12-05/Robert_White/data.csv /home/project/data/2020-12-06/John_Smith/data.csv /home/project/data/2020-12-06/Sam_Walberg/data.csv /home/project/data/2020-12-06/Garry_Oswald/data.csv ...
I was managed to create a Dataframe containing all csv files concatenated using the following code:
full_path = [] for subdir, dirs, files in os.walk(rootdir): for file in files: full_path.append(os.path.join(subdir, file)) dfs = [pd.read_csv(csv_path) for csv_path in full_path] df = pd.concat(dfs)
Result:
df= pr_id quantity 0 27 4 1 89 1 2 33 2 3 8 3 4 16 1 ...
But I am now struggling to add the respective date+name to the Dataframe, so it would look like this:
df= pr_id quantity name date 0 27 4 John_Smith 2020-12-05 1 89 1 Robert_White 2020-12-05 2 33 2 John_Smith 2020-12-06 3 8 3 Sam_Walberg 2020-12-06 4 16 1 Garry_Oswald 2020-12-06 ...
How can I do it?
Advertisement
Answer
With pathlib
, you can go 1 & 2 directories up and get the name
and date
. Since this involves two things, an explicit for loop might be more readable than the list comprehension:
from pathlib import Path # ...above are the same dfs = [] for csv_path in full_path: # generate a `Path` object and get parents p = Path(csv_path) parents = p.parents # get the desired values from "parent" dirs name = parents[0].name date = parents[1].name # read in the CSV as is frame = pd.read_csv(csv_path) # assign the `name` and `date` columns frame["name"] = name frame["date"] = date # store in the list dfs.append(frame) # lastly concating as you did df = pd.concat(dfs)
Or equivalently, the list comprehension counterpart is:
dfs = [pd.read_csv(csv_path).assign(name=csv_path.parents[0].name, date=csv_path.parents[1].name) for csv_path in map(Path, full_path)] df = pd.concat(dfs)
where we use assign
to put new columns to each dataframe.
It depends on you to choose between explicit for loop or list comprehension.