Linux command from Python to execute inside subfolders

Question

This is my folder structure: there are many CSV files inside each date folder. I want to combine all the CSV files inside each date folder into one single file, keeping the header of the first file only, and name it orderno_year_month_date.csv. That means every date folder will have only one CSV, named after its parent folders. So, I want a clean command for

Accepted Answer

I will be using this mock file structure (plotted using the tree command, and saved under ~/test/ on my computer):

```
test
└── 408
    └── 2010
        └── 01
            ├── 21
            │   ├── 1.csv
            │   └── 2.csv
            ├── 22
            │   ├── 1.csv
            │   └── 2.csv
            └── 23
                ├── 1.csv
                └── 2.csv
```

You can build the new file names in Python with the help of pathlib, and concatenate the files using pandas:

```python
import pandas as pd
from pathlib import Path


def getfolders(files):
    # Unique parent folders of all CSV files, in sorted order
    return sorted(list(set([file.parent for file in files])))


def getpathproperty(folder, prop):
    # Number of levels to climb from the day folder to reach each property
    properties = {"orderno": 3, "year": 2, "month": 1, "day": 0}
    for i in range(properties[prop]):
        folder = folder.parent
    return folder.stem


path = Path("~/test").expanduser()
allfiles = list(path.rglob("*.csv"))  # Each file in allfiles is a Path object
folders = getfolders(allfiles)

for folder in folders:
    files = sorted(list(folder.glob("*.csv")))
    # read_csv consumes each file's header, so the result has a single header
    df = pd.concat([pd.read_csv(file) for file in files])

    # Get the values from the path to name the new file
    orderno = getpathproperty(folder, "orderno")
    year = getpathproperty(folder, "year")
    month = getpathproperty(folder, "month")
    day = getpathproperty(folder, "day")

    # Save the new CSV file
    df.to_csv(folder / f"{orderno}_{year}_{month}_{day}.csv", index=False)

    # Delete old files, commented for safety
    # for file in files:
    #     file.unlink(missing_ok=True)
```

With the deletion lines uncommented, this yields:

```
test
└── 408
    └── 2010
        └── 01
            ├── 21
            │   └── 408_2010_01_21.csv
            ├── 22
            │   └── 408_2010_01_22.csv
            └── 23
                └── 408_2010_01_23.csv
```
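If pandas is not available, the same merge can be sketched with only the standard library. This is a minimal sketch, not part of the original answer: the function names `combined_name`, `combine_folder` and `combine_tree` are hypothetical, and it assumes every CSV starts with a header row and that each date folder sits exactly at `orderno/year/month/day`. It also skips a previously written combined file, so running it twice should not duplicate rows.

```python
import csv
from pathlib import Path


def combined_name(folder: Path) -> str:
    """Build orderno_year_month_day.csv from the last four path components."""
    # Assumption: the folder path ends in orderno/year/month/day.
    orderno, year, month, day = folder.parts[-4:]
    return f"{orderno}_{year}_{month}_{day}.csv"


def combine_folder(folder: Path) -> Path:
    """Merge every CSV in `folder` into one file, keeping the first header only."""
    target = folder / combined_name(folder)
    # Leave out an existing combined file so a re-run does not duplicate rows.
    sources = sorted(p for p in folder.glob("*.csv") if p != target)
    if not sources:
        # Nothing to merge; do not overwrite an existing combined file.
        return target

    rows = []
    for position, source in enumerate(sources):
        with source.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)  # first row is assumed to be the header
            if position == 0 and header is not None:
                rows.append(header)
            rows.extend(reader)

    with target.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return target


def combine_tree(root: Path) -> list:
    """Run combine_folder on every folder under `root` that directly holds CSVs."""
    folders = sorted({p.parent for p in root.rglob("*.csv")})
    return [combine_folder(folder) for folder in folders]
```

With the mock structure above, `combine_tree(Path("~/test").expanduser())` would write one combined file per date folder and return their paths. The source files are left in place; deleting them afterwards is a separate, deliberate step.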
