I am trying to sort filepaths according to their respective file extensions.
I would like to have an output like this:
FileType | FilePath |
---|---|
.h | a/b/c/d/xyz.h |
.h | a/b/c/d/xyz1.h |
.class | a/b/c/d/xyz.class |
.class | a/b/c/d/xyz1.class |
.jar | a/b/c/d/xyz.jar |
.jar | a/b/c/d/xyz1.jar |
But the output I have now looks like this: [image: output in Excel]
Below is my code:
    import pandas as pd
    import glob

    path = "The path goes here"
    # One list of matches per extension, so `yes` is a list of lists
    yes = [glob.glob(path + e, recursive=True) for e in ["/**/*.h", "/**/*.class", "/**/*.jar"]]
    print(type(yes))  # File type is list

    df = pd.DataFrame(yes)
    df = df.transpose()
    df.columns = [".h", ".class", ".jar"]
    print(df)

    writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='filepath', index=False)
    writer.close()  # writer.save() was removed in pandas 2.0
Could anyone please help me with this? Thanks in advance!
Answer
Please try this code:
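    import os
    import pathlib
    import pandas as pd

    path = 'C:/'
    full_file_paths = []
    file_suffix = []
    # Walk the tree once, recording every file's suffix and full path
    for (root, dirs, files) in os.walk(path):
        for f in files:
            file_suffix.append(pathlib.PurePosixPath(f).suffix)
            # Join with root (the directory being visited), not the top-level path
            full_file_paths.append(os.path.join(root, f))

    file_suffix = set(file_suffix)
    processed_files = dict()
    # Group the collected paths by suffix
    for fs in file_suffix:
        processed_files[fs] = []
        for f in full_file_paths:
            # Compare the file name's actual suffix rather than searching the whole path
            if pathlib.PurePosixPath(os.path.basename(f)).suffix == fs:
                processed_files[fs].append(f)
        print('--------------------------------')
        print(fs)
        print(processed_files[fs])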
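To get the two-column layout from the question (one row per file, with `FileType` and `FilePath` columns), the grouped paths can be collected as rows instead of as one column per extension. The following is only a sketch under some assumptions: the function name `paths_by_extension` is made up for illustration, and the rows are ordered by the position of each extension in the list you pass in, which matches the `.h`, `.class`, `.jar` order of the desired output.

```python
from pathlib import Path

import pandas as pd


def paths_by_extension(root, extensions):
    """Return a DataFrame with one row per matching file.

    Hypothetical helper: rows are grouped in the order given by
    `extensions`, and sorted by path within each group.
    """
    rows = []
    for ext in extensions:
        # rglob searches root and all subdirectories;
        # sorted() makes the order within a group deterministic
        for p in sorted(Path(root).rglob(f"*{ext}")):
            if p.is_file():
                rows.append({"FileType": ext, "FilePath": p.as_posix()})
    # Passing columns explicitly keeps the header even when nothing matches
    return pd.DataFrame(rows, columns=["FileType", "FilePath"])
```

It could then be called as `df = paths_by_extension(path, [".h", ".class", ".jar"])` and written out with `df.to_excel('test.xlsx', sheet_name='filepath', index=False)`, assuming an Excel engine such as openpyxl or xlsxwriter is installed.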