I’m trying to concatenate several csv files and create a ‘DATE_TIME’ column based on the specific column ‘NAME’. I apply an IF condition for extract the substring of DATE_TIME from NAME based on the appearance of a string ‘RE’.
Here’s my code:
import pandas as pd import numpy as np import os path_dataset = r'C:Userstest' def get_file(path_dataset): files = os.listdir(path_dataset) #check file list files.sort() #sort file file_list = [] for file in files: path = path_dataset + "\" + file if (file.startswith("test")) and (file.endswith(".csv")): file_list.append(path) return (file_list) read_columns = ['NAME'] read_files = get_file(path_dataset) all_df = [] for file in read_files: df = pd.read_csv(file, usecols = read_columns) if (str(df['NAME'].astype(str).str[23:25]) == 'RE-'): df['DATE_TIME'] = df['NAME'].astype(str).str[26:40] else: df['DATE_TIME'] = df['NAME'].astype(str).str[22:37] all_df.append(df) Concat_table = pd.concat(all_df, axis=0) Concat_table = Concat_table.sort_values(['DATE_TIME']) Concat_table.head() Concat_table.to_csv(os.path.join(path_dataset, 'Concate_all.csv'), index=False)
My question is at the IF-ELSE statement. They return the same position of the substring, so that I get result below. This is not what I want, with the name ‘RE’ I still want to extract the date and time. What did I make a mistake for the code? Thank you.
Added my dataframe:
dic1 = {"NAME": ['1234567890-ABCDEFGHIJ-RE-20210802-194706', '1234567890-ABCDEFGHIJ-20210801-200321']} dic2 = {"NAME": ['1234567890-ABCDEFGHIJ-RE-20210731-050457', '1234567890-ABCDEFGHIJ-20210801-122356']} df1 = pd.DataFrame(dic1) df2 = pd.DataFrame(dic2)
Advertisement
Answer
No need of IF statement. Split on – and concat the 2 last elements
EDIT
f = lambda x: (x["NAME"].split("-"))[-2] + "-" + (x["NAME"].split("-"))[-1].replace('.xlsx', '') for file in read_files: df = pd.read_csv(file, usecols = read_columns) df['DATE_TIME'] = df.apply(f, axis=1)