I want to merge several DataFrame columns that contain list objects.
The catch is that duplicate elements must be removed from the merged result.
How can I get a new column that holds the merged lists, as shown below?
Source:
   col_0   col_a   col_b  col_c
0     aa     [1]     NaN  [2,3]
1     bb  [a, b]  [b, c]    [c]
2     cc     NaN     NaN    NaN
Expected:
   col_0   col_a   col_b  col_c  merged_a_to_c
0     aa     [1]     NaN  [2,3]        [1,2,3]
1     bb  [a, b]  [b, c]    [c]      [a, b, c]
2     cc     NaN     NaN    NaN            NaN
Answer
import numpy as np

def merge(df):
    merged_a_to_c = []
    for row in range(len(df)):
        merge_tmp = []
        # Scan every column; only cells that hold a list contribute.
        for columns in range(len(df.columns)):
            if type(df.iloc[row, columns]) == list:
                for element in df.iloc[row, columns]:
                    # Skip elements already collected for this row.
                    if element not in merge_tmp:
                        merge_tmp.append(element)
        # Rows without any list keep NaN instead of an empty list.
        if merge_tmp != []:
            merged_a_to_c.append(merge_tmp)
        else:
            merged_a_to_c.append(np.nan)
    df['merged_a_to_c'] = merged_a_to_c
    return df
   col_0   col_a   col_b   col_c  merged_a_to_c
0     aa     [1]     NaN  [2, 3]      [1, 2, 3]
1     bb  [a, b]  [b, c]     [c]      [a, b, c]
2     cc     NaN     NaN     NaN            NaN
This code works regardless of the DataFrame's size (number of columns or rows), because it checks every cell and merges only the ones that are lists.
I edited the code because I hadn't realized at first that duplicates needed to be handled.
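If you only want to merge specific columns rather than every column in the frame, a variant along these lines might be easier to adapt. This is just a sketch, not part of the code above: the helper name merge_unique and the list_columns variable are made up for illustration, and the sample frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd


def merge_unique(cells):
    """Combine the list cells of one row, keeping first-seen order."""
    merged = []
    for cell in cells:
        # Non-list cells (such as NaN) are ignored.
        if isinstance(cell, list):
            for element in cell:
                if element not in merged:
                    merged.append(element)
    # Mirror the expected output: NaN when the row had no lists at all.
    return merged if merged else np.nan


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

# Hypothetical choice of columns to merge; adjust to your own frame.
list_columns = ["col_a", "col_b", "col_c"]

# itertuples yields one plain tuple of cell values per row.
df["merged_a_to_c"] = [
    merge_unique(cells)
    for cells in df[list_columns].itertuples(index=False, name=None)
]
```

Naming the columns explicitly means a list stored in some unrelated column is not merged in by accident. The membership check is still linear per element, so for very long lists a different de-duplication strategy may be worth considering.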