I want to merge several columns that contain list objects into a single column.
The catch is that I need to remove duplicate elements while merging.
How can I get a column holding the merged lists, like the one below?
Source:
  col_0   col_a   col_b  col_c
0    aa     [1]     NaN  [2,3]
1    bb  [a, b]  [b, c]    [c]
2    cc     NaN     NaN    NaN
Expected:
  col_0   col_a   col_b  col_c merged_a_to_c
0    aa     [1]     NaN  [2,3]       [1,2,3]
1    bb  [a, b]  [b, c]    [c]     [a, b, c]
2    cc     NaN     NaN    NaN           NaN
Answer
import numpy as np

def merge(df):
    merged_a_to_c = []
    for row in range(len(df)):
        merge_tmp = []
        for column in range(len(df.columns)):
            cell = df.iloc[row, column]
            # Only list cells contribute; NaN and plain strings are skipped.
            if isinstance(cell, list):
                for element in cell:
                    # Keep only the first occurrence of each element.
                    if element not in merge_tmp:
                        merge_tmp.append(element)
        # Rows without any list values get NaN instead of an empty list.
        if merge_tmp:
            merged_a_to_c.append(merge_tmp)
        else:
            merged_a_to_c.append(np.nan)
    df['merged_a_to_c'] = merged_a_to_c
    return df
Output:
  col_0   col_a   col_b   col_c merged_a_to_c
0    aa     [1]     NaN  [2, 3]     [1, 2, 3]
1    bb  [a, b]  [b, c]     [c]     [a, b, c]
2    cc     NaN     NaN     NaN           NaN
This code works regardless of the DataFrame's size (number of columns or number of rows).
I edited the code because I initially didn't realize that duplicates had to be removed.
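If you would rather restrict the merge to specific columns, here is a rough sketch of the same idea using a row-wise apply. The helper name merge_lists and the list_cols variable are my own, not part of the answer above, and the sample frame is rebuilt from the table in the question. It assumes the list elements are hashable, since dict.fromkeys is used to drop duplicates while keeping first-seen order.

```python
import numpy as np
import pandas as pd

def merge_lists(row):
    """Concatenate the list cells of one row, dropping repeated elements."""
    merged = []
    for cell in row:
        # NaN and other non-list values are ignored.
        if isinstance(cell, list):
            merged.extend(cell)
    # dict.fromkeys removes duplicates and keeps first-seen order
    # (assumption: every element is hashable).
    return list(dict.fromkeys(merged)) if merged else np.nan

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

# Hypothetical choice of columns to merge; adjust to your frame.
list_cols = ["col_a", "col_b", "col_c"]
df["merged_a_to_c"] = df[list_cols].apply(merge_lists, axis=1)
print(df)
```

Selecting the columns explicitly also means that running it a second time does not pull the already merged column back into the result.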