I want to merge the columns that contain list objects.
The catch is that I need to remove duplicate elements.
How can I get a column containing the merged lists, as shown below?
Source:
  col_0   col_a   col_b   col_c
0    aa     [1]     NaN  [2, 3]
1    bb  [a, b]  [b, c]     [c]
2    cc     NaN     NaN     NaN
Expected:
  col_0   col_a   col_b   col_c merged_a_to_c
0    aa     [1]     NaN  [2, 3]     [1, 2, 3]
1    bb  [a, b]  [b, c]     [c]     [a, b, c]
2    cc     NaN     NaN     NaN           NaN
Answer
import numpy as np

def merge(df):
    merged_a_to_c = []
    for row in range(len(df)):
        merge_tmp = []
        for column in range(len(df.columns)):
            cell = df.iloc[row, column]
            # Only list cells contribute; NaN and plain strings are skipped.
            if isinstance(cell, list):
                for element in cell:
                    # Keep the first occurrence only, preserving order.
                    if element not in merge_tmp:
                        merge_tmp.append(element)

        # Rows without any list values get NaN instead of an empty list.
        if merge_tmp:
            merged_a_to_c.append(merge_tmp)
        else:
            merged_a_to_c.append(np.nan)

    df['merged_a_to_c'] = merged_a_to_c
    return df
  col_0   col_a   col_b   col_c merged_a_to_c
0    aa     [1]     NaN  [2, 3]     [1, 2, 3]
1    bb  [a, b]  [b, c]     [c]     [a, b, c]
2    cc     NaN     NaN     NaN           NaN
You can use this code regardless of the size (number of columns or rows) of the DataFrame.
I edited the code because I hadn't realized at first that duplicates needed to be handled.
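As a possible variation, here is a minimal sketch that builds the sample frame from the question and merges only an explicit set of columns, instead of scanning every column. The names `merge_lists` and `list_columns` are hypothetical and not part of the answer above, and the sketch assumes the list cells are plain Python lists.

```python
import numpy as np
import pandas as pd


def merge_lists(cells):
    """Concatenate the list cells of one row, dropping duplicates in order."""
    seen = []
    for cell in cells:
        # NaN and other non-list values are ignored.
        if isinstance(cell, list):
            for element in cell:
                if element not in seen:
                    seen.append(element)
    # Rows without any list values get NaN rather than an empty list.
    return seen if seen else np.nan


# Sample data reconstructed from the question.
df = pd.DataFrame({
    "col_0": ["aa", "bb", "cc"],
    "col_a": [[1], ["a", "b"], np.nan],
    "col_b": [np.nan, ["b", "c"], np.nan],
    "col_c": [[2, 3], ["c"], np.nan],
})

# Assumption: only these columns should be merged.
list_columns = ["col_a", "col_b", "col_c"]
df["merged_a_to_c"] = [
    merge_lists(row) for row in df[list_columns].itertuples(index=False)
]
print(df)
```

Restricting the merge to `list_columns` avoids accidentally picking up list values from unrelated columns; the membership check keeps the approach usable for unhashable elements, at the cost of quadratic time on very long lists.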