pandas: manage duplicated sentences on different columns

Question

I have a dataframe with columns col1 to col4. I want to prepend the first column's value to a sentence if that sentence is repeated anywhere else in the next three columns. My desired output would be: col1 col2 col3 col4 1_a 1_aJoe waited for the train. the weather is nice the house looks amazing 2_a The train was late. the we…

Accepted Answer

You can do some fancy NumPy broadcasting here.

    import pandas as pd

    search_cols = ['col2', 'col3', 'col4']
    index_col = 'col1'

    # Generate a grid per cell marking every position that holds the same value
    x = (df[search_cols].to_numpy() == df[search_cols].to_numpy()[:, :, None, None])

    # Combine the grids: count the matches for each cell, subtract 1 for the
    # cell matching itself, and convert to a boolean mask that is True for
    # every cell with 1 or more duplicates
    x = (x.sum(axis=3).sum(axis=2) - 1).astype(bool)

    # Convert the boolean mask into a dataframe matching the row labels and
    # column labels of the original dataframe
    mask = pd.DataFrame(x, index=df.index, columns=search_cols)

    # Replace all True values in the mask with values from the key column
    # (`col1`), and replace all False values with an empty string
    mask = mask.apply(lambda col: df.loc[col, index_col]).reindex(mask.index).fillna('')

    # String concatenation: prefix (or empty string) + original sentence
    new_df = mask + df[search_cols]

Output:

    >>> new_df
                                                    col2                          col3                         col4
    0                       1_aJoe waited for the train.           the weather is nice      the house looks amazing
    1                                The train was late.           the weather is cold    his profession is unknown
    2                    Mary and Samantha took the bus.              i like going out        it is a beautiful day
    3  I looked for Mary and Samantha at the bus stat...  4_aJoe waited for the train.  we just moved to this house
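To try the broadcasting idea end to end, here is a minimal self-contained sketch. The input dataframe is an assumption: the question's input is not shown, so it is reconstructed from the desired output with the prefixes stripped. The helper name `prefix_duplicates` is hypothetical, and the final step uses `DataFrame.where` rather than the `apply`/`reindex` chain in the answer.

```python
import pandas as pd

# Assumed input: inferred from the desired output, not taken from the question.
df = pd.DataFrame({
    "col1": ["1_a", "2_a", "3_a", "4_a"],
    "col2": [
        "Joe waited for the train.",
        "The train was late.",
        "Mary and Samantha took the bus.",
        "I looked for Mary and Samantha at the bus station",
    ],
    "col3": [
        "the weather is nice",
        "the weather is cold",
        "i like going out",
        "Joe waited for the train.",
    ],
    "col4": [
        "the house looks amazing",
        "his profession is unknown",
        "it is a beautiful day",
        "we just moved to this house",
    ],
})


def prefix_duplicates(df, search_cols, index_col):
    """Prepend df[index_col] to every cell in search_cols whose value
    occurs more than once anywhere in those columns (hypothetical helper)."""
    values = df[search_cols].to_numpy(dtype=object)
    # Shape (rows, cols, rows, cols): entry [i, j, k, l] is True when
    # cell (i, j) equals cell (k, l).
    same = values[:, :, None, None] == values[None, None, :, :]
    # Each cell matches itself once, so a count above 1 means a duplicate.
    counts = same.sum(axis=(2, 3))
    is_dup = pd.DataFrame(counts > 1, index=df.index, columns=search_cols)
    # Key column concatenated in front of every sentence.
    prefixed = df[search_cols].apply(lambda col: df[index_col] + col)
    # Keep the original cell unless it is a duplicate.
    return df[search_cols].where(~is_dup, prefixed)


new_df = prefix_duplicates(df, ["col2", "col3", "col4"], "col1")
print(new_df)
```

Note that the 4-D comparison grows quadratically with the number of cells, so this is best suited to small frames; for larger data, `DataFrame.stack()` followed by `Series.duplicated(keep=False)` is one alternative worth considering.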

col1	col2	col3	col4
1_a	1_aJoe waited for the train.	the weather is nice	the house looks amazing
2_a	The train was late.	the weather is cold	his profession is unknown
3_a	Mary and Samantha took the bus.	i like going out	it is a beautiful day
4_a	I looked for Mary and Samantha at the bus station	4_aJoe waited for the train.	we just moved to this house

Answer