Pandas get column values based on duplicate rows

I have a sample DF:

JavaScript

OP:

JavaScript

I am trying to get the values of columns "A" and "B" wherever column col contains duplicate values. For example, col has the value Apple at indices 0, 1, 3 and 5, and I want the corresponding values from columns A and B, i.e.

JavaScript

I have an iterative approach, which takes a long time on big DataFrames.

Current Approach:

-> Find Unique values in column col

JavaScript

-> Iterate through this list, with an inner loop over every row of the DF, to get the required OP:

JavaScript
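
The two steps above can be sketched roughly as follows. This is only an illustration: the sample DataFrame and its values are hypothetical (chosen so that Apple sits at indices 0, 1, 3 and 5, as described in the question), and the output shape, a dict mapping each value of col to a list of [A, B] pairs, is an assumption rather than the exact format from the original post.

```python
import pandas as pd

# Hypothetical sample data; the values of A and B are made up for illustration.
df = pd.DataFrame({
    "col": ["Apple", "Apple", "Banana", "Apple", "Banana", "Apple"],
    "A": [1, 2, 3, 4, 5, 6],
    "B": [10, 20, 30, 40, 50, 60],
})

# Step 1: find the unique values in column col.
unique_vals = df["col"].unique().tolist()

# Step 2: for each unique value, scan every row and collect the matching [A, B] pairs.
# This is O(unique values * rows), which is why it gets slow on large DataFrames.
result = {}
for val in unique_vals:
    pairs = []
    for _, row in df.iterrows():
        if row["col"] == val:
            pairs.append([row["A"], row["B"]])
    result[val] = pairs
```

The nested loop rescans the whole frame once per unique value, so the cost grows with both the number of rows and the number of distinct values in col.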

final OP:

JavaScript

Are there any suggestions for a more idiomatic pandas approach that would be more efficient?

Answer

Create a column C filled with lists by converting the values of columns A and B to a NumPy array and then to a list. Then aggregate those lists per group with GroupBy.agg and convert the result with Series.to_dict:

JavaScript
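
A minimal sketch of the approach described in this answer, under stated assumptions: the sample DataFrame is hypothetical (Apple at indices 0, 1, 3 and 5, with made-up values for A and B), and `list` is assumed as the aggregation function passed to GroupBy.agg. It is not necessarily the exact code from the original answer.

```python
import pandas as pd

# Hypothetical sample data; the values of A and B are made up for illustration.
df = pd.DataFrame({
    "col": ["Apple", "Apple", "Banana", "Apple", "Banana", "Apple"],
    "A": [1, 2, 3, 4, 5, 6],
    "B": [10, 20, 30, 40, 50, 60],
})

# Build column C: one [A, B] list per row, via a NumPy array converted to nested lists.
df["C"] = df[["A", "B"]].to_numpy().tolist()

# Group by col, collect each group's C lists into one list, and convert to a dict.
out = df.groupby("col")["C"].agg(list).to_dict()

print(out)
```

Because the grouping happens in a single pass, this avoids rescanning the DataFrame once per unique value, which is where the iterative version spends its time.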
User contributions licensed under: CC BY-SA