
python – find duplicates in a column, replace values in another column for that duplicate

I have a dataframe that consists of video game titles on various platforms. It contains, among other values, the name, the critics' average score and the users' average score. Many of the rows are missing the user score, the critic score and/or the ESRB rating.

What I'd like to do is replace the missing rating, critic score and user score with those for the same game on a different platform (assuming they exist). I'm not quite sure how to approach this. (Note: I don't want to drop the duplicate names, because they aren't truly duplicate rows.)

Here is a sample chunk of the dataframe (I've removed some unrelated columns to make it manageable):

JavaScript

Now, no duplicates happen to stick out in these first 30 lines, but for instance I have 007: Quantum of Solace on the PS3, Wii, DS, PC and X360. Between all of the platforms I have a mean score for both users and critics, as well as an ESRB rating.

As requested, here is a sample of some duplicated values:

JavaScript

I've separated my duplicates into their own dataframe (df1 is my original games dataframe, df2 is the duplicates dataframe):

JavaScript
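One way such a split might look is sketched below. The column names (`name`, `platform`, `critic_score`) and the rows are made up for illustration; only `df1` and `df2` come from the description above.

```python
import pandas as pd

# Hypothetical stand-in for df1; column names and values are assumptions.
df1 = pd.DataFrame({
    "name": ["Game A", "Game A", "Game B", "Game C", "Game C"],
    "platform": ["PS3", "Wii", "PC", "DS", "X360"],
    "critic_score": [65.0, None, 80.0, None, 54.0],
})

# keep=False marks every row whose name occurs more than once,
# not only the second and later occurrences.
df2 = df1[df1.duplicated(subset="name", keep=False)]
print(df2)
```

With `keep=False`, all platforms of a multi-platform game end up in `df2`, which makes it easier to compare their scores side by side.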

So I can see my duplicates and their values, but of course I don't want to fill in 8500 missing values from duplicates by hand.

I can find the duplicated names, but I don't know how to fill the NaN values with the "good" values from the other platforms.

I'm at a loss for how to begin and would appreciate any pointers in the right direction.

Now, to add another step: in my 007 example above, the critic and user scores aren't the same across platforms (the PS3 game got a 65, the Wii game got a 54 and the PC game a 70). Calculating the mean of the three would be the ideal solution, but I'll settle for ANY of the platforms if that is too complex (as you might have guessed, I am very new to Python).

I appreciate any time and effort you have to share on my behalf.

Regards,

Jared

Answer

I’m pretty sure pandas.DataFrame.groupby is what you need:

JavaScript
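A minimal sketch of how `groupby` could fill the gaps, assuming hypothetical column names (`name`, `platform`, `critic_score`, `user_score`, `rating`) and made-up rows. The critic scores 65, 54 and 70 echo the example in the question; everything else is illustrative.

```python
import pandas as pd

# Made-up sample data; column names are assumptions, not the real dataset.
games = pd.DataFrame({
    "name": ["Game A", "Game A", "Game A", "Game A", "Game B"],
    "platform": ["PS3", "Wii", "PC", "DS", "PC"],
    "critic_score": [65.0, 54.0, 70.0, None, None],
    "user_score": [6.0, None, 7.0, None, None],
    "rating": ["T", None, "T", None, None],
})

score_cols = ["critic_score", "user_score"]

# Per-game mean of each score, broadcast back to every row of that game.
group_means = games.groupby("name")[score_cols].transform("mean")

filled = games.copy()
# Only NaN cells are replaced; existing scores are left untouched.
filled[score_cols] = filled[score_cols].fillna(group_means)

# The rating is not numeric, so borrow the first non-null rating of the game.
filled["rating"] = filled["rating"].fillna(
    games.groupby("name")["rating"].transform("first")
)
print(filled)
```

Games that have no score on any platform (like `Game B` here) stay NaN, since there is nothing to borrow from.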

If you want to join these results with your dataframe, you can use:

JavaScript
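As a rough sketch of such a join, assuming the same hypothetical column names as before and made-up rows, the per-game means can be merged back onto the original rows with `DataFrame.merge`:

```python
import pandas as pd

# Made-up sample data; column names are assumptions.
games = pd.DataFrame({
    "name": ["Game A", "Game A", "Game A", "Game B"],
    "platform": ["PS3", "Wii", "PC", "PC"],
    "critic_score": [65.0, 54.0, None, 80.0],
})

# One row per game holding the mean of the available critic scores.
per_game = games.groupby("name", as_index=False)[["critic_score"]].mean()

# A left merge keeps every original row; the mean arrives as critic_score_mean.
merged = games.merge(per_game, on="name", how="left", suffixes=("", "_mean"))

# Use the per-game mean only where the platform-specific score is missing.
merged["critic_score"] = merged["critic_score"].fillna(merged["critic_score_mean"])
print(merged)
```

Keeping the `_mean` column next to the original makes it easy to check which values were filled in before dropping it.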
User contributions licensed under: CC BY-SA