Skip to content
Advertisement

If a column has ‘x’ value, find max value from other columns in row, and change x to the value of max column name

I have a dataframe similar to the following:

data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'],
        'topic': [-1, 0, 1, 2, 0, -1, 2],
        '-1': [1, 0, 0, 0.1, 0, 0.99, 0],
        '0': [0, 1, 0, 0, 1, 0, 0.1],
        '1': [0, 0, 0.99, 0, 0, 0, 0],
        '2': [0.4, 0, 0, 0.7, 0, 0, 1],
        }

df = pd.DataFrame(data)

If the column topic value is -1, then I want to look in the same row of columns 0 to 2, and change the value in topic, to the header of the max value.

enter image description here

As an example, in the first row in the table above, the column topic has a value of -1, so I want to change that value to whatever is the header name of the max value in the row. So that would be column 2, which has a value of 0.4.

So now the table looks like this, where the topic value of ref 0001 has changed from -1 to 2

enter image description here

The other point is, that if like ref 0006, there is no value > 0 in the other columns despite having a topic value of -1, then it should be left alone.

I hope this makes sense. Struggling hard to get this done.

Thank you!

Advertisement

Answer

Setting up example:

import pandas as pd
data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'],
        'topic': [-1, 0, 1, 2, 0, -1, 2],
        '-1': [1, 0, 0, 0.1, 0, 0.99, 0],
        '0': [0, 1, 0, 0, 1, 0, 0.1],
        '1': [0, 0, 0.99, 0, 0, 0, 0],
        '2': [0.4, 0, 0, 0.7, 0, 0, 1],
        }

df = pd.DataFrame(data)

Making a boolean mask to set the rows:

cols = [str(x) for x in range(0, 3)]
condition = (df.topic == -1) & ((df[cols] > 0).any(axis=1))

0     True
1    False
2    False
3    False
4    False
5    False
6    False
dtype: bool

using idxmax gets the names of the columns at the maximum values

df[cols].idxmax(axis=1)

0    2
1    0
2    1
3    2
4    0
5    0
6    2
dtype: object

Now you can use df.where and the boolean condition to set the topic values.

df["topic"] = df["topic"].where(~condition, df[cols].idxmax(axis=1)[condition])

Output:

    ref topic    -1    0     1    2
0  0001     2  1.00  0.0  0.00  0.4
1  0002     0  0.00  1.0  0.00  0.0
2  0003     1  0.00  0.0  0.99  0.0
3  0004     2  0.10  0.0  0.00  0.7
4  0005     0  0.00  1.0  0.00  0.0
5  0006    -1  0.99  0.0  0.00  0.0
6  0007     2  0.00  0.1  0.00  1.0
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement