I have a dataframe similar to the following:
data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'], 'topic': [-1, 0, 1, 2, 0, -1, 2], '-1': [1, 0, 0, 0.1, 0, 0.99, 0], '0': [0, 1, 0, 0, 1, 0, 0.1], '1': [0, 0, 0.99, 0, 0, 0, 0], '2': [0.4, 0, 0, 0.7, 0, 0, 1], } df = pd.DataFrame(data)
If the column topic
value is -1
, then I want to look in the same row of columns 0 to 2
, and change the value in topic
, to the header of the max value.
As an example, in the first row in the table above, the column topic
has a value of -1
, so I want to change that value to whatever is the header name of the max value in the row. So that would be column 2, which has a value of 0.4.
So now the table looks like this, where the topic
value of ref 0001
has changed from -1
to 2
The other point is, that if like ref 0006
, there is no value > 0
in the other columns despite having a topic value of -1, then it should be left alone.
I hope this makes sense. Struggling hard to get this done.
Thank you!
Advertisement
Answer
Setting up example:
import pandas as pd data = {'ref': ['0001', '0002', '0003', '0004', '0005', '0006', '0007'], 'topic': [-1, 0, 1, 2, 0, -1, 2], '-1': [1, 0, 0, 0.1, 0, 0.99, 0], '0': [0, 1, 0, 0, 1, 0, 0.1], '1': [0, 0, 0.99, 0, 0, 0, 0], '2': [0.4, 0, 0, 0.7, 0, 0, 1], } df = pd.DataFrame(data)
Making a boolean mask to set the rows:
cols = [str(x) for x in range(0, 3)] condition = (df.topic == -1) & ((df[cols] > 0).any(axis=1)) 0 True 1 False 2 False 3 False 4 False 5 False 6 False dtype: bool
using idxmax gets the names of the columns at the maximum values
df[cols].idxmax(axis=1) 0 2 1 0 2 1 3 2 4 0 5 0 6 2 dtype: object
Now you can use df.where and the boolean condition to set the topic values.
df["topic"] = df["topic"].where(~condition, df[cols].idxmax(axis=1)[condition])
Output:
ref topic -1 0 1 2 0 0001 2 1.00 0.0 0.00 0.4 1 0002 0 0.00 1.0 0.00 0.0 2 0003 1 0.00 0.0 0.99 0.0 3 0004 2 0.10 0.0 0.00 0.7 4 0005 0 0.00 1.0 0.00 0.0 5 0006 -1 0.99 0.0 0.00 0.0 6 0007 2 0.00 0.1 0.00 1.0