I am new to Pandas and learning. I am reading excel to DataFrame and comparing columns and highlight the column that’s not same. For example if Column A is not same as Column B then highlight the Column B. However I have some null values in Column A and Column B. When I execute the code, I don’t want to highlight the null values in Column B. How can I do that? Below is my code:
file = Path(path to excel) df = pd.read_excel(file) def color(x): c1 = 'background-color: red' m1 = x['AMOUNT A'] != x['AMOUNT B'] m2 = x['AMOUNT C'] != x['AMOUNT D'] df = pd.DataFrame('',index=x.index, columns=x.columns) df['AMOUNT B'] = np.select([m1],[c1], default='') df['AMOUNT D'] = np.select([m2],[c1], default='') return df writer = pd.ExcelWriter(path to excel) df.style.apply(color,axis=None).to_excel(writer, 'data', index=False)
df before color function:
AMOUNT A AMOUNT B AMOUNT C AMOUNT D 0 100.0 200.0 NaN NaN 1 200.0 200.0 45.0 25.0 2 100.0 500.0 NaN NaN 3 NaN NaN NaN NaN 4 NaN NaN NaN NaN 5 200.0 1.0 NaN NaN
Output after running script:
Expected output:
Advertisement
Answer
The issue here is NaN == NaN
will return False
def color(x): c1 = 'background-color: red' m1 = x['AMOUNT A'].fillna('') != x['AMOUNT B'].fillna('') m2 = x['AMOUNT C'].fillna('') != x['AMOUNT D'].fillna('') df = pd.DataFrame('',index=x.index, columns=x.columns) df['AMOUNT B'] = np.select([m1],[c1], default='') df['AMOUNT D'] = np.select([m2],[c1], default='') return df
More Info
np.nan == np.nan Out[527]: False