I have a simple script transforming data in a dataframe:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'A':[123,None,456],
        'B':[3698,598,None]})

    def pad_value(item):
        if item == None or item == np.nan:
            return None
        else:
            return str(item).zfill(7)

    df['A'] = df['A'].apply(lambda x: pad_value(x))
    df['B'] = df['B'].apply(lambda x: pad_value(x))
The above seems to work fine. I have tried rewriting the last two lines to:
    cols = ['A', 'B']
    df[cols] = df[cols].apply(lambda x: pad_value(x))
However, this fails and gives a value error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
- I am trying to understand why it can’t be used in the above way.
- My pad_value function seems clunky – I wonder if there is a neater way of achieving the same?
Thanks
Answer
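DataFrame.apply passes each whole column to the function as a Series, so item == None produces a Series, and the if statement cannot reduce it to a single True or False. That is what raises the ValueError. To test for missing values (NaN or None), use pd.isna; for elementwise processing, use DataFrame.applymap: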
    def pad_value(item):
        if pd.isna(item):
            return None
        else:
            return str(item).zfill(7)

    cols = ['A', 'B']
    df[cols] = df[cols].applymap(pad_value)
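To see the difference between the two calls, the sketch below records what each one hands to the function. The names seen_by_apply, seen_elementwise, elementwise and message are made up for this illustration. It also assumes that on pandas 2.1 and later you would prefer DataFrame.map, since applymap is deprecated there, and falls back to applymap on older versions.

```python
import pandas as pd

df = pd.DataFrame({'A': [123, None, 456], 'B': [3698, 598, None]})

# DataFrame.apply calls the function once per column with a whole Series.
seen_by_apply = []
df.apply(lambda col: seen_by_apply.append(type(col).__name__))

# Comparing a Series to None gives a boolean Series; asking for its
# truth value (as `if` or `or` does) raises the ValueError from the question.
message = None
try:
    bool(df['A'] == None)
except ValueError as exc:
    message = str(exc)

# Elementwise processing: DataFrame.map on pandas >= 2.1 (assumption: you
# want to avoid the applymap deprecation warning), applymap otherwise.
elementwise = df.map if hasattr(df, "map") else df.applymap
seen_elementwise = []
elementwise(lambda value: seen_elementwise.append(type(value).__name__))

print(set(seen_by_apply))     # the function received Series objects
print(set(seen_elementwise))  # the function received scalar values
print(message)
```

With the elementwise call, pad_value only ever sees single values, so pd.isna(item) returns a plain boolean and the if works as intended.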
Because the sample data contains missing values, the columns are created as floats. Here is a solution that converts the values to strings without the trailing .0, turns NaN and None into None, and finally pads with Series.str.zfill (which also works with None/NaN values):
    df = pd.DataFrame({
        'A':[123,None,456],
        'B':[3698,598,None]})

    cols = ['A', 'B']
    df[cols] = (df[cols].astype('Int64')
                        .astype(str)
                        .mask(df.isna(), None)
                        .apply(lambda x: x.str.zfill(7)))
    print(df)

             A        B
    0  0000123  0003698
    1     None  0000598
    2  0000456     None
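As a small check of why the conversion to Int64 matters, the sketch below looks at one value from the sample data before and after the conversion. The variable names col_dtype, padded_float and padded_int are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [123, None, 456], 'B': [3698, 598, None]})

# The None in each list forces pandas to store the column as float64.
col_dtype = str(df['A'].dtype)

# Padding the float directly keeps the ".0" inside the 7 characters.
padded_float = str(df.loc[0, 'A']).zfill(7)

# Converting to the nullable integer dtype first drops the ".0".
padded_int = str(df['A'].astype('Int64')[0]).zfill(7)

print(col_dtype)     # float64
print(padded_float)  # 00123.0
print(padded_int)    # 0000123
```

This is also why the elementwise pad_value version gives strings such as 00123.0 on this data unless the values are converted to integers first.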