My dataframes are shown below. I want to change a value to 'dead' wherever the age is more than 100.
import pandas as pd

raw_data = {'age1': [23,45,210],'age2': [10,20,150],'name': ['a','b','c']}
df = pd.DataFrame(raw_data, columns = ['age1','age2','name'])

raw_data = {'age1': [80,90,110],'age2': [70,120,90],'name': ['a','b','c']}
df2 = pd.DataFrame(raw_data, columns = ['age1','age2','name'])
Desired outcome
df=
   age1  age2 name
0    23    10    a
1    45    20    b
2  dead  dead    c

df2=
   age1  age2 name
0    80    70    a
1    90  dead    b
2  dead    90    c
I was trying something like this:
col_list=['age1','age2']
df_list=[df,df2]

def dead(df):
    for df in df_list:
        if df.columns in col_list:
            if df.columns >=100:
                return 'dead'
            else:
                return df.columns

df.apply(dead)
Error shown: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
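For context, here is a minimal sketch of where this kind of error comes from. It assumes the same `df` and `col_list` as in the question; the names `over_100`, `raised` and `message` are only illustrative. An `if` needs a single True/False, but a comparison on a whole column (or on `df.columns`) produces one boolean per element.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 210],
                   'age2': [10, 20, 150],
                   'name': ['a', 'b', 'c']})
col_list = ['age1', 'age2']

# Comparing a whole column to a number yields one boolean per row.
over_100 = df['age1'] > 100

try:
    # `in` compares each list item with df.columns element-wise, then needs
    # a single True/False, which an array of three values cannot give.
    if df.columns in col_list:
        pass
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

print(raised)             # whether the ambiguous-truth-value error occurred
print(over_100.tolist())  # the per-row result of the comparison
```

The per-row result is meant to be used as a mask for selecting or replacing values, not as the condition of an `if`.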
I am looking for a loop that works on all of the dataframes.
Please correct my function also for future learning :)
Answer
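# inspired by @jib and @ravinder
import numpy as np

col_list = ['age1', 'age2']
df_list = [df, df2]

for d in df_list:
    for c in col_list:
        # where the value is above 100 use 'dead', otherwise keep the value
        d[c] = np.where(d[c] > 100, 'dead', d[c])

df  # or df2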
output:
   age1  age2 name
0    23    10    a
1    45    20    b
2  dead  dead    c
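Since the question also asks for a corrected function, here is one possible sketch. The function name `mark_dead` and its `limit` parameter are made up for illustration, and it swaps in `Series.mask` rather than `np.where`. One reason to consider this: mixing a string with integers in `np.where` generally produces an array of strings, so the remaining ages may end up as text, while `mask` leaves the untouched values as numbers.

```python
import pandas as pd

def mark_dead(frame, columns, limit=100):
    """Replace values above `limit` with 'dead' in the given columns (in place)."""
    for col in columns:
        # mask() puts 'dead' where the condition is True and keeps the
        # original value everywhere else.
        frame[col] = frame[col].mask(frame[col] > limit, 'dead')
    return frame

df = pd.DataFrame({'age1': [23, 45, 210],
                   'age2': [10, 20, 150],
                   'name': ['a', 'b', 'c']})
df2 = pd.DataFrame({'age1': [80, 90, 110],
                    'age2': [70, 120, 90],
                    'name': ['a', 'b', 'c']})

# Loop over every dataframe and apply the same replacement.
for d in (df, df2):
    mark_dead(d, ['age1', 'age2'])

print(df)
print(df2)
```

The function takes the dataframe as an argument instead of looping over a global list inside it, so it can be reused on any dataframe and any set of columns.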