
Apply a for loop to multiple columns across multiple DataFrames?

My DataFrames look like the ones below. I want to change a value to 'dead' if the age is more than 100.

import pandas as pd
raw_data = {'age1': [23,45,210],'age2': [10,20,150],'name': ['a','b','c']}
df = pd.DataFrame(raw_data, columns = ['age1','age2','name'])

raw_data = {'age1': [80,90,110],'age2': [70,120,90],'name': ['a','b','c']}
df2 = pd.DataFrame(raw_data, columns = ['age1','age2','name'])

Desired outcome

df=
    age1    age2    name
0   23      10       a
1   45      20       b
2   dead    dead     c

df2=
    age1    age2    name
0   80      70       a
1   90      dead     b
2   dead    90       c

I was trying something like this:

col_list=['age1','age2']
df_list=[df,df2]

def dead(df):
  for df in df_list:
    if df.columns in col_list:
      if df.columns >=100:
        return 'dead'
    else:
      return df.columns

df.apply(dead)

Error shown: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I am looking for a loop that works on all of the DataFrames.

Please also correct my function, for future learning :)


Answer
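
On the error itself: it most likely comes from using a whole array of comparison results as the condition of an `if`. Conditions such as `df.columns in col_list` compare every column label at once, so Python gets several True/False values where it needs exactly one. Below is a minimal sketch that reproduces the same message with a plain NumPy array; the names `ages` and `mask` are made up for illustration and are not from the question.

```python
import numpy as np

ages = np.array([23, 45, 210])   # illustrative data, same values as df['age1']
mask = ages > 100                # element-wise comparison: one boolean per age

try:
    if mask:                     # an `if` needs a single True/False, not three
        pass
    raised = False
except ValueError as err:
    raised = True
    message = str(err)           # "The truth value of an array ... is ambiguous ..."

# Reducing the array to one boolean is what the message suggests
any_over = bool(mask.any())      # is at least one age above 100?
all_over = bool(mask.all())      # are all ages above 100?
```

Here `.any()` and `.all()` only answer a yes/no question about the whole column. To replace individual values, the comparison has to stay element-wise, which is what the loop in this answer does.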

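# inspired by @jib and @ravinder
import numpy as np

col_list = ['age1', 'age2']
df_list = [df, df2]

# loop over every DataFrame and every age column
for d in df_list:
    for c in col_list:
        # np.where builds a new array: 'dead' where age > 100, else the old value
        d[c] = np.where(d[c] > 100, 'dead', d[c])
df  # or df2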

output:

   age1  age2 name
0    23    10    a
1    45    20    b
2  dead  dead    c
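
Since the question also asks for a corrected function, here is one possible sketch, not the only way to do it. It assumes the function should handle a single value and that the looping over DataFrames and columns stays outside it; `Series.map` then calls it once per element. The parameter name `value` is an illustrative choice.

```python
import pandas as pd

raw_data = {'age1': [23, 45, 210], 'age2': [10, 20, 150], 'name': ['a', 'b', 'c']}
df = pd.DataFrame(raw_data, columns=['age1', 'age2', 'name'])

raw_data = {'age1': [80, 90, 110], 'age2': [70, 120, 90], 'name': ['a', 'b', 'c']}
df2 = pd.DataFrame(raw_data, columns=['age1', 'age2', 'name'])

col_list = ['age1', 'age2']
df_list = [df, df2]

def dead(value):
    """Return 'dead' for a single age above 100, otherwise the age unchanged."""
    return 'dead' if value > 100 else value

# The function sees one scalar at a time, so the `if` inside it is unambiguous
for d in df_list:
    for c in col_list:
        d[c] = d[c].map(dead)
```

With this version the ages that are not replaced stay as numbers (the column becomes object dtype), whereas `np.where` with a string replacement turns every value in the column into a string. Which behaviour is preferable depends on what is done with the columns afterwards.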