I have a dataframe df:
| name | age_5_9 | age_10_14 | age_15_19 | 
|---|---|---|---|
| Alice | no bones broken | no bones broken | broke 1 bone | 
| Bob | no bones broken | broke 2 bones | no bones broken | 
| Charles | no bones broken | no bones broken | no bones broken | 
I would like to create a column broke_a_bone that is 1 when any of the rows has a value ‘broke 1 bone’ or ‘broke 2 bones’ in any of the columns age_5_9, age_10_14, or age_15_19; otherwise it should be 0.
It should look like this:
| name | age_5_9 | age_10_14 | age_15_19 | broke_a_bone | 
|---|---|---|---|---|
| Alice | no bones broken | no bones broken | broke 1 bone | 1 | 
| Bob | no bones broken | broke 2 bones | no bones broken | 1 | 
| Charles | no bones broken | no bones broken | no bones broken | 0 | 
I tried to use .iterrows or .apply() but I just can’t seem to make it work.
Thanks in advance.
Advertisement
Answer
You can use filter to select the “age” columns, then check if any value per row is not (ne) “no bones broken”.
Convert the resulting boolean to integer for 0/1:
df['broke_a_bone'] = (df.filter(like='age_').ne('no bones broken')
                        .any(axis=1).astype(int)
                     )
output:
name age_5_9 age_10_14 age_15_19 broke_a_bone 0 Alice no bones broken no bones broken broke 1 bone 1 1 Bob no bones broken broke 2 bones no bones broken 1 2 Charles no bones broken no bones broken no bones broken 0