I have a dataframe df
:
name | age_5_9 | age_10_14 | age_15_19 |
---|---|---|---|
Alice | no bones broken | no bones broken | broke 1 bone |
Bob | no bones broken | broke 2 bones | no bones broken |
Charles | no bones broken | no bones broken | no bones broken |
I would like to create a column broke_a_bone
that is 1
when any of the rows has a value ‘broke 1 bone’ or ‘broke 2 bones’ in any of the columns age_5_9
, age_10_14
, or age_15_19
; otherwise it should be 0.
It should look like this:
name | age_5_9 | age_10_14 | age_15_19 | broke_a_bone |
---|---|---|---|---|
Alice | no bones broken | no bones broken | broke 1 bone | 1 |
Bob | no bones broken | broke 2 bones | no bones broken | 1 |
Charles | no bones broken | no bones broken | no bones broken | 0 |
I tried to use .iterrows
or .apply()
but I just can’t seem to make it work.
Thanks in advance.
Advertisement
Answer
You can use filter
to select the “age” columns, then check if any
value per row is not (ne
) “no bones broken”.
Convert the resulting boolean to integer for 0/1:
df['broke_a_bone'] = (df.filter(like='age_').ne('no bones broken') .any(axis=1).astype(int) )
output:
name age_5_9 age_10_14 age_15_19 broke_a_bone 0 Alice no bones broken no bones broken broke 1 bone 1 1 Bob no bones broken broke 2 bones no bones broken 1 2 Charles no bones broken no bones broken no bones broken 0