I have a dataset of patients, e.g.:
and a dataset of diseases of each patient (by ICD code):
How can I flag each patient who has a history of a specific ICD code? Desired output:
I am currently doing this by iterating over the rows, but that takes too long.
Answer
If you need indicators, meaning only 0 and 1 values, use get_dummies:
df1 = df1.join(pd.get_dummies(df2.set_index('patient_id')['ICD'], dtype=int).groupby(level=0).max(), on='patient_id')
If you need counts of each ICD code per patient, use crosstab:
df1 = df1.join(pd.crosstab(df2['patient_id'], df2['ICD']), on='patient_id')
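The two results differ only when df2 contains duplicate (patient_id, ICD) pairs: get_dummies with max still gives 1, while crosstab gives the number of occurrences.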
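To make the difference concrete, here is a minimal sketch on made-up data. The sample values are hypothetical, not the asker's data; only the column names patient_id and ICD come from the question. It also assumes that patients with no rows in df2 should get 0 rather than NaN, which the join alone would not do, and it uses groupby(level=0).max() because recent pandas versions no longer accept max(level=0).

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df1 = pd.DataFrame({"patient_id": [1, 2, 3]})
df2 = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "ICD": ["A00", "A00", "B01", "B01"],  # patient 1 has A00 twice
})

# Indicators: one-hot encode each row, then take the max per patient.
flags = (
    pd.get_dummies(df2.set_index("patient_id")["ICD"], dtype=int)
    .groupby(level=0)
    .max()
)

# Counts: number of rows per (patient_id, ICD) pair.
counts = pd.crosstab(df2["patient_id"], df2["ICD"])

# Patient 3 has no rows in df2, so the join yields NaN there.
# Assumption: such patients should be reported as 0.
flagged = df1.join(flags, on="patient_id").fillna(0).astype(int)
counted = df1.join(counts, on="patient_id").fillna(0).astype(int)

print(flagged)
print(counted)
```

For patient 1, the A00 column is 1 in flagged but 2 in counted; every other cell is the same in both frames.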