I have dataset with the following columns:
- encounterDate
- FormName
- PersonID
In the FormName, I have the following forms:
- Baseline
- Follow-up
Here’s a sample data:
JavaScript
x
7
1
encounterDate, FormName, PersonID
2
2019-01-12, Baseline, 01
3
2020-01-01, Baseline, 01
4
2019-04-12, Follow-up, 01
5
2019-13-12, Follow-up, 01
6
2020-15-01, Follow-up, 01
7
I would like to have the following table:
JavaScript
1
7
1
encounterDate, FormName, PersonID, Previous_date
2
2019-01-12, Baseline, 01, null
3
2020-01-01, Baseline, 01, 2019-01-12
4
2019-04-12, Follow-up, 01, null
5
2019-13-12, Follow-up, 01, 2019-04-12
6
2020-15-01, Follow-up, 01, 2019-13-12
7
How do I write this code in Python?
Additionally, I would like to also rank them:
JavaScript
1
7
1
encounterDate, FormName, PersonID, Previous_date, Rank
2
2019-01-12, Baseline, 01, null, 1
3
2020-01-01, Baseline, 01, 2019-01-12, 2
4
2019-04-12, Follow-up, 01, null, 1
5
2019-13-12, Follow-up, 01, 2019-04-12, 2
6
2020-15-01, Follow-up, 01, 2019-13-12, 3
7
Here’s my working code in SQL
JavaScript
1
9
1
select encounter_date,FormName,PersonID
2
, date((select max(enc.encounter_datetime)
3
from encounter enc
4
where enc.patient_id=e.patient_id
5
and enc.encounter_type=e.encounter_type
6
and date(e.encounter_datetime)>date(enc.encounter_datetime))) previous_date
7
8
from encounter e
9
Thank you in advance.
John
Advertisement
Answer
This would be fairly straight forward in pandas
It looks like you need to groupby
both the PersonID
and FormName
to get the proper groupings. Within those groups you need to shift encounterDate
and you need a cumulative count of the same.
cumcount
starts at zero, so you may want to add 1 to the rank column to get the desired output.
JavaScript
1
15
15
1
import pandas as pd
2
3
df = pd.DataFrame({
4
'encounterDate': ['2019-01-12','2020-01-01','2019-04-12','2019-13-12','2020-15-01'],
5
'FormName': ['Baseline','Baseline','Follow-up','Follow-up','Follow-up'],
6
'PersonID': [1, 1, 1, 1, 1]
7
})
8
9
df[['Previous_date',
10
'Rank']] = df.groupby(['PersonID',
11
'FormName']).agg(Previous_date=('encounterDate','shift'),
12
Rank=('encounterDate','cumcount'))
13
14
df['Rank']+=1
15
Output
JavaScript
1
7
1
encounterDate FormName PersonID Previous_date Rank
2
0 2019-01-12 Baseline 1 NaN 1
3
1 2020-01-01 Baseline 1 2019-01-12 2
4
2 2019-04-12 Follow-up 1 NaN 1
5
3 2019-13-12 Follow-up 1 2019-04-12 2
6
4 2020-15-01 Follow-up 1 2019-13-12 3
7