Skip to content
Advertisement

Python Pandas: How to get previous dates with a condition [closed]

I have dataset with the following columns:

  • encounterDate
  • FormName
  • PersonID

In the FormName, I have the following forms:

  1. Baseline
  2. Follow-up

Here’s a sample data:

encounterDate, FormName, PersonID
2019-01-12, Baseline, 01
2020-01-01, Baseline, 01
2019-04-12, Follow-up, 01
2019-13-12, Follow-up, 01
2020-15-01, Follow-up, 01

I would like to have the following table:

encounterDate, FormName, PersonID, Previous_date
2019-01-12, Baseline, 01, null
2020-01-01, Baseline, 01, 2019-01-12
2019-04-12, Follow-up, 01, null
2019-13-12, Follow-up, 01, 2019-04-12
2020-15-01, Follow-up, 01, 2019-13-12

How do I write this code in Python?

Additionally, I would like to also rank them:

encounterDate, FormName, PersonID, Previous_date, Rank
2019-01-12, Baseline, 01, null, 1
2020-01-01, Baseline, 01, 2019-01-12, 2
2019-04-12, Follow-up, 01, null, 1
2019-13-12, Follow-up, 01, 2019-04-12, 2
2020-15-01, Follow-up, 01, 2019-13-12, 3

Here’s my working code in SQL

select encounter_date,FormName,PersonID
, date((select max(enc.encounter_datetime) 
                from encounter enc 
                where enc.patient_id=e.patient_id 
                        and enc.encounter_type=e.encounter_type 
                        and date(e.encounter_datetime)>date(enc.encounter_datetime))) previous_date

from encounter e

Thank you in advance.

John

Advertisement

Answer

This would be fairly straight forward in pandas

It looks like you need to groupby both the PersonID and FormName to get the proper groupings. Within those groups you need to shift encounterDate and you need a cumulative count of the same.

cumcount starts at zero, so you may want to add 1 to the rank column to get the desired output.

import pandas as pd

df = pd.DataFrame({
    'encounterDate': ['2019-01-12','2020-01-01','2019-04-12','2019-13-12','2020-15-01'],
    'FormName': ['Baseline','Baseline','Follow-up','Follow-up','Follow-up'],
    'PersonID': [1, 1, 1, 1, 1]
})

df[['Previous_date',
    'Rank']] = df.groupby(['PersonID',
                           'FormName']).agg(Previous_date=('encounterDate','shift'),
                                            Rank=('encounterDate','cumcount'))

df['Rank']+=1

Output

  encounterDate   FormName  PersonID Previous_date  Rank
0    2019-01-12   Baseline         1           NaN     1
1    2020-01-01   Baseline         1    2019-01-12     2
2    2019-04-12  Follow-up         1           NaN     1
3    2019-13-12  Follow-up         1    2019-04-12     2
4    2020-15-01  Follow-up         1    2019-13-12     3
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement