Closed. This question needs to be more focused. It is not currently accepting answers. Want to improve this question? Update the question so it focuses on one problem only by editing this post. Closed 1 year ago. Improve this question I need to create a dataframe which lists all patients and their matching doctors. I have a txt file with

How to find which doctor a patient is using, when only given a list of doctor’s patients? (code improvement request) [closed]

I need to create a dataframe which lists all patients and their matching doctors.

I have a txt file with doctor/patient records organized in the following format:

Doctor_1: patient23423,patient292837,patient1232423...
Doctor_2: patient456785,patient25363,patient23425665...

JavaScript
​x
 
Doctor_1: patient23423,patient292837,patient1232423...
Doctor_2: patient456785,patient25363,patient23425665...
​

And a list of all unique patients.

To do this, I imported the txt file into a doctorsDF dataframe, separated by a colon. I also created a patientsDF dataframe with 2 columns: ‘Patients’ filled from the patient list, and ‘Doctors’ column empty.

I then ran the following:

for pat in patientsDF['Patient']:
    for i, doc in enumerate(doctorsDF[1]):
        if doctorsDF[1][i].find(str(pat)) >= 0 :
            patientsDF['Doctor'][i] = doctorsDF.loc[i,0]
        else: 
            continue

JavaScript
 
for pat in patientsDF['Patient']:
    for i, doc in enumerate(doctorsDF[1]):
        if doctorsDF[1][i].find(str(pat)) >= 0 :
            patientsDF['Doctor'][i] = doctorsDF.loc[i,0]
        else: 
            continue
​

This worked fine, and now all patients are matched with the doctors, but the method seems clumsy. Is there any function that can more cleanly achieve the result? Thanks!

(First StackOverflow post here. Sorry if this is a newb question!)

Answer

If you use Pandas, try:

df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient'])
df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',')
                                      .explode()).reset_index(drop=True)

JavaScript
 
df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient'])
df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',')
                                      .explode()).reset_index(drop=True)
​

Output:

>>> df

     Doctor          Patient
0  Doctor_1     patient23423
1  Doctor_1    patient292837
2  Doctor_1   patient1232423
3  Doctor_2    patient456785
4  Doctor_2     patient25363
5  Doctor_2  patient23425665

JavaScript
 
>>> df
​
     Doctor          Patient
0  Doctor_1     patient23423
1  Doctor_1    patient292837
2  Doctor_1   patient1232423
3  Doctor_2    patient456785
4  Doctor_2     patient25363
5  Doctor_2  patient23425665
​

How to search:

>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze()
'Doctor_2'

JavaScript
 
>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze()
'Doctor_2'
​

Advertisement

Answer