I need to create a dataframe which lists all patients and their matching doctors.
I have a txt file with doctor/patient records organized in the following format:
Doctor_1: patient23423,patient292837,patient1232423... Doctor_2: patient456785,patient25363,patient23425665...
And a list of all unique patients.
To do this, I imported the txt file into a doctorsDF dataframe, separated by a colon. I also created a patientsDF dataframe with 2 columns: ‘Patients’ filled from the patient list, and ‘Doctors’ column empty.
I then ran the following:
for pat in patientsDF['Patient']: for i, doc in enumerate(doctorsDF[1]): if doctorsDF[1][i].find(str(pat)) >= 0 : patientsDF['Doctor'][i] = doctorsDF.loc[i,0] else: continue
This worked fine, and now all patients are matched with the doctors, but the method seems clumsy. Is there any function that can more cleanly achieve the result? Thanks!
(First StackOverflow post here. Sorry if this is a newb question!)
Advertisement
Answer
If you use Pandas, try:
df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient']) df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',') .explode()).reset_index(drop=True)
Output:
>>> df Doctor Patient 0 Doctor_1 patient23423 1 Doctor_1 patient292837 2 Doctor_1 patient1232423 3 Doctor_2 patient456785 4 Doctor_2 patient25363 5 Doctor_2 patient23425665
How to search:
>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze() 'Doctor_2'