Skip to content
Advertisement

How to take results of GROUPBY and expand to columns

Here’s an example of my working code that accomplishes what I want but is much too long. The resulting data frame is in the format I want here:

cc_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'Critical Care') & (df_encounters['Enloe Threshold'] == 'Low'))]
cc_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'Critical Care') & (df_encounters['Enloe Threshold'] == 'Medium'))]
cc_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'Critical Care') & (df_encounters['Enloe Threshold'] == 'High'))]
ms_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'Medical Surgical') & (df_encounters['Enloe Threshold'] == 'Low'))]
ms_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'Medical Surgical') & (df_encounters['Enloe Threshold'] == 'Medium'))]
ms_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'Medical Surgical') & (df_encounters['Enloe Threshold'] == 'High'))]
sc_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'Specialty Care') & (df_encounters['Enloe Threshold'] == 'Low'))]
sc_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'Specialty Care') & (df_encounters['Enloe Threshold'] == 'Medium'))]
sc_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'Specialty Care') & (df_encounters['Enloe Threshold'] == 'High'))]
ns_idx_low = df_encounters.loc[((df_encounters['Specialty Type'] == 'No Specialty') & (df_encounters['Enloe Threshold'] == 'Low'))]
ns_idx_med = df_encounters.loc[((df_encounters['Specialty Type'] == 'No Specialty') & (df_encounters['Enloe Threshold'] == 'Medium'))]
ns_idx_high = df_encounters.loc[((df_encounters['Specialty Type'] == 'No Specialty') & (df_encounters['Enloe Threshold'] == 'High'))]

cc_low_val = cc_idx_low['Encounter CSN'].nunique()
cc_med_val = cc_idx_med['Encounter CSN'].nunique()
cc_high_val = cc_idx_high['Encounter CSN'].nunique()
ms_low_val = ms_idx_low['Encounter CSN'].nunique()
ms_med_val = ms_idx_med['Encounter CSN'].nunique()
ms_high_val = ms_idx_high['Encounter CSN'].nunique()
sc_low_val = sc_idx_low['Encounter CSN'].nunique()
sc_med_val = sc_idx_med['Encounter CSN'].nunique()
sc_high_val = sc_idx_high['Encounter CSN'].nunique()
ns_low_val = ns_idx_low['Encounter CSN'].nunique()
ns_med_val = ns_idx_med['Encounter CSN'].nunique()
ns_high_val = ns_idx_high['Encounter CSN'].nunique()

# building the specialty range list using the values from above
df_encounter_counts=[['Critical Care',cc_low_val,cc_med_val,cc_high_val],
           ['Medical Surgical',ms_low_val,ms_med_val,ms_high_val],
           ['Specialty Care',sc_low_val,sc_med_val,sc_high_val],
           ['No Specialty',ns_low_val,ns_med_val,ns_high_val]]

# creating the ranges data frame for the list of values
df_encounter_counts = pd.DataFrame(df_encounter_counts,columns=['Specialty','Low','Medium','High'])
df_encounter_counts

I know that I can accomplish close to what I want with a groupby() on the ‘Specialty Type’ and the ‘Foundation Threshold’ using an .nunique() on the ‘Encounter CSN’ (count of unique encounters per specialty type and threshold:

Example Data: Three columns of values

df_encounters = df[['Encounter CSN','Specialty Type','Foundation Threshold']].copy()
df_enc_totals = df_encounters.groupby(['Specialty Type','Foundation Threshold'])['Encounter CSN'].nunique()

Specialty Type    Foundation Threshold
Critical Care     High                     271
                  Low                      165
                  Medium                   439
Medical Surgical  High                     245
                  Low                     2307
                  Medium                  2724
No Specialty      High                      10
                  Low                     1277
                  Medium                   472
Specialty Care    High                     214
                  Low                      906
                  Medium                  1205
Name: Encounter CSN, dtype: int64

I’d like to sort the “Low”, “Medium”, and “High” under their own columns with the count of unique encounters underneath like so:

Specialty Type      Low      Medium    High
Critical Care       165      439       271
Medical Surgical    2307     2724      245  
Specialty Care      906      1205      214  
No Specialty        1277     472       10

Edit: Unstacked “Foundation Threshold” Error trying to format

Final Answer (combination of marked answer and mine):

df_encounters = df[['Encounter CSN','Specialty Type','Enloe Threshold']].copy()
df_enc_totals = df_encounters.groupby(['Specialty Type','Enloe Threshold'])['Encounter CSN'].nunique()
df_enc_count = df_enc_totals.unstack('Enloe Threshold')
df_enc_count['Specialty'] = df_enc_count.index
df_enc_count.columns.name = ''
df_enc_count.index = range(0, 4, 1)
df_enc_count[['Specialty','Low','Medium','High']]```

Advertisement

Answer

Here is your dataframe:

df_enc_totals = pd.DataFrame({'': {('Critical Care', 'High'): 271,
  ('Critical Care', 'Low'): 165,
  ('Critical Care', 'Medium'): 439,
  ('Medical Surgical', 'High'): 245,
  ('Medical Surgical', 'Low'): 2307,
  ('Medical Surgical', 'Medium'): 2724,
  ('No Specialty', 'High'): 10,
  ('No Specialty', 'Low'): 1277,
  ('No Specialty', 'Medium'): 472,
  ('Specialty Care', 'High'): 214,
  ('Specialty Care', 'Low'): 906,
  ('Specialty Care', 'Medium'): 1205}}
)

df_enc_totals.index.names = ['Specialty Type', 'Foundation Threshold']
df_enc_totals

Please try this:

df_enc_totals = df_enc_totals.unstack('Foundation Threshold')
df_enc_totals.columns = df_enc_totals.columns.droplevel()
df_enc_totals = df_enc_totals[['Low', 'Medium','High']]
df_enc_totals.columns.name = ''
df_enc_totals.reset_index(inplace = True)
df_enc_totals
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement