Skip to content
Advertisement

Problems with Multiindex from pd.Crosstab when appending non present row data

Morning All,

Happy Friday. I have some excel output which shows Client, their Sector and the counts of their Result via a crosstab command. This works well:

dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
                'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = {  'Client': ['AAA', 'AAA', 'AAA',
                 'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
                 'CCC',
                'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
        'Sector': ['GOV', 'GOV', 'CORP',
                 'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
                 'GOV',
                 'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
        'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
               'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
               'No RFQ',
               'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
      }
df = pd.DataFrame.from_dict(df)
# print(df)

vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']

df = (pd.crosstab([df.Client,
                  df.Sector],
                 df.Result,
                 margins=True,
                 margins_name='Total_Result_Per_Client')
        .drop('Total_Result_Per_Client')
        .reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print (df)
df.to_excel('C:TempOut_Data_EOM_Key_Clients_Corp.xlsx')

The excel extract meets the requirements.

enter image description here

An additional request has come in to add all other possible clients which are not present in the current months data, but may be in futures months. For each Client with no data add a row in the crosstab, and for each field insert; #NA. The final output would be:

enter image description here

I looked to add theses rows via:

# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)

The issue is that results of the crosstab are multiindexed. I tried flattening the crosstab output with df = df.stack([0]).reset_index() but this changed the structure completely and deviates from the final output entirely. I now get TypeError: Expected tuple, got str

Any help would be appreciated.

Advertisement

Answer

You can try reindex

#here E and F (l) you can get it by 

cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None)for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement