Morning All, Happy Friday. I have some excel output which shows Client, their Sector and the counts of their Result via a crosstab command. This works well: The excel extract meets the requirements. An additional request has come in to add all other possible clients which are not present in the current months data, but may be in futures months.

Problems with Multiindex from pd.Crosstab when appending non present row data

Morning All,

Happy Friday. I have some excel output which shows Client, their Sector and the counts of their Result via a crosstab command. This works well:

dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
                'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = {  'Client': ['AAA', 'AAA', 'AAA',
                 'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
                 'CCC',
                'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
        'Sector': ['GOV', 'GOV', 'CORP',
                 'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
                 'GOV',
                 'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
        'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
               'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
               'No RFQ',
               'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
      }
df = pd.DataFrame.from_dict(df)
# print(df)

vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']

df = (pd.crosstab([df.Client,
                  df.Sector],
                 df.Result,
                 margins=True,
                 margins_name='Total_Result_Per_Client')
        .drop('Total_Result_Per_Client')
        .reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print (df)
df.to_excel('C:TempOut_Data_EOM_Key_Clients_Corp.xlsx')

JavaScript
​x
 
dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
                'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = {  'Client': ['AAA', 'AAA', 'AAA',
                 'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
                 'CCC',
                'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
        'Sector': ['GOV', 'GOV', 'CORP',
                 'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
                 'GOV',
                 'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
        'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
               'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
               'No RFQ',
               'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
      }
df = pd.DataFrame.from_dict(df)
# print(df)
​
vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']
​
df = (pd.crosstab([df.Client,
                  df.Sector],
                 df.Result,
                 margins=True,
                 margins_name='Total_Result_Per_Client')
        .drop('Total_Result_Per_Client')
        .reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print (df)
df.to_excel('C:TempOut_Data_EOM_Key_Clients_Corp.xlsx')
​

The excel extract meets the requirements.

An additional request has come in to add all other possible clients which are not present in the current months data, but may be in futures months. For each Client with no data add a row in the crosstab, and for each field insert; #NA. The final output would be:

I looked to add theses rows via:

# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)

JavaScript
 
# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)
​

The issue is that results of the crosstab are multiindexed. I tried flattening the crosstab output with df = df.stack([0]).reset_index() but this changed the structure completely and deviates from the final output entirely. I now get TypeError: Expected tuple, got str

Any help would be appreciated.

Answer

You can try reindex

#here E and F (l) you can get it by 

cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None)for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))

JavaScript
 
#here E and F (l) you can get it by 
​
cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None)for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))
​

Advertisement

Answer