Morning All,
Happy Friday. I have some excel output which shows Client
, their Sector
and the counts of their Result
via a crosstab command. This works well:
dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'], 'City': ['SY','LN','NY','TO','TK','LA']} dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients) df = { 'Client': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB', 'CCC', 'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'], 'Sector': ['GOV', 'GOV', 'CORP', 'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP', 'GOV', 'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'], 'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ', 'No RFQ', 'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ'] } df = pd.DataFrame.from_dict(df) # print(df) vals = ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject', 'Dealer Timeout', 'Done', 'No RFQ', 'Tied Covered', 'Tied Done', 'Tied Traded Away', 'Traded Away'] df = (pd.crosstab([df.Client, df.Sector], df.Result, margins=True, margins_name='Total_Result_Per_Client') .drop('Total_Result_Per_Client') .reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0)) # Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client']) # Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout']) # Populate any nulls due to 0/0 df = df.fillna(0) # Format Pct cols decimals2 = 2 df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%' df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%' print (df) df.to_excel('C:TempOut_Data_EOM_Key_Clients_Corp.xlsx')
The excel extract meets the requirements.
An additional request has come in to add all other possible clients which are not present in the current months data, but may be in futures months.
For each Client
with no data add a row in the crosstab, and for each field insert; #NA
. The final output would be:
I looked to add theses rows via:
# Get list of all possible clients dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client) new_index = tuple(list(dfAll_Clients.All_Client)) print(new_index) # Append clients not present in current row entries dfTemp = df.reindex(new_index, fill_value=0) print(dfTemp)
The issue is that results of the crosstab are multiindexed. I tried flattening the crosstab output with df = df.stack([0]).reset_index()
but this changed the structure completely and deviates from the final output entirely. I now get TypeError: Expected tuple, got str
Any help would be appreciated.
Advertisement
Answer
You can try reindex
#here E and F (l) you can get it by cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0)) l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist() l = [(x, None)for x in l] df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))