I have written the following code however I am unable to understand how to name the rsuffix and lsuffix parameters
dfs_list = [] for cycle in email_df.cycle_end_date.unique(): temp = email_df[email_df.cycle_end_date == cycle].transpose() .join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash') .join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash') .join(upi_df[upi_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuf', rsuffix='rsuf') .join(ivr_df[ivr_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuff', rsuffix='rsuff') dfs_list.append(temp)
All my dfs have same column names
example:
cycle_end_date | triggered | delivered | cost | payment_value | delivery% 2021-15-01 | 34 | 32 | 4 | 7899 | 5% 2021-31-01 | 45 | 49 | 8 | 1500 | 4%
When I am printing dfs_list[2].reset_index()
I do get my expected output but I am unable to comprehend the suffix names. How do we define it?
output:
**index | 2email | 1lsuff | 2flash | 2 | 1rsuff** 0 absolute_cost 3.00 9.40 9.40 0.00 6.00 1 bill_paid_percent 3.28 0.33 1.87 68139.72 0.28 2 bill_paid_using_reminder 21.20 0.70 9.45 1.78 0.64 3 bounced_email 5018 NaN NaN NaN NaN 4 clicked_email 13385 NaN NaN NaN NaN 5 cycle_end_date 2022-02-28 2022-02-28 2022-02-28 2022-02-28 2022-02-28
Can someone throw light on how to name the suffix to get the exact numbers of the dataframe considered?
Advertisement
Answer
lsuffix
and rsuffix
only work when there are overlapping columns between joined dataframes.
Let’s look at the continous join
in your script
temp = email_df[email_df.cycle_end_date == cycle].transpose() .join(flash_df[flash_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash') .join(sms_df[sms_df.cycle_end_date == cycle].transpose(), how='outer', lsuffix='email', rsuffix='flash') .join(upi_df[upi_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuf', rsuffix='rsuf') .join(ivr_df[ivr_df.cycle_end_date==cycle].transpose(),how='outer',lsuffix='lsuff', rsuffix='rsuff')
Since you do a transpose
to all your joined dataframe, so actually you are using dataframe where column names are original index.
1 cycle_end_date 2021-31-01 triggered 45 delivered 49 cost 8 payment_value 1500 delivery% 4%
After you join email_df
with flash_df
, the 1
might overlap, so the joined df is
1email 1flash cycle_end_date 2021-15-01 2021-15-01 triggered 34 34 delivered 32 32 cost 4 4 payment_value 7899 7899 delivery% 5% 5%
In the next join with sms_df
whose column name is index doesn’t overlap with above joined df, so the output might like
0email 0flash 1 cycle_end_date 2021-15-01 2021-15-01 2021-15-01 triggered 34 34 34 delivered 32 32 32 cost 4 4 4 payment_value 7899 7899 7899 delivery% 5% 5% 5%
And this process goes on…