Operation between 2 arrays for many rows based on date

Question

I have a dataset df_1 that looks like this:

date	stock A	stock B	stock C	stock D
2020-11-01	4	8	14	30
2020-11-10	0.4	0.6	0.8	0.2
2020-11-30	6	10	20	35
2020-12-01	6	10	20	35
2020-12-31	8	12	25	0.1

And a second dataset, df_2:

date	output1	output2
11/2020	stock A,stock B	stock C, stock D
12/2020	stock B,stock D	stock A,stock C

Accepted Answer

This assumes the index of df1 is a DatetimeIndex and the index of df2 is a PeriodIndex. Suppose the following dataframes:

    import pandas as pd
    from pandas import Timestamp, Period

    data1 = {'index': [Timestamp('2020-11-01 00:00:00'), Timestamp('2020-11-10 00:00:00'),
                       Timestamp('2020-11-30 00:00:00'), Timestamp('2020-12-01 00:00:00'),
                       Timestamp('2020-12-31 00:00:00')],
             'columns': ['stock A', 'stock B', 'stock C', 'stock D'],
             'data': [[4.0, 8.0, 14.0, 30.0], [0.4, 0.6, 0.8, 0.2], [6.0, 10.0, 20.0, 35.0],
                      [6.0, 10.0, 20.0, 35.0], [8.0, 12.0, 25.0, 0.1]]}
    df1 = pd.DataFrame(**data1).rename_axis('date')

    data2 = {'index': [Period('2020-11', 'M'), Period('2020-12', 'M')],
             'columns': ['output1', 'output2'],
             'data': [['stock A,stock B', 'stock C, stock D'],
                      ['stock B,stock D', 'stock A,stock C']]}
    df2 = pd.DataFrame(**data2).rename_axis('date')

First, clean your first dataframe:

    # Compute the percentage change between the first and last row of each month
    pct = lambda x: x.iloc[[0, -1]].pct_change().iloc[1] * 100
    # Note: pandas 2.2+ deprecates the 'M' offset alias here in favor of 'ME'
    df1 = df1.groupby(pd.Grouper(freq='M')).apply(pct)

    # Reshape your dataframe (parentheses allow the chained call to span lines)
    df1 = (df1.melt(var_name='stock', value_name='pct', ignore_index=False)
              .to_period('M').reset_index())

At this point, your first dataframe looks like:

    >>> df1
          date    stock        pct
    0  2020-11  stock A  50.000000
    1  2020-12  stock A  33.333333
    2  2020-11  stock B  25.000000
    3  2020-12  stock B  20.000000
    4  2020-11  stock C  42.857143
    5  2020-12  stock C  25.000000
    6  2020-11  stock D  16.666667
    7  2020-12  stock D -99.714286

Now, reshape your second dataframe:

    # Reshape your dataframe after splitting stocks
    df2 = (df2.apply(lambda x: x.str.split(', ?'))
              .melt(var_name='output', value_name='stock', ignore_index=False)
              .explode('stock').reset_index())

At this point, your second dataframe looks like:

    >>> df2
          date   output    stock
    0  2020-11  output1  stock A
    1  2020-11  output1  stock B
    2  2020-12  output1  stock B
    3  2020-12  output1  stock D
    4  2020-11  output2  stock C
    5  2020-11  output2  stock D
    6  2020-12  output2  stock A
    7  2020-12  output2  stock C

Finally, merge your dataframes together:

    # Join your dataframes on date and stock columns
    df3 = df2.merge(df1, on=['date', 'stock'], how='left')

    # Build the "stock: pct%" label for each row
    df3['fmt'] = df3['stock'] + ': ' + df3['pct'].round(1).astype(str) + '%'

    # Reshape your dataframe to get the final output
    df3 = (df3.pivot_table('fmt', 'date', 'output', aggfunc=', '.join)
              .rename_axis(columns=None))

The final output:

                                     output1                         output2
    date
    2020-11   stock A: 50.0%, stock B: 25.0%  stock C: 42.9%, stock D: 16.7%
    2020-12  stock B: 20.0%, stock D: -99.7%  stock A: 33.3%, stock C: 25.0%
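The answer starts from frames that already carry a DatetimeIndex and a PeriodIndex. If the data instead arrives with plain string dates, as the tables in the question suggest, a conversion step is needed first. The following is a minimal sketch under that assumption; the names raw1 and raw2 and the column layout are illustrative and not part of the original answer.

```python
import pandas as pd

# Assumed raw layout: dates stored as strings in a 'date' column
raw1 = pd.DataFrame({
    'date': ['2020-11-01', '2020-11-10', '2020-11-30', '2020-12-01', '2020-12-31'],
    'stock A': [4, 0.4, 6, 6, 8],
    'stock B': [8, 0.6, 10, 10, 12],
    'stock C': [14, 0.8, 20, 20, 25],
    'stock D': [30, 0.2, 35, 35, 0.1],
})
raw2 = pd.DataFrame({
    'date': ['11/2020', '12/2020'],
    'output1': ['stock A,stock B', 'stock B,stock D'],
    'output2': ['stock C, stock D', 'stock A,stock C'],
})

# Daily date strings -> DatetimeIndex
df1 = raw1.assign(date=pd.to_datetime(raw1['date'], format='%Y-%m-%d')).set_index('date')

# 'MM/YYYY' strings -> monthly PeriodIndex
months = pd.to_datetime(raw2['date'], format='%m/%Y').dt.to_period('M')
df2 = raw2.assign(date=months).set_index('date')

print(df1.index.dtype, df2.index.dtype)
```

After this step, df1 and df2 should have the index types the answer relies on, so the remaining steps can be applied unchanged.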

Change per month, from the first to the last row of each month, expressed as a fraction:

date	stock A	stock B
11/2020	0.5	0.25
12/2020	0.33	0.20
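The figures in the table above can be reproduced by taking the first and last row of each month and dividing their difference by the first value. The sketch below uses groupby with first() and last() rather than the pct_change lambda from the accepted answer; it is an alternative shown for illustration, and it assumes the rows can be sorted by date and contain no missing values (first() and last() skip nulls).

```python
import pandas as pd

df1 = pd.DataFrame(
    {'stock A': [4, 0.4, 6, 6, 8], 'stock B': [8, 0.6, 10, 10, 12]},
    index=pd.to_datetime(['2020-11-01', '2020-11-10', '2020-11-30',
                          '2020-12-01', '2020-12-31']),
).rename_axis('date')

# Make sure "first" and "last" mean earliest and latest date
df1 = df1.sort_index()

# Group rows by calendar month using monthly periods
by_month = df1.groupby(df1.index.to_period('M'))
first, last = by_month.first(), by_month.last()

# Fractional change from the first to the last row of each month
change = (last - first) / first
print(change.round(2))
# 2020-11: stock A 0.50, stock B 0.25; 2020-12: stock A 0.33, stock B 0.20
```

Multiplying change by 100 gives the percentages used in the answer's pct column.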

Expected output:

date	output1	output2
11/2020	stock A: 50%, stock B: 25%	stock C: 42.8%, stock D: 16.6%
12/2020	stock B: 20%, stock D: 14.3%	stock A: 33.3%, stock C: 25%
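The table above writes whole percentages without a decimal (50%, 25%), whereas round(1).astype(str) in the accepted answer yields 50.0% and 25.0%. If the shorter form is wanted, one option is the g format specifier, sketched below on the November values. This is an assumption about the desired formatting, not something the answer does. It rounds rather than truncates, so 42.857 becomes 42.9 and not 42.8.

```python
import pandas as pd

# November percentage changes taken from the answer's intermediate df1
pct = pd.Series([50.0, 25.0, 42.857143, 16.666667],
                index=['stock A', 'stock B', 'stock C', 'stock D'])

# Round to one decimal, then let 'g' drop a trailing '.0'
labels = [f"{name}: {round(value, 1):g}%" for name, value in pct.items()]
print(', '.join(labels))
# stock A: 50%, stock B: 25%, stock C: 42.9%, stock D: 16.7%
```

Note that g switches to exponent notation for values with more than six significant digits, so this only suits percentages of moderate size.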
