Skip to content
Advertisement

Operation between 2 arrays for many rows based on date

I have a dataset df_1 that looks like this:

date stock A stock B stock C stock D
2020-11-01 4 8 14 30
2020-11-10 0.4 0.6 0.8 0.2
2020-11-30 6 10 20 35
2020-12-01 6 10 20 35
2020-11-31 8 12 25 0.1

And a second dataset, df_2:

date output1 output2
11/2020 stock A,stock B stock C, stock D
12/2020 stock B,stock D stock A,stock C

I want to, for every output in df_2, compute the return for each month. An expected result for this example is df_3:

date output1 output2
11/2020 stock A: 50%, stock B: 25% stock C:42.8% , stock D: 16.6%
12/2020 stock B: 20% ,stock D: 14.3% stock A: 33.3% , stock C: 25%

In the df_3, I do not need the result to be displayed with a comma separator, however, I need later to be able to plot everything in a graph with time on the X axis and stock values (from df_3) on the Y axis.

Update 1: As requested, here is the constructor of my sample dataframes: df1:

{'PETROLEO BRASILEIRO ON': {Timestamp('1990-01-10 00:00:00'): 0.0, Timestamp('1990-01-11 00:00:00'): 0.0, Timestamp('1990-01-12 00:00:00'): 0.0, Timestamp('1990-01-15 00:00:00'): 0.0}, 'WEG ON': {Timestamp('1990-01-10 00:00:00'): nan, Timestamp('1990-01-11 00:00:00'): nan, Timestamp('1990-01-12 00:00:00'): nan, Timestamp('1990-01-15 00:00:00'): nan}, 'BANCO SANTANDER BRASIL UNITS': {Timestamp('1990-01-10 00:00:00'): nan, Timestamp('1990-01-11 00:00:00'): nan, Timestamp('1990-01-12 00:00:00'): nan, Timestamp('1990-01-15 00:00:00'): nan}, 'ITAU UNIBANCO HOLDING PN': {Timestamp('1990-01-10 00:00:00'): nan, Timestamp('1990-01-11 00:00:00'): nan, Timestamp('1990-01-12 00:00:00'): nan, Timestamp('1990-01-15 00:00:00'): nan}}

df2: obs: “losers” stands for “output1”

{'losers': {Period('1992-08', 'M'): ['AGRALE ON 1000', 'SIBRA ON', 'IGB ELECTRONICA ON', 'CMPH.ENGT.DE SAOP.A PN', 'INEPAR ON', 'ANTARCTIC MG PN', 'BRASPEROLA ON 1000', 'DHB INDUSTRIA E COMERCIO ON', 'FERRO BRASILEIRO', 'GRANOLEO PN 1000', 'ITACOLOMY PNB 1000', 'SID RIOGRAND PN', 'ANTARCTIC PB ON 1000', 'BUETTNER ON', 'COFAP ON', 'MAIO GALLO ON 1000', 'NAKATA ON 1000', 'SAM INDUSTR ON', 'SID RIOGRAND ON', 'TECELAGEM BLUMENAU SERIES B PN', 'TELEMAR N L PNB', 'TRANSBRASIL ON', 'AQUATEC QUIMICA PN', 'ANTARCTIC PB PNA 1000', 'POLIALDEN PN 1000', 'MICHELETTO ON 1000', 'BRAHMA PN 1000', 'ELUMA PN', 'TELESP ON 1000', 'TELESP PN 1000', 'BLUE TECH SOLUTIONS EQI S ON', 'CACIQUE CAFE PN', 'SALGEMA PNA', 'ANTARCT NORD PNA 1000', 'TEKA PN', 'ANTARC PIAUI PNB 1000', 'BRAHMA ON 1000', 'CTI.ELETR BRAS- ELETROBRAS SR.A PN', 'NACIONAL ON 1000', 'ANTARCTICA ON', 'ARNO PN', 'TEL B CAMPO ON 1000', 'MANNESMANN PN 1000', 'BANCO DO NORD ON', 'MANGELS PN', 'AGROCERES PN 1000', 'CHAPECO PN', 'COINVEST PN 1000', 'MENDES JUNIOR ENGENHARIA SERIES B PN', 'BANDEIRANTES ON 1000', 'MENDES JUNIOR ENGENHARIA SERIES A PN', 'METAL LEVE PN', 'NORD BRASIL PN', 'PARANAPANEMA PN'...

Update 2: I found another way around, we can use df1 to compute the monthly returns, and I would end up with this:

df1 = (np.log(df1)) - (np.log(df1.shift(1)))   
df1.reset_index(level=0, inplace=True)
per = df1.date.dt.to_period("M")
g = df1.groupby(per)
df1 = g.sum(min_count=1) # monthly return table`
date stock A Stock B
11/2020 0.5 0.25
12/2020 0.33% 0.20

Now I need to combine this df with df2 for the desired output table, which must be easier.

Advertisement

Answer

Assuming the index of df1 is a DatetimeIndex and df2 a PeriodIndex. So suppose the following dataframes:

import pandas as pd
from pandas import Timestamp, Period

data1 = {'index': [Timestamp('2020-11-01 00:00:00'), Timestamp('2020-11-10 00:00:00'), Timestamp('2020-11-30 00:00:00'), Timestamp('2020-12-01 00:00:00'), Timestamp('2020-12-31 00:00:00')], 
         'columns': ['stock A', 'stock B', 'stock C', 'stock D'], 
         'data': [[4.0, 8.0, 14.0, 30.0], [0.4, 0.6, 0.8, 0.2], [6.0, 10.0, 20.0, 35.0], [6.0, 10.0, 20.0, 35.0], [8.0, 12.0, 25.0, 0.1]]}

df1 = pd.DataFrame(**data1).rename_axis('date')

data2 = {'index': [Period('2020-11', 'M'), Period('2020-12', 'M')], 
         'columns': ['output1', 'output2'],
         'data': [['stock A,stock B', 'stock C, stock D'], ['stock B,stock D', 'stock A,stock C']]}

df2 = pd.DataFrame(**data2).rename_axis('date')

First, clean your first dataframe:

# Compute percentage change
pct = lambda x: x.iloc[[0, -1]].pct_change().iloc[1] * 100
df1 = df1.groupby(pd.Grouper(freq='M')).apply(pct)

# Reshape your dataframe
df1 = df1.melt(var_name='stock', value_name='pct', ignore_index=False) 
         .to_period('M').reset_index()

At this point, your first dataframe looks like:

>>> df1
      date    stock        pct
0  2020-11  stock A  50.000000
1  2020-12  stock A  33.333333
2  2020-11  stock B  25.000000
3  2020-12  stock B  20.000000
4  2020-11  stock C  42.857143
5  2020-12  stock C  25.000000
6  2020-11  stock D  16.666667
7  2020-12  stock D -99.714286

Now, reshape your second dataframe:

# Reshape your dataframe after splitting stocks
df2 = df2.apply(lambda x: x.str.split(', ?')) 
         .melt(var_name='output', value_name='stock', ignore_index=False) 
         .explode('stock').reset_index()

At this point your second dataframe looks like:

>>> df2
      date   output    stock
0  2020-11  output1  stock A
1  2020-11  output1  stock B
2  2020-12  output1  stock B
3  2020-12  output1  stock D
4  2020-11  output2  stock C
5  2020-11  output2  stock D
6  2020-12  output2  stock A
7  2020-12  output2  stock C

Finally, merge your dataframes together:

# Join your dataframes on date and stock columns
df3 = df2.merge(df1, on=['date', 'stock'], how='left')

# Compute some columns
df3['fmt'] = df3['stock'] + ': ' + df3['pct'].round(1).astype(str) + '%'

# Reshape your dataframe to get the final output
df3 = df3.pivot_table('fmt', 'date', 'output', aggfunc=', '.join) 
         .rename_axis(columns=None)

The final output:

                                 output1                         output2
date                                                                    
2020-11   stock A: 50.0%, stock B: 25.0%  stock C: 42.9%, stock D: 16.7%
2020-12  stock B: 20.0%, stock D: -99.7%  stock A: 33.3%, stock C: 25.0%
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement