Sorry for the noob question. I have a bunch of stocks in a sqlite3 database:
import pandas as pd
import sqlite3, config
connection = sqlite3.connect(config.db_file)
connection.row_factory = sqlite3.Row
df = pd.read_sql('SELECT * FROM stock_price', connection)
# sort the dataframe
df.sort_values(by='stock_id', inplace=True)
# # set the index to be this and don't drop
df.set_index(keys=['stock_id'], drop=False,inplace=True)
When I print the df, it gives me the following (where each stock_id refers to a unique stock, e.g APPL):
id stock_id date open high low close volume stock_id 1 1 1 2022-08-02 9.83 9.845 9.83 9.830 584772 1 2 1 2022-08-03 9.84 9.860 9.84 9.820 7711 4 3 4 2022-08-03 10.38 10.380 10.38 10.380 199 5 46 5 2022-08-03 34.75 35.200 34.75 35.200 1007 5 45 5 2022-08-02 34.32 34.550 34.32 34.442 1252 ... ... ... ... ... ... ... ... ... 98 8 98 2022-08-02 28.00 28.095 27.90 28.000 2417 99 71 99 2022-08-02 88.19 88.940 87.15 88.370 1045596 99 72 99 2022-08-03 88.34 88.550 87.65 88.410 982710 100 171 100 2022-08-02 117.58 120.010 117.08 119.270 67795 100 172 100 2022-08-03 119.80 121.940 120.60 121.440 4237 [178 rows x 8 columns]
I need to target each unique stock_id individually, and get the prior close.
I know if each stock was in its own separate dataframe, I could do something like this:
final_df['previous close'] = final_df['c'].shift()
But when I’ve tried that, because everything in one dataframe, then you get one stock getting the previous close of an entirely different stock, which isn’t what I want.
So my question:
What’s the best to achieve splitting out all these different stocks from one single dataframe and being able to target them individually, and get the previous close price of each stock?
Advertisement
Answer
If I understand the question correctly, you just want the close of the previous row within each stock group. You can do that easily:
newdf = df.assign(prev_close=df.groupby(level=0)['close'].shift())
or, before setting the index to stock_id:
newdf = df.assign(prev_close=df.groupby('stock_id')['close'].shift())
On your data sample:
>>> newdf
          id   stock_id date         open    high     low     close    volume   prev_close
stock_id                                                                                  
1           1    1       2022-08-02    9.83    9.845    9.83    9.830   584772     NaN    
1           2    1       2022-08-03    9.84    9.860    9.84    9.820     7711    9.83    
4           3    4       2022-08-03   10.38   10.380   10.38   10.380      199     NaN    
5          46    5       2022-08-03   34.75   35.200   34.75   35.200     1007     NaN    
5          45    5       2022-08-02   34.32   34.550   34.32   34.442     1252   35.20    
98          8   98       2022-08-02   28.00   28.095   27.90   28.000     2417     NaN    
99         71   99       2022-08-02   88.19   88.940   87.15   88.370  1045596     NaN    
99         72   99       2022-08-03   88.34   88.550   87.65   88.410   982710   88.37    
100       171  100       2022-08-02  117.58  120.010  117.08  119.270    67795     NaN    
100       172  100       2022-08-03  119.80  121.940  120.60  121.440     4237  119.27
Side note: try to not use inplace=True all over the place. It will cause you trouble in the long run (see e.g. here).
Personally, I would prefer this format:
cleandf = df.assign(
    date=pd.to_datetime(df['date'])
).set_index(['stock_id', 'date']).sort_index()
newdf = cleandf.assign(prev_close=cleandf.groupby('stock_id')['close'].shift())
>>> newdf
                     id   open    high     low     close    volume   prev_close
stock_id date                                                                  
1        2022-08-02    1    9.83    9.845    9.83    9.830   584772      NaN   
         2022-08-03    2    9.84    9.860    9.84    9.820     7711    9.830   
4        2022-08-03    3   10.38   10.380   10.38   10.380      199      NaN   
5        2022-08-02   45   34.32   34.550   34.32   34.442     1252      NaN   
         2022-08-03   46   34.75   35.200   34.75   35.200     1007   34.442   
98       2022-08-02    8   28.00   28.095   27.90   28.000     2417      NaN   
99       2022-08-02   71   88.19   88.940   87.15   88.370  1045596      NaN   
         2022-08-03   72   88.34   88.550   87.65   88.410   982710   88.370   
100      2022-08-02  171  117.58  120.010  117.08  119.270    67795      NaN   
         2022-08-03  172  119.80  121.940  120.60  121.440     4237  119.270