Skip to content
Advertisement

Extracting Rows and Appending Them to the End of Other Rows

I have a dataframe containing a transaction list. What I want to do is create something like a P&L per transaction and additionally make it suitable for entry into another software.

Basically, I would like to get a “Buy” transaction and find the next row with a “Sell” transaction. Then take that “Sell” transaction and append it to the end of the “Buy” row. Alternatively, it can be short action: in such case, I’d be taking the “Sell” transaction, search for the next (closest in time) “Buy” transaction, and add it to the end of the “Buy” row.

Input dataframe:

|       |Side|Qty|Price  |Time                     |Commission| ID       |  5min_time                |
|-------|----|---|-------|------------------------------------|--------------------------------------|
|228    |Buy |1  |4087.00|2022-06-10 00:16:00+08:00|0.4       | 727127819|  2022-06-10 00:15:00+08:00|
|229    |Buy |1  |4087.00|2022-06-10 00:16:00+08:00|0.4       | 727127819|  2022-06-10 00:15:00+08:00|
|230    |Buy |1  |4098.75|2022-06-10 00:57:00+08:00|0.4       | 727141021|  2022-06-10 00:55:00+08:00|
|231    |Sell|1  |4094.25|2022-06-10 00:56:00+08:00|0.4       | 727140488|  2022-06-10 00:55:00+08:00|
|232    |Sell|1  |4094.25|2022-06-10 00:56:00+08:00|0.4       | 727140488|  2022-06-10 00:55:00+08:00|
|233    |Buy |1  |4104.25|2022-06-10 00:59:00+08:00|0.4       | 727141284|  2022-06-10 00:55:00+08:00|
|234    |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|
|235    |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|
|236    |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|
|237    |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|
|238    |Buy |1  |4059.75|2022-06-10 03:04:00+08:00|0.4       | 727156591|  2022-06-10 03:00:00+08:00|
|239    |Buy |1  |4059.75|2022-06-10 03:04:00+08:00|0.4       | 727156591|  2022-06-10 03:00:00+08:00|

Desired outcome:

|       |Side|Qty|Price  |Time                     |Commission| ID       |  5min_time                |
|-------|----|---|-------|------------------------------------|--------------------------------------|
|228    |Buy |1  |4087.00|2022-06-10 00:16:00+08:00|0.4       | 727127819|  2022-06-10 00:15:00+08:00|231   |Sell|1  |4094.25|2022-06-10 00:56:00+08:00|0.4       | 727140488|  2022-06-10 00:55:00+08:00|
|229    |Buy |1  |4087.00|2022-06-10 00:16:00+08:00|0.4       | 727127819|  2022-06-10 00:15:00+08:00|232   |Sell|1  |4094.25|2022-06-10 00:56:00+08:00|0.4       | 727140488|  2022-06-10 00:55:00+08:00|
|230    |Buy |1  |4098.75|2022-06-10 00:57:00+08:00|0.4       | 727141021|  2022-06-10 00:55:00+08:00|234   |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|
|233    |Buy |1  |4104.25|2022-06-10 00:59:00+08:00|0.4       | 727141284|  2022-06-10 00:55:00+08:00|235   |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|
|236    |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|238   |Buy |1  |4059.75|2022-06-10 03:04:00+08:00|0.4       | 727156591|  2022-06-10 03:00:00+08:00|
|237    |Sell|1  |4085.75|2022-06-10 01:43:00+08:00|0.4       | 727156519|  2022-06-10 01:40:00+08:00|239   |Buy |1  |4059.75|2022-06-10 03:04:00+08:00|0.4       | 727156591|  2022-06-10 03:00:00+08:00|

The only idea that comes to mind is to loop through all rows and individually append them to the end of others but it is hardly an efficient way. Can someone recommend me a Pythonic way of achieving the desired outcome?

Advertisement

Answer

You can use:

out = df.reset_index().set_index(df.groupby('Side').cumcount())
dup = out.index.duplicated()
out = pd.concat([out[dup], out[~dup]], axis=1)

Output:

>>> out
   index  Side  Qty    Price                       Time  Commission         ID                  5min_time  index  Side  Qty    Price                       Time  Commission         ID                  5min_time
0    231  Sell    1  4094.25  2022-06-10 00:56:00+08:00         0.4  727140488  2022-06-10 00:55:00+08:00    228   Buy    1  4087.00  2022-06-10 00:16:00+08:00         0.4  727127819  2022-06-10 00:15:00+08:00
1    232  Sell    1  4094.25  2022-06-10 00:56:00+08:00         0.4  727140488  2022-06-10 00:55:00+08:00    229   Buy    1  4087.00  2022-06-10 00:16:00+08:00         0.4  727127819  2022-06-10 00:15:00+08:00
2    234  Sell    1  4085.75  2022-06-10 01:43:00+08:00         0.4  727156519  2022-06-10 01:40:00+08:00    230   Buy    1  4098.75  2022-06-10 00:57:00+08:00         0.4  727141021  2022-06-10 00:55:00+08:00
3    235  Sell    1  4085.75  2022-06-10 01:43:00+08:00         0.4  727156519  2022-06-10 01:40:00+08:00    233   Buy    1  4104.25  2022-06-10 00:59:00+08:00         0.4  727141284  2022-06-10 00:55:00+08:00
4    238   Buy    1  4059.75  2022-06-10 03:04:00+08:00         0.4  727156591  2022-06-10 03:00:00+08:00    236  Sell    1  4085.75  2022-06-10 01:43:00+08:00         0.4  727156519  2022-06-10 01:40:00+08:00
5    239   Buy    1  4059.75  2022-06-10 03:04:00+08:00         0.4  727156591  2022-06-10 03:00:00+08:00    237  Sell    1  4085.75  2022-06-10 01:43:00+08:00         0.4  727156519  2022-06-10 01:40:00+08:00

First, we count the occurrence of each side. The first occurrence of “Sell” has index index 0, the second one has index 1 and so on. It’s the same for “Buy” because we use groupby on Side column.

     Side  Idx
228   Buy    0  # 1st occ. of Buy
229   Buy    1  # 2nd occ. of Buy
230   Buy    2  # 3rd occ. of Buy
231  Sell    0  # 1st occ. of Sell
232  Sell    1  # 1st occ. of Sell
233   Buy    3  # 4th occ. of Buy
234  Sell    2  # 3rd occ. of Sell
235  Sell    3  # 4st occ. of Sell
236  Sell    4  # 5st occ. of Sell
237  Sell    5  # 6st occ. of Sell
238   Buy    4  # 5st occ. of Buy
239   Buy    5  # 6st occ. of Buy

Now we have the row index of your new dataframe but we have to align each index of one side to other side (0 -> 0, 1 -> 1, …). As you have 2 sides, we can use duplicated because the first occurrence of 0 is False and the second occurrence of 0 is True. It acts as a boolean mask:

     Side  Idx    Dup
228   Buy    0  False  # Left
229   Buy    1  False  # Left
230   Buy    2  False  # Left
231  Sell    0   True  # Right
232  Sell    1   True  # Right
233   Buy    3  False  # Left
234  Sell    2   True  # Right
235  Sell    3   True  # Right
236  Sell    4  False  # Left
237  Sell    5  False  # Left
238   Buy    4   True  # Right
239   Buy    5   True  # Right
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement