Skip to content
Advertisement

Have only 1 record per date in a pandas dataframe

Background: In mplfinance, I want to be able to plot multiple trade markers in the same bar. Currently to my understanding you can add only 1 (or 1 buy and 1 sell) to the same bar. I cannot have 2 more trades on the same side in the same bar unless I create another series.

Here is an example:

d = {'TradeDate': ['2018-10-15 06:00:00',
                   '2018-10-29 03:00:00',
                   '2018-10-29 03:00:00',
                   '2018-10-29 06:00:00',
                   '2018-11-15 05:00:00',
                   '2018-11-15 05:00:00',
                   '2018-11-15 05:00:00'],
     'Price': [1.1596,
               1.1433,
               1.13926,
               1.14015,
               1.1413,
               1.1400,
               1.1403]}
df = pd.DataFrame(data=d)

df

    TradeDate           Price
0   2018-10-15 06:00:00 1.15960
1   2018-10-29 03:00:00 1.14330
2   2018-10-29 03:00:00 1.13926
3   2018-10-29 06:00:00 1.14015
4   2018-11-15 05:00:00 1.14130
5   2018-11-15 05:00:00 1.14000
6   2018-11-15 05:00:00 1.14030

As you can see there are multiple trades for 2 datetimes. Now I would like to apply a rule that says “If there is more than 1 trade(here: Price) per date, create a new column for the additional price, keep doing so until all prices for the same TradeDate (datetime) have been distributed across columns, and all datetimes are unique”. So the more prices for the same date, the more extra columns are needed.

The end result would look like this (I finagled this data manually):

    TradeDate           Price   Price2  Price3
0   2018-10-15 06:00:00 1.15960 NaN     NaN
1   2018-10-29 03:00:00 1.14330 1.13926 NaN
3   2018-10-29 06:00:00 1.14015 NaN     NaN
4   2018-11-15 05:00:00 1.14130 1.14000 1.1403

Advertisement

Answer

The trick is to add an incremental counter to each unique datetime. Such that if a datetime is encountered more than once, this counter increases.

To do this, we groupby tradedate, and get a cumulative count of the number of duplicate tradedates there are for a given tradedate. I then add 1 to this value so our counting starts at 1 instea of 0.

df["TradeDate_count"] = df.groupby("TradeDate").cumcount() + 1

print(df)
            TradeDate    Price  TradeDate_count
0 2018-10-15 06:00:00  1.15960                1
1 2018-10-29 03:00:00  1.14330                1
2 2018-10-29 03:00:00  1.13926                2
3 2018-10-29 06:00:00  1.14015                1
4 2018-11-15 05:00:00  1.14130                1
5 2018-11-15 05:00:00  1.14000                2
6 2018-11-15 05:00:00  1.14030                3

Now that we’ve added that column, we can simply pivot to achieve your desired result. Note that I added a rename(...) method simply to add “price” to our column names. I also used the rename_axis method since our pivot returned us a named index for the columns which some users find hard to look at, so I figured it would be best to remove it.

new_df = (df.pivot(index="TradeDate", columns="TradeDate_count", values="Price")
           .rename(columns="price{}".format)
           .rename_axis(columns=None))

                      price1   price2  price3
TradeDate                                    
2018-10-15 06:00:00  1.15960      NaN     NaN
2018-10-29 03:00:00  1.14330  1.13926     NaN
2018-10-29 06:00:00  1.14015      NaN     NaN
2018-11-15 05:00:00  1.14130  1.14000  1.1403
Advertisement