I have a Python pandas dataframe that looks like this:
print(dataframe1.head(30)) Date Close* month_initial month day year date_final 2022-09-23 Sep 23, 2022 3693.23 Sep 9.0 23 2022 2022-09-22 Sep 22, 2022 3757.99 Sep 9.0 22 2022 2022-09-21 Sep 21, 2022 3789.93 Sep 9.0 21 2022 2022-09-20 Sep 20, 2022 3855.93 Sep 9.0 20 2022 2022-09-19 Sep 19, 2022 3899.89 Sep 9.0 19 2022 2022-09-16 Sep 16, 2022 3873.33 Sep 9.0 16 2022 2022-09-15 Sep 15, 2022 3901.35 Sep 9.0 15 2022 2022-09-14 Sep 14, 2022 3946.01 Sep 9.0 14 2022 2022-09-13 Sep 13, 2022 3932.69 Sep 9.0 13 2022 2022-09-12 Sep 12, 2022 4110.41 Sep 9.0 12 2022 2022-09-09 Sep 09, 2022 4067.36 Sep 9.0 09 2022 2022-09-08 Sep 08, 2022 4006.18 Sep 9.0 08 2022 2022-09-07 Sep 07, 2022 3979.87 Sep 9.0 07 2022 2022-09-06 Sep 06, 2022 3908.19 Sep 9.0 06 2022 2022-09-02 Sep 02, 2022 3924.26 Sep 9.0 02 2022 2022-09-01 Sep 01, 2022 3966.85 Sep 9.0 01 2022 2022-08-31 Aug 31, 2022 3955.00 Aug 8.0 31 2022 2022-08-30 Aug 30, 2022 3986.16 Aug 8.0 30 2022 2022-08-29 Aug 29, 2022 4030.61 Aug 8.0 29 2022 2022-08-26 Aug 26, 2022 4057.66 Aug 8.0 26 2022 2022-08-25 Aug 25, 2022 4199.12 Aug 8.0 25 2022 2022-08-24 Aug 24, 2022 4140.77 Aug 8.0 24 2022 2022-08-23 Aug 23, 2022 4128.73 Aug 8.0 23 2022 2022-08-22 Aug 22, 2022 4137.99 Aug 8.0 22 2022 2022-08-19 Aug 19, 2022 4228.48 Aug 8.0 19 2022 2022-08-18 Aug 18, 2022 4283.74 Aug 8.0 18 2022 2022-08-17 Aug 17, 2022 4274.04 Aug 8.0 17 2022 2022-08-16 Aug 16, 2022 4305.20 Aug 8.0 16 2022 2022-08-15 Aug 15, 2022 4297.14 Aug 8.0 15 2022 2022-08-12 Aug 12, 2022 4280.15 Aug 8.0 12 2022
I want to keep the first and the last row per month. How can I do that? I tried using the following code:
import pandas as pd dataframe1.set_index("date_final", inplace=True) resultDf = dataframe1.groupby([dataframe1.index.year, dataframe1.index.month]).agg(["first", "last"]) resultDf.index.rename(["year", "month"], inplace=True) resultDf.reset_index(inplace=True) resultDf
but I don’t get the results I want.
Advertisement
Answer
pandas
groupby operations don’t sort each group prior to aggregation, which is why 'first'
and 'last'
are not selecting the correct rows for you.
Additionally, you can use .resample('M')
instead of a groupby on year & month.
out = ( df.set_index(df.index.astype('datetime64[ns]')) # copying in the data, I lost the datetime index .sort_index() # sort ensures first and last work as expected .resample('M') # resample for a shorthand year/month grouping .agg(['first', 'last']) ) print(out) Date Close* month_initial month day year first last first last first last first last first last first last date_final 2022-08-31 Aug 12, 2022 Aug 31, 2022 4280.15 3955.00 Aug Aug 8.0 8.0 12 31 2022 2022 2022-09-30 Sep 01, 2022 Sep 23, 2022 3966.85 3693.23 Sep Sep 9.0 9.0 1 23 2022 2022
This output doesn’t have the most usable format, so we can use a quick .stack
to remedy it:
out = out.stack() print(out) Date Close* month_initial month day year date_final 2022-08-31 first Aug 12, 2022 4280.15 Aug 8.0 12 2022 last Aug 31, 2022 3955.00 Aug 8.0 31 2022 2022-09-30 first Sep 01, 2022 3966.85 Sep 9.0 1 2022 last Sep 23, 2022 3693.23 Sep 9.0 23 2022