I have a data frame like below,
Name = ['A','A','A','A','A','A','B','B','B','B','B','B','B'] Id = ['10','10','10','10','10','10','20','20','20','20','20','20','20'] Depth_Feet = ['69.1','70.5','71.4','72.8','73.2','74.2','208.0','209.2','210.2','211.0','211.2','211.7','212.5'] Val = ['2','3.1','1.1','2.1','6.0','1.1','1.2','1.3','3.1','2.9','5.0','6.1','3.2'] d = {'Name':Name,'Id':Id,'Depth_Feet':Depth_Feet,'Val':Val} df = pd.DataFrame(d) print (df.head(20))
Depth_Feet Id Name Val 0 69.1 10 A 2 1 70.5 10 A 3.1 2 71.4 10 A 1.1 3 72.8 10 A 2.1 4 73.2 10 A 6.0 5 74.2 10 A 1.1 6 208.0 20 B 1.2 7 209.2 20 B 1.3 8 210.2 20 B 3.1 9 211.0 20 B 2.9 10 211.2 20 B 5.0 11 211.7 20 B 6.1 12 212.5 20 B 3.2
I want to reduce the size of data frame by Depth_Feet column (let’s say every 2 feet). Desired output is
Depth_Feet Id Name Val 0 69.1 10 A 2 1 71.4 10 A 1.1 2 73.2 10 A 6.0 3 208.0 20 B 1.2 4 210.2 20 B 3.1 5 212.5 20 B 3.2
I have tried few options like round and group by etc, but I’m not able to get the result I want.
Advertisement
Answer
If need each 2 rows per groups:
df1 = df[df.groupby('Name').cumcount() % 2 == 0] print (df1) Name Id Depth_Feet Val 0 A 10 69.1 2 2 A 10 71.4 1.1 4 A 10 73.2 6.0 6 B 20 208.0 1.2 8 B 20 210.2 3.1 10 B 20 211.2 5.0 12 B 20 212.5 3.2
If need resample by 2 per groups convert values to TimedeltaIndex:
df2 = (df.set_index(pd.to_timedelta(df.Depth_Feet.astype(float), unit='D')) .groupby('Name') .resample('2D') .first() .reset_index(drop=True)) print (df2) Name Id Depth_Feet Val 0 A 10 69.1 2 1 A 10 71.4 1.1 2 A 10 73.2 6.0 3 B 20 208.0 1.2 4 B 20 210.2 3.1 5 B 20 212.5 3.2