I have a data frame like below,
Name = ['A','A','A','A','A','A','B','B','B','B','B','B','B']
Id = ['10','10','10','10','10','10','20','20','20','20','20','20','20']
Depth_Feet = ['69.1','70.5','71.4','72.8','73.2','74.2','208.0','209.2','210.2','211.0','211.2','211.7','212.5']
Val = ['2','3.1','1.1','2.1','6.0','1.1','1.2','1.3','3.1','2.9','5.0','6.1','3.2']
d = {'Name':Name,'Id':Id,'Depth_Feet':Depth_Feet,'Val':Val}
df = pd.DataFrame(d)
print (df.head(20))
Depth_Feet Id Name Val 0 69.1 10 A 2 1 70.5 10 A 3.1 2 71.4 10 A 1.1 3 72.8 10 A 2.1 4 73.2 10 A 6.0 5 74.2 10 A 1.1 6 208.0 20 B 1.2 7 209.2 20 B 1.3 8 210.2 20 B 3.1 9 211.0 20 B 2.9 10 211.2 20 B 5.0 11 211.7 20 B 6.1 12 212.5 20 B 3.2
I want to reduce the size of data frame by Depth_Feet column (let’s say every 2 feet). Desired output is
Depth_Feet Id Name Val 0 69.1 10 A 2 1 71.4 10 A 1.1 2 73.2 10 A 6.0 3 208.0 20 B 1.2 4 210.2 20 B 3.1 5 212.5 20 B 3.2
I have tried few options like round and group by etc, but I’m not able to get the result I want.
Advertisement
Answer
If need each 2 rows per groups:
df1 = df[df.groupby('Name').cumcount() % 2 == 0]
print (df1)
Name Id Depth_Feet Val
0 A 10 69.1 2
2 A 10 71.4 1.1
4 A 10 73.2 6.0
6 B 20 208.0 1.2
8 B 20 210.2 3.1
10 B 20 211.2 5.0
12 B 20 212.5 3.2
If need resample by 2 per groups convert values to TimedeltaIndex:
df2 = (df.set_index(pd.to_timedelta(df.Depth_Feet.astype(float), unit='D'))
.groupby('Name')
.resample('2D')
.first()
.reset_index(drop=True))
print (df2)
Name Id Depth_Feet Val
0 A 10 69.1 2
1 A 10 71.4 1.1
2 A 10 73.2 6.0
3 B 20 208.0 1.2
4 B 20 210.2 3.1
5 B 20 212.5 3.2