Say that I have a dataframe that looks like:
Name  Group_Id
AAA   1
ABC   1
CCC   2
XYZ   2
DEF   3
YYH   3
How can I randomly select one (or more) rows for each Group_Id? For example, with one random draw per Group_Id, I might get:
Name  Group_Id
AAA   1
XYZ   2
DEF   3
Answer
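import numpy as np

size = 2        # rows to draw from each group (use 1 for a single draw)
replace = True  # sample with replacement, so a row can be drawn twice
# Pick `size` random index labels within the group and select those rows.
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
df.groupby('Group_Id', as_index=False).apply(fn)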
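
As an alternative to the lambda above, newer pandas versions (1.1 or later, as far as I know) provide a `sample` method directly on the groupby object. The sketch below rebuilds the example frame from the question and draws from each group. The names `one_per_group` and `two_per_group` are only illustrative, and `random_state` is set purely so the draw is repeatable.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# One random row per Group_Id.
one_per_group = df.groupby("Group_Id").sample(n=1, random_state=0)

# Two draws per group with replacement, mirroring size = 2, replace = True.
two_per_group = df.groupby("Group_Id").sample(n=2, replace=True, random_state=0)

print(one_per_group)
print(two_per_group)
```

Drop `random_state` to get a different draw on each run. Without `replace=True`, `n` cannot exceed the size of the smallest group.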