I have a number of values per ID in this format:
ID  Var1  Var2  Var3
1   A     A     C
1   C     B     B
1   B     D     A
2   C     B     C
2   D     B     A
2   D     A     D
3   A     D     B
3   B     C     C
3   C     B     A
4   A     A     D
4   C     B     C
4   B     B     A
5   D     B     B
5   A     C     C
5   D     C     C
I want to randomly select IDs but keep all rows for each selected ID. For example, if I wanted 2 random IDs, the outcome would look like this:
ID  Var1  Var2  Var3
2   C     B     C
2   D     B     A
2   D     A     D
5   D     B     B
5   A     C     C
5   D     C     C
This gives me IDs 2 and 5.
Answer
Use numpy.random.choice to pick random IDs, then filter the DataFrame down to the rows that carry those IDs.
from numpy.random import choice

# draw 2 random values from the unique IDs in column ID
nums = choice(df.ID.unique(), 2)

# keep every row whose ID is one of the drawn values
df_new = df[df['ID'].isin(nums)]
Edit:
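Please read the comment from ouroboros1. choice has a replace parameter, and it should be set to False here. With the default replace=True the same ID can be drawn more than once, so the result may contain fewer than 2 distinct IDs.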
choice(df.ID.unique(), 2, replace=False)
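Putting the pieces together, here is a minimal self-contained sketch using the sample data from the question. The helper name sample_ids and its seed argument are made up for illustration, and it uses numpy's Generator.choice (via default_rng) rather than numpy.random.choice so the draw can be seeded; the replace=False idea is the same.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "Var1": list("ACBCDDABCACBDAD"),
    "Var2": list("ABDBBADCBABBBCC"),
    "Var3": list("CBACADBCADCABCC"),
})


def sample_ids(frame, n, seed=None):
    """Hypothetical helper: keep all rows for n randomly chosen distinct IDs."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees the n drawn IDs are all different
    picked = rng.choice(frame["ID"].unique(), size=n, replace=False)
    # isin keeps every row belonging to one of the drawn IDs
    return frame[frame["ID"].isin(picked)]


df_new = sample_ids(df, 2, seed=0)
print(df_new)
```

Because every ID in the sample data has 3 rows, the result always has 6 rows covering exactly 2 IDs; which IDs you get depends on the seed. Note that asking for more IDs than exist raises a ValueError when replace=False.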