
Is there a way of selecting all records of a certain ID after randomly selecting the IDs?

I have several rows per ID in this format:

ID  Var1 Var2 Var3
1    A     A    C
1    C     B    B
1    B     D    A
2    C     B    C
2    D     B    A
2    D     A    D
3    A     D    B
3    B     C    C
3    C     B    A
4    A     A    D
4    C     B    C
4    B     B    A
5    D     B    B
5    A     C    C
5    D     C    C

I want to randomly select IDs but keep all rows for each selected ID. For example, if I wanted 2 random IDs, the outcome would look like this:

ID  Var1 Var2 Var3
2    C     B    C
2    D     B    A
2    D     A    D
5    D     B    B
5    A     C    C
5    D     C    C

This gives me IDs 2 and 5.


Answer

Use numpy.random.choice to pick random IDs from the unique values of the ID column, then keep every row with those IDs using isin.

from numpy.random import choice

nums = choice(df['ID'].unique(), 2, replace=False)  # pick 2 distinct random IDs from column ID
df_new = df[df['ID'].isin(nums)]  # keep every row whose ID is one of those 2
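A self-contained sketch of the same idea, in case you want something you can paste and run. It rebuilds the question's table as `df` and uses NumPy's newer `Generator` API (`np.random.default_rng`) instead of the legacy `numpy.random.choice`. The helper name `sample_ids` and the seed value are hypothetical choices for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table: three rows per ID, IDs 1 to 5.
df = pd.DataFrame({
    "ID": np.repeat(np.arange(1, 6), 3),
    "Var1": list("ACBCDDABCACBDAD"),
    "Var2": list("ABDBBADCBABBBCC"),
    "Var3": list("CBACADBCADCABCC"),
})


def sample_ids(frame, n, seed=None):
    """Hypothetical helper: return all rows for n distinct, randomly chosen IDs."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees n *different* IDs are drawn.
    picked = rng.choice(frame["ID"].unique(), size=n, replace=False)
    # isin keeps every row whose ID was picked, in the original row order.
    return frame[frame["ID"].isin(picked)], picked


df_new, picked = sample_ids(df, 2, seed=42)
print(sorted(picked))
print(df_new)
```

Passing a seed makes the draw reproducible; leave it as `None` to get a different pair of IDs on each run.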

Edit:

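Please read the comment from ouroboros1: choice samples with replacement by default, so the same ID can be drawn twice and you would end up with fewer than 2 distinct IDs. Pass replace=False to prevent this: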

choice(df.ID.unique(), 2, replace=False)
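If you prefer to stay within pandas, here is a sketch of an alternative (not from the original answer): draw the IDs with `Series.sample`, which samples without replacement by default, then filter with `isin` as above. The `random_state=0` argument is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table: three rows per ID, IDs 1 to 5.
df = pd.DataFrame({
    "ID": np.repeat(np.arange(1, 6), 3),
    "Var1": list("ACBCDDABCACBDAD"),
    "Var2": list("ABDBBADCBABBBCC"),
    "Var3": list("CBACADBCADCABCC"),
})

# One entry per ID, then sample 2 of them; Series.sample uses replace=False by default.
nums = df["ID"].drop_duplicates().sample(n=2, random_state=0)
# Keep every row belonging to the sampled IDs.
df_new = df[df["ID"].isin(nums)]
print(df_new)
```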
User contributions licensed under: CC BY-SA