I have a number of values per ID in this format:
```
ID Var1 Var2 Var3
1  A    A    C
1  C    B    B
1  B    D    A
2  C    B    C
2  D    B    A
2  D    A    D
3  A    D    B
3  B    C    C
3  C    B    A
4  A    A    D
4  C    B    C
4  B    B    A
5  D    B    B
5  A    C    C
5  D    C    C
```
I want to randomly select IDs but keep all the rows for each selected ID. For example, if I wanted 2 random IDs, the outcome would look like this:
```
ID Var1 Var2 Var3
2  C    B    C
2  D    B    A
2  D    A    D
5  D    B    B
5  A    C    C
5  D    C    C
```
This gives me IDs 2 and 5.
Answer
Use `numpy.random.choice` to pick random IDs from the unique values of the `ID` column, then select every row whose ID is among the picked ones.
```python
from numpy.random import choice

# Draw 2 random values from the unique IDs (with replacement by default, so the same ID can come up twice)
nums = choice(df.ID.unique(), 2)
# Keep every row whose ID was drawn
df_new = df[df['ID'].isin(nums)]
```
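A self-contained sketch of the same two steps, run against the sample data from the question. It wraps them in a hypothetical helper `sample_ids` (not part of the original answer), uses NumPy's `Generator.choice` via `np.random.default_rng` so that a seed makes the draw reproducible, and passes `replace=False` as the edit further down recommends, so the selected IDs are always distinct:

```python
import numpy as np
import pandas as pd

# Sample data from the question: three rows per ID.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "Var1": list("ACBCDDABCACBDAD"),
    "Var2": list("ABDBBADCBABBBCC"),
    "Var3": list("CBACADBCADCABCC"),
})


def sample_ids(frame: pd.DataFrame, n: int, seed=None) -> pd.DataFrame:
    """Hypothetical helper: keep every row of n randomly chosen, distinct IDs."""
    rng = np.random.default_rng(seed)
    # Draw n distinct IDs from the unique values of the ID column.
    nums = rng.choice(frame["ID"].unique(), size=n, replace=False)
    # Boolean mask that is True for every row whose ID was drawn.
    return frame[frame["ID"].isin(nums)]


df_new = sample_ids(df, 2, seed=0)
print(df_new)
```

With `seed=None` each call gives a fresh random draw; a fixed seed returns the same IDs every time, which is handy for tests.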
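Edit:
Please read the comment from ouroboros1. `choice` samples with replacement by default, so the same ID can be drawn twice and you would end up with fewer distinct IDs than you asked for. Pass `replace=False` to avoid that: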
```python
choice(df.ID.unique(), 2, replace=False)
```
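If you would rather stay within pandas, a similar sketch (an alternative technique, not the original answer's) uses `Series.sample`, which samples without replacement by default; `random_state` makes the draw reproducible. The sample data from the question is rebuilt so the snippet runs on its own:

```python
import pandas as pd

# Sample data from the question: three rows per ID.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "Var1": list("ACBCDDABCACBDAD"),
    "Var2": list("ABDBBADCBABBBCC"),
    "Var3": list("CBACADBCADCABCC"),
})

# Unique IDs as a Series; sample() draws without replacement by default.
ids = df["ID"].drop_duplicates().sample(n=2, random_state=42)
# Keep every row belonging to one of the sampled IDs.
df_new = df[df["ID"].isin(ids)]
print(df_new)
```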