Creating random values within another column's constraints [python]

I have this type of dataset:

ID      Payment     Product
1        100          A
1        200          B
2        20           C
3        105          D

I want to have:

ID      Payment     Product     Gender
1        100          A           M
1        200          B           M
2        20           C           F
3        105          D           M

As you can see, if I simply generated random values for the Gender column, I would run into a problem: the same person ID could be assigned different genders. If the IDs were unique, that would not be a problem. I want to generate random gender values under the constraint that every row with the same ID gets the same value. How can I accomplish that in Python?

Answer

Using np.random.choice and .replace:

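import numpy as np
import pandas as pd

# dummy data: 100 rows with IDs drawn from 0..9, so IDs repeat
df = pd.DataFrame()
df['ID'] = np.random.randint(0, 10, 100)

# create a dict that maps each unique ID to one random gender
genders = {i: np.random.choice(['F', 'M']) for i in df['ID'].unique()}
# look up each row's ID in the dict, so repeated IDs get the same gender
df['gender'] = df['ID'].replace(genders)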
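
The same idea applied to the dataset from the question might look like the sketch below. It is a variant, not part of the original answer: it swaps in Series.map for the lookup and a seeded np.random.default_rng generator so the result is reproducible. The seed value 0 and the name gender_by_id are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Dataset from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Payment": [100, 200, 20, 105],
    "Product": ["A", "B", "C", "D"],
})

# Seeded generator so repeated runs give the same assignment
# (the seed 0 is an arbitrary choice for this sketch)
rng = np.random.default_rng(0)

# Draw one random gender per unique ID, indexed by that ID
unique_ids = df["ID"].unique()
gender_by_id = pd.Series(
    rng.choice(["F", "M"], size=len(unique_ids)),
    index=unique_ids,
)

# Look up each row's ID, so rows sharing an ID share a gender
df["Gender"] = df["ID"].map(gender_by_id)
print(df)
```

Because the random draw happens once per unique ID rather than once per row, both rows with ID 1 always receive the same value.
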
User contributions licensed under: CC BY-SA