I have this type of dataset:
ID  Payment  Product
1   100      A
1   200      B
2   20       C
3   105      D
I want to have:
ID  Payment  Product  Gender
1   100      A        M
1   200      B        M
2   20       C        F
3   105      D        M
As you can see, if I simply generate random values for the Gender column row by row, I run into a problem: the same ID might be assigned different genders. If every ID were unique, that wouldn't be an issue. I want to generate random gender values under the constraint that all rows with the same ID get the same value. How can I accomplish that in Python?
Answer
Use np.random.choice and .replace:
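    import numpy as np
    import pandas as pd

    # dummy data: 100 rows with IDs drawn from 0..9, so IDs repeat
    df = pd.DataFrame()
    df['ID'] = np.random.randint(0, 10, 100)

    # create a dict that maps each unique ID to one random gender
    genders = {i: np.random.choice(['F', 'M']) for i in df['ID'].unique()}

    # look up each row's ID in the dict, so repeated IDs get the same gender
    df['gender'] = df['ID'].replace(genders)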
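For reference, here is a minimal sketch of the same idea applied to the data from the question. It swaps in `Series.map` for `.replace` and uses a seeded `np.random.default_rng` generator so the result is reproducible; the seed value and the names `rng` and `gender_by_id` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 3],
    "Payment": [100, 200, 20, 105],
    "Product": ["A", "B", "C", "D"],
})

# Seeded generator; the seed value 0 is an arbitrary assumption
rng = np.random.default_rng(seed=0)

# Draw one random gender per unique ID (not per row)
unique_ids = df["ID"].unique().tolist()
drawn = rng.choice(["F", "M"], size=len(unique_ids)).tolist()
gender_by_id = dict(zip(unique_ids, drawn))

# Look up each row's ID, so rows sharing an ID share a gender
df["Gender"] = df["ID"].map(gender_by_id)

# Sanity check: every ID has exactly one distinct gender
assert (df.groupby("ID")["Gender"].nunique() == 1).all()

print(df)
```

Because the random draw happens once per unique ID and the rows only look the value up, the constraint holds no matter how many times an ID repeats.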