Add Categorical Column with Specific Count

Question

I'm trying to create a new categorical column of countries with specific percentage values. Take the following dataset, for instance:

I'm trying the following script to get the new column:

However, I'm getting all the countries with an equal count. I want a specific count for each country:

Desired Output

What would be the ideal way of getting the desired output? Any

Accepted Answer

Do you want to change the probabilities of numpy.random.choice?

    df["country"] = np.random.choice(country, len(df), p=[0.91, 0.06, 0.03])
    df["country"].value_counts(normalize=True)

Output:

    UK         0.902357
    Ireland    0.058361
    France     0.039282
    Name: country, dtype: float64

If you want an exact number of values (within the limit of the precision):

    p = [0.91, 0.06, 0.03]
    r = (np.array(p)*len(df)).round().astype(int) # the sum MUST be equal to len(df)
    # or
    # r = [811,  53,  27]
    a = np.repeat(country, r)
    np.random.shuffle(a)
    df['country'] = a
    df["country"].value_counts(normalize=True)

Output:

    UK         0.910213
    Ireland    0.059484
    France     0.030303
    Name: country, dtype: float64
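The exact-count approach above only works when the rounded counts add up to len(df), and plain rounding does not guarantee that for every set of probabilities. As a minimal sketch of one way around this (not part of the original answer), the counts can be floored and the leftover rows handed to the labels with the largest fractional parts, which is the largest-remainder method. The helper name `assign_exact_shares`, the `seed` parameter, and the stand-in 891-row DataFrame are assumptions for illustration. The row count is inferred from the counts 811, 53 and 27 in the answer's comment.

```python
import numpy as np
import pandas as pd


def assign_exact_shares(n, labels, p, seed=None):
    """Return a shuffled array of length n whose label counts follow p
    as closely as whole numbers allow (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    if len(labels) != len(p) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must have one entry per label and sum to 1")

    raw = p * n                          # ideal, possibly fractional, counts
    counts = np.floor(raw).astype(int)   # start from the rounded-down counts
    leftover = n - counts.sum()          # rows not yet assigned to any label

    # Largest-remainder step: give one extra row to the labels with the
    # biggest fractional parts until every row is assigned.
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:leftover]] += 1

    values = np.repeat(np.asarray(labels), counts)
    rng = np.random.default_rng(seed)
    rng.shuffle(values)                  # randomize row order in place
    return values


# Stand-in for the real DataFrame (assumed to have 891 rows).
df = pd.DataFrame({"id": range(891)})
country = ["UK", "Ireland", "France"]

df["country"] = assign_exact_shares(len(df), country, [0.91, 0.06, 0.03], seed=0)
print(df["country"].value_counts())
```

With 891 rows this gives 811, 53 and 27 rows for UK, Ireland and France, the same counts as in the answer, and the total always equals len(df). Passing a fixed `seed` makes the shuffle reproducible. Leave it as None for a different order on each run.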

Answer