
How to randomly add elements to a dataframe column (equally distributed across groups)

Suppose I have the following dataframe:


I want to group the dataset by “Type”, then add a new column named “Sampled” and randomly assign yes/no to each row, with yes and no distributed equally. The expected dataframe could look like this:



Answer

You can use numpy.random.choice:

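A minimal sketch of the whole-column version, using a hypothetical dataframe (the Type/Value columns and their values are made up for illustration). Each row gets an independent yes/no draw, so the grouping plays no role yet:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; not the dataframe from the question.
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

np.random.seed(0)  # optional: only makes the draws reproducible
# One independent draw per row; "yes" and "no" are equally likely.
df["Sampled"] = np.random.choice(["yes", "no"], size=len(df))
print(df)
```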

equal probability per group:


For each group, take an arbitrary column (here Type; it doesn’t matter which, it is only used to get a one-dimensional Series per group) and apply np.random.choice with the length of the group as the size parameter. This gives one yes or no per item in the group, each with equal probability (note that you can set a specific probability per item with the p parameter if you want).
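A sketch of the per-group version on the same kind of hypothetical dataframe. transform returns values aligned with the original index, so the result can be assigned back directly as a column. The second column illustrates the p parameter of np.random.choice; the 25%/75% weights are an arbitrary assumption, not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Any column works here; "Type" is only used to get one Series per group.
by_type = df.groupby("Type")["Type"]

# One yes/no per row of each group, each value equally likely.
df["Sampled"] = by_type.transform(
    lambda s: np.random.choice(["yes", "no"], size=len(s))
)

# Same idea with explicit (illustrative) per-item probabilities via p=.
df["Sampled_weighted"] = by_type.transform(
    lambda s: np.random.choice(["yes", "no"], size=len(s), p=[0.25, 0.75])
)
print(df)
```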

NB. Equal probability does not necessarily mean you will get a 50/50 split of yes/no; if that is what you want, please clarify.

half yes/no per group:

If you want exactly half of each value (yes/no) per group (±1 for odd group sizes), you can randomly select half of the indices.


NB. If a group has an odd number of rows, there will be one more of the second value passed to np.where, here “no”.
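A sketch of the half-and-half variant under the same assumptions (hypothetical data; half_yes is an illustrative helper name). For each group it randomly picks floor(n/2) row labels to mark as yes and uses np.where to label the rest no:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: groups of size 3, 2 and 4.
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

def half_yes(s):
    """Hypothetical helper: mark a random half of the group's rows as 'yes'."""
    labels = s.index.to_numpy()
    # Pick len(s) // 2 distinct row labels, without replacement.
    chosen = np.random.choice(labels, size=len(s) // 2, replace=False)
    # Rows not chosen become "no", so odd-sized groups get one extra "no".
    return np.where(np.isin(labels, chosen), "yes", "no")

df["Sampled"] = df.groupby("Type")["Type"].transform(half_yes)
print(df)
```

Because the row labels are drawn without replacement, the number of yes per group is exact, unlike independent per-row draws.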

distribute several elements equally:

This spreads the elements as evenly as the group size allows. For example, with 3 elements (a, b, c) and 4 rows, there will be two a, one b and one c, in random order; the surplus goes to the first element(s) of the input. If you want the extra item(s) to be chosen randomly, shuffle the input first.

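A minimal sketch of the even-distribution idea, assuming the values to spread are a, b and c and reusing a hypothetical dataframe (spread is an illustrative helper name). np.resize repeats the list cyclically up to the group length, and np.random.permutation shuffles the result:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: groups of size 3, 2 and 4.
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
elements = ["a", "b", "c"]

def spread(s):
    """Hypothetical helper: spread `elements` as evenly as possible over the group."""
    # Cyclic repetition: 4 rows -> ['a', 'b', 'c', 'a'].
    values = np.resize(elements, len(s))
    # Random order within the group; the counts stay fixed.
    return np.random.permutation(values)

df["Sampled"] = df.groupby("Type")["Type"].transform(spread)
print(df)
```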

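One way to follow the "shuffle the input first" advice, so the extra slots do not always go to the first element(s), is to shuffle the element list per group before repeating it. A sketch under the same assumptions (hypothetical data; spread_random_extra is an illustrative name):

```python
import numpy as np
import pandas as pd

# Hypothetical example data: groups of size 3, 2 and 4.
df = pd.DataFrame({
    "Type": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
elements = ["a", "b", "c"]

def spread_random_extra(s):
    """Hypothetical helper: even spread, but the surplus element(s) are random."""
    # Shuffle the input first, so any element can receive the extra slot.
    shuffled = np.random.permutation(elements)
    # Cyclic repetition up to the group length, then random order.
    values = np.resize(shuffled, len(s))
    return np.random.permutation(values)

df["Sampled"] = df.groupby("Type")["Type"].transform(spread_random_extra)
print(df)
```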
User contributions licensed under: CC BY-SA