Looping through a second column using a probability input

Question

I have a question similar to one I posed here, but subtly different, as it adds an extra step to the process involving a probability: "Using a Python pandas dataframe column as input to a loop through another column".

I've got two pandas dataframes: one has these variables Another is a table with these variables: Essentially I need to use

Accepted Answer

We'll use NumPy's binomial and pandas' sample to get this done.

    import pandas as pd
    import numpy as np

    # Set up dataframes
    vals = pd.DataFrame([[1, 8, '25%'], [2, 26, '19%'], [3, 17, '26%'], [4, 9, '10%']])
    vals.columns = ['Year', 'Count', 'Probability']
    temp = pd.DataFrame([[1, 100], [2, 25], [3, 50], [4, 15], [5, 75]])
    temp.columns = ['ID', 'Value']

    # Get probability fraction from string
    vals['Numeric_Probability'] = pd.to_numeric(vals['Probability'].str.replace('%', '')) / 100

    # Total rows is a binomial random variable with n=Count, p=Probability.
    vals['Total_Rows'] = np.random.binomial(n=vals['Count'], p=vals['Numeric_Probability'])

    # Sample "total rows" from the other DataFrame (with replacement) and sum.
    vals['Sum'] = vals['Total_Rows'].apply(
        lambda x: temp['Value'].sample(n=x, replace=True).sum())

    # Drop intermediate columns
    vals.drop(columns=['Numeric_Probability', 'Total_Rows'], inplace=True)
    print(vals)

Output (the sums change from run to run because the draws are random):

       Year  Count Probability  Sum
    0     1      8         25%   15
    1     2     26         19%  350
    2     3     17         26%  190
    3     4      9         10%    0
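Because the answer's code draws random numbers, its output differs on every run. If repeatable results are needed, one option is to route every draw through a single seeded NumPy generator. The following is only a sketch of that idea, not part of the original answer: the helper name `sampled_sums` and its `seed` parameter are hypothetical, and it assumes the same `Count`, `Probability`, and `Value` columns as above.

```python
import numpy as np
import pandas as pd


def sampled_sums(vals, temp, seed=None):
    """Hypothetical helper: return a copy of `vals` with a 'Sum' column.

    For each row, the number of draws is binomial(n=Count, p=Probability);
    that many entries of temp['Value'] are drawn with replacement and summed.
    """
    rng = np.random.default_rng(seed)  # one generator drives every draw

    # Convert strings such as '25%' to fractions such as 0.25.
    p = pd.to_numeric(vals["Probability"].str.rstrip("%")) / 100

    # One binomial draw per row of `vals`.
    n_draws = rng.binomial(n=vals["Count"].to_numpy(), p=p.to_numpy())

    # Draw that many values with replacement and sum them (0 draws gives 0).
    values = temp["Value"].to_numpy()
    sums = [int(rng.choice(values, size=k, replace=True).sum()) for k in n_draws]

    # assign() returns a new DataFrame, so the input is left untouched.
    return vals.assign(Sum=sums)


vals = pd.DataFrame(
    [[1, 8, "25%"], [2, 26, "19%"], [3, 17, "26%"], [4, 9, "10%"]],
    columns=["Year", "Count", "Probability"],
)
temp = pd.DataFrame(
    [[1, 100], [2, 25], [3, 50], [4, 15], [5, 75]],
    columns=["ID", "Value"],
)

print(sampled_sums(vals, temp, seed=42))
```

Calling the helper twice with the same `seed` gives the same `Sum` column; leaving `seed=None` restores the run-to-run variation of the original code.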

Answer