I have a large data set of machine speeds per simulation. I have a column or which simulation it is as well as columns for the different machines. Now I would like to have different lists of the sum of each machine per simulation so that I can do analysis on this as well as create plots for it.
Here is a MRS:
df = {'sim': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3}, 'mach_1': {0: 1, 1: 5, 2: 4, 3: 2, 4: 6, 5: 8, 6: 2, 7: 5, 8: 8}, 'mach_2': {0: 4, 1: 5, 2: 2, 3: 3, 4: 6, 5: 5, 6: 4, 7: 5, 8: 2}, 'mach_3': {0: 7, 1: 8, 2: 9, 3: 4, 4: 4, 5: 6, 6: 8, 7: 5, 8: 6}}
I know that using works, but this just does it for one machine. Meanwhile I have about 20 and would like to know if there is an easier way (if lists are not the best solution I am also open to all suggestions):
lst_m1 = [] for i in range(4): lst_m1.append(df['mach_1'][df['sim'] == i].sum())
Advertisement
Answer
You can filter rows by list, here range
and then aggregate sum
:
df1 = df[df['sim'].isin(range(4))].groupby('sim', as_index=False)['mach_1'].sum() print (df1) sim mach_1 0 1 10 1 2 16 2 3 15