I have a dataframe called resulttable that looks like:
df Index Tag Exp. m/z Intensity Norm_Intensity 114 C09.xls 1013.328 10000 0.125 114 C10.xls 1013.325 20000 0.115 114 C11.xls 1013.329 30000 0.128 180 C09.xls 1257.52 11000 0.215 180 C10.xls 1257.59 12000 0.220 196 C09.xls 1647.593 13000 0.432 196 C10.xls 1647.528 14000 0.450 196 C11.xls 1647.601 31000 0.585
where df Index values are the index values when resulttable is printed or exported to xls, Tag = str, and Exp. m/z, Intensity, and Norm_Intensity are float64. The tag values will be coming from the file names in a specified folder, so they can vary.
As you can see, each tag contains similar Exp. m/z values (like 1013.328, 1013.325, 1013.329) with their corresponding Intensity & Norm_Intensity values. These m/z values are to be considered the same, and I was wondering if there is a way to do calculations such as mean, std, CV on the row values based on their df index values. It doesn’t matter if another dataframe needs to be made for the calculations.
resulttable_calc = resulttable.groupby(resulttable.index) group_result = resulttable_calc.agg({'Exp. m/z':'mean','Intensity':'mean', 'Norm_Intensity':'mean'})
doesn’t seem to do what I wanted. Doing so results in:
df Index Exp. m/z Intensity Norm_Intensity (114, 'C09.xls') 1013.328 10000 0.125 (114, 'C10.xls') 1013.325 20000 0.115 (114, 'C11.xls') 1013.329 30000 0.128 (180, 'C09.xls') 1257.52 11000 0.215 (180, 'C10.xls') 1257.59 12000 0.220 (196, 'C09.xls') 1647.593 13000 0.432 (196, 'C10.xls') 1647.528 14000 0.450 (196, 'C11.xls') 1647.601 31000 0.585
EDIT The output of print(resulttable.index) is:
MultiIndex(levels[[114, 180, 196, ...........]], labels=[[0, 0, 0, 1, 1, 2, 2, 2, ......]], names=[None, 'Tag'])
I assume the labels correspond to each tag.
Advertisement
Answer
Your index is actually a MultiIndex
. Since you’re passing this to groupby()
you’re going to produce a grouping on every combination of the MultiIndex
. I’m assuming you’re looking to group by df Index
– if so you’ll need to group on level=0
.
Try this:
resulttable_calc = resulttable.groupby(level=0) group_result = resulttable_calc.agg({'Exp. m/z':'mean','Intensity':'mean', 'Norm_Intensity':'mean'})