Skip to content
Advertisement

python pandas how to compute on rows with same index values

I have a dataframe called resulttable that looks like:

df Index        Tag        Exp. m/z    Intensity    Norm_Intensity
114             C09.xls    1013.328    10000        0.125
114             C10.xls    1013.325    20000        0.115
114             C11.xls    1013.329    30000        0.128
180             C09.xls    1257.52     11000        0.215
180             C10.xls    1257.59     12000        0.220
196             C09.xls    1647.593    13000        0.432
196             C10.xls    1647.528    14000        0.450
196             C11.xls    1647.601    31000        0.585

where df Index values are the index values when resulttable is printed or exported to xls, Tag = str, and Exp. m/z, Intensity, and Norm_Intensity are float64. The tag values will be coming from the file names in a specified folder, so they can vary.

As you can see, each tag contains similar Exp. m/z values (like 1013.328, 1013.325, 1013.329) with their corresponding Intensity & Norm_Intensity values. These m/z values are to be considered the same, and I was wondering if there is a way to do calculations such as mean, std, CV on the row values based on their df index values. It doesn’t matter if another dataframe needs to be made for the calculations.

resulttable_calc = resulttable.groupby(resulttable.index)
group_result = resulttable_calc.agg({'Exp. m/z':'mean','Intensity':'mean', 'Norm_Intensity':'mean'})

doesn’t seem to do what I wanted. Doing so results in:

df Index               Exp. m/z    Intensity    Norm_Intensity
(114, 'C09.xls')       1013.328    10000        0.125
(114, 'C10.xls')       1013.325    20000        0.115
(114, 'C11.xls')       1013.329    30000        0.128
(180, 'C09.xls')       1257.52     11000        0.215
(180, 'C10.xls')       1257.59     12000        0.220
(196, 'C09.xls')       1647.593    13000        0.432
(196, 'C10.xls')       1647.528    14000        0.450
(196, 'C11.xls')       1647.601    31000        0.585

EDIT The output of print(resulttable.index) is:

MultiIndex(levels[[114, 180, 196, ...........]],
           labels=[[0, 0, 0, 1, 1, 2, 2, 2, ......]],
           names=[None, 'Tag'])

I assume the labels correspond to each tag.

Advertisement

Answer

Your index is actually a MultiIndex. Since you’re passing this to groupby() you’re going to produce a grouping on every combination of the MultiIndex. I’m assuming you’re looking to group by df Index– if so you’ll need to group on level=0.

Try this:

resulttable_calc = resulttable.groupby(level=0)
group_result = resulttable_calc.agg({'Exp. m/z':'mean','Intensity':'mean', 'Norm_Intensity':'mean'})
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement