
groupby in pandas with custom function over a subset of rows in each group

I have a pandas DataFrame of the following format:

Input:

JavaScript

where (version, branch) is a MultiIndex.

PROBLEM DESCRIPTION:

I want to group by version and, within each version, set the value in column X for the branch overall to the sum of the X values of the remaining branches (having the same version), weighted by their values in column N and divided by the N of overall. For groups (i.e. versions) which have only one branch (named overall), I want X to be set to 1.

EXAMPLE:

For version v2, the value in the cell with column X and branch overall should be

(2341.5 * 1 + 95.0 * 2 + 38.5 * 2) / 2475.0 = 1.05393939394,

and in pseudo-code:

(A_N * A_X + B_N * B_X) / overall_N.

Note: For a given version, the value in column N for the branch overall will always be equal to the sum of the values in column N for the other branches.
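
The arithmetic for v2 above can be checked with a few lines of plain Python. This is only a sanity check of the formula; the variable names are illustrative and the numbers are copied from the example.

```python
# N and X of the non-overall branches of v2, taken from the worked example.
branch_n = [2341.5, 95.0, 38.5]
branch_x = [1.0, 2.0, 2.0]

# As stated in the note, the N of `overall` equals the sum of the other Ns.
overall_n = sum(branch_n)

# N-weighted sum of X, divided by the N of `overall`.
overall_x = sum(n * x for n, x in zip(branch_n, branch_x)) / overall_n
print(overall_n, overall_x)
```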

IDEA AND QUESTION:

I think I have to do the following:

df.loc[pd.IndexSlice[:, 'overall'], 'X'] = df.groupby('version').apply(...)

where df is the DataFrame and where ... is to be replaced by a custom function.

I am looking for help in constructing such a function.

Expected output:

JavaScript

Explanation of expected output:

JavaScript

CODE TO CREATE DATAFRAME:

JavaScript

JavaScript


Answer

Use:

JavaScript

JavaScript
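
One possible way to compute the values described in the question is sketched below. It is a minimal illustration, not necessarily the code of the answer above. The DataFrame is hypothetical: only the v2 numbers come from the worked example, while the branch names A, B, C and the single-branch version v1 are assumptions. It also assumes each version has exactly one overall row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the v2 numbers come from the worked example;
# the branch names A, B, C and the v1 row are assumptions for illustration.
index = pd.MultiIndex.from_tuples(
    [("v1", "overall"), ("v2", "A"), ("v2", "B"), ("v2", "C"), ("v2", "overall")],
    names=["version", "branch"],
)
df = pd.DataFrame(
    {
        "N": [100.0, 2341.5, 95.0, 38.5, 2475.0],
        "X": [np.nan, 1.0, 2.0, 2.0, np.nan],
    },
    index=index,
)

# Boolean mask marking the `overall` row of each version.
is_overall = df.index.get_level_values("branch") == "overall"

# Sum of N * X over the non-overall rows of each version.
weighted = (df["N"] * df["X"])[~is_overall].groupby(level="version").sum()

# N of the `overall` row, indexed by version only.
overall_n = df.loc[is_overall, "N"].droplevel("branch")

# Versions that only have an `overall` row are absent from `weighted`;
# reindexing turns them into NaN, which is then replaced by 1.
new_x = (weighted.reindex(overall_n.index) / overall_n).fillna(1.0)

# `overall_n` keeps the row order of df, so positional assignment lines up.
df.loc[is_overall, "X"] = new_x.to_numpy()
print(df)
```

This avoids `groupby(...).apply(...)` with a custom function and relies on index alignment instead, which is usually faster on large frames. A per-group function would work as well.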
User contributions licensed under: CC BY-SA