Skip to content
Advertisement

Improve function that redistribute shares in a pandas dataframe column (possible to avoid nested for loops?)

Below I have dataframe (df) of ten rows, each row has a NAME and belongs to a GROUP. Each row has a value for SHARE that is 0.1. I want to manipulate the distribution of shares. For example, if I increase share value for NAME=’ONE’ from 0.1 to 0.175 I want a function that simultaneously decrease share value for the other observations. Also, I want to take into account GROUP – I want a larger share to come from observations in group D than from other groups.

This I try to achieve in the function “redistr_by_group” below.

I doubt my solution is optimal. For example, the nested for loops made me a little bit confused during development – perhaps there is a another method of doing this than using for loops?

Any comments/reflections/criticism or examples on how things can be improved and more pythonic would be much appreciated.

import pandas as pd

# Create dataframe

df = pd.DataFrame(
    data={
        "GROUP": ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"],
        "NAME": [
            "ONE",
            "TWO",
            "THREE",
            "FOUR",
            "FIVE",
            "SIX",
            "SEVEN",
            "EIGHT",
            "NINE",
            "TEN",
        ],
        "SHARE": [0.10] * 10,
    }
)

# Four list to put into the function

name_to_change = ["ONE"]  # List of NAME
share_change = [0.075]  # List of how much share value for 'ONE' should change.

groups = ["A", "B", "C", "D"]  # Groups to include in redistribution
weights = [0.15, 0.2, 0.15, 0.50]  # Weights for each group


def redistr_by_group(df, name, share, group, weight):

    list_ = list(df.NAME)  # Create list of all the names in df.

    df = df.set_index("NAME")

    for i, z in zip(name, share):  # loop over the lists name and share

        df.loc[i, "SHARE"] = df.loc[i, "SHARE"] + z  # Assign new share value
        print(df)
        list_.remove(
            i
        )  # Remove NAME that got it's share value amended from the list list_

        group_len = {}  # Create 3 dictonarys
        group_weight = {}
        group_share = {}

        for s, w in zip(group, weight):
            group_weight[s] = w  # Assign weight to each group

        df_ = df.reset_index()

        for r in group:
            df_ = df_[df_["NAME"] != i]
            group_len[r] = (
                df_["GROUP"] == r
            ).sum()  # Specify how many observations in each group

        for h in group:
            group_share[h] = (z * group_weight[h]) / group_len[
                h
            ]  # Specify the share value for each group

        # Lop the dictonary group_share
        for di in group_share.items():

            tmp = df[df["GROUP"] == di[0]]

            for kulu in list_:
                try:
                    df.loc[kulu, "SHARE"] = (
                        tmp.loc[kulu, "SHARE"] - di[1]
                    )  # Assign amended share value to observations in each group
                except:
                    pass

    print(df.SHARE.sum())
    return df

Advertisement

Answer

Here is what your code outputs:

      GROUP     SHARE
NAME
ONE       A  0.175000
TWO       A  0.094375
THREE     A  0.094375
FOUR      B  0.095000
FIVE      B  0.095000
SIX       B  0.095000
SEVEN     C  0.094375
EIGHT     C  0.094375
NINE      D  0.081250
TEN       D  0.081250

I suggest a more idiomatic way to get to the same result:

# Redefined variables
name_to_change = "ONE"
share_change = 0.075
groups = ["A", "B", "C", "D"]
weights = {"A": 0.15, "B": 0.2, "C": 0.15, "D": 0.50}


def redistr_by_group(df, name_to_change, share_change, groups, weights):
    """Refactored function.
    """
    df.loc[df["NAME"] == name_to_change, "SHARE"] += share_change

    mask = df["NAME"] != name_to_change
    df.loc[mask, "COEFF"] = df.loc[mask, "GROUP"].apply(
        lambda x: df[mask].groupby("GROUP").count()["NAME"].to_dict()[x]
    )
    df.loc[mask, "WEIGHT_TEMP"] = (
        df.loc[mask, "GROUP"].apply(lambda x: weights[x]) / df.loc[mask, "COEFF"]
    )
    df.loc[mask, "SHARE"] = (
        df.loc[mask, "SHARE"] - df.loc[mask, "WEIGHT_TEMP"] * share_change
    )

    return df.drop(columns=["COEFF", "WEIGHT_TEMP"]).reindex(
        columns=["NAME", "GROUP", "SHARE"]
    )
df = redistr_by_group(df, name_to_change, share_change, groups, weights)

print(df)
# Output
    NAME GROUP     SHARE
0    ONE     A  0.175000
1    TWO     A  0.094375
2  THREE     A  0.094375
3   FOUR     B  0.095000
4   FIVE     B  0.095000
5    SIX     B  0.095000
6  SEVEN     C  0.094375
7  EIGHT     C  0.094375
8   NINE     D  0.081250
9    TEN     D  0.081250

print(df["SHARE"].sum())  # 1
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement