Below I have dataframe (df) of ten rows, each row has a NAME and belongs to a GROUP. Each row has a value for SHARE that is 0.1. I want to manipulate the distribution of shares. For example, if I increase share value for NAME=’ONE’ from 0.1 to 0.175 I want a function that simultaneously decrease share value for the other observations. Also, I want to take into account GROUP – I want a larger share to come from observations in group D than from other groups.
This I try to achieve in the function “redistr_by_group” below.
I doubt my solution is optimal. For example, the nested for loops made me a little bit confused during development – perhaps there is a another method of doing this than using for loops?
Any comments/reflections/criticism or examples on how things can be improved and more pythonic would be much appreciated.
import pandas as pd # Create dataframe df = pd.DataFrame( data={ "GROUP": ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"], "NAME": [ "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT", "NINE", "TEN", ], "SHARE": [0.10] * 10, } ) # Four list to put into the function name_to_change = ["ONE"] # List of NAME share_change = [0.075] # List of how much share value for 'ONE' should change. groups = ["A", "B", "C", "D"] # Groups to include in redistribution weights = [0.15, 0.2, 0.15, 0.50] # Weights for each group def redistr_by_group(df, name, share, group, weight): list_ = list(df.NAME) # Create list of all the names in df. df = df.set_index("NAME") for i, z in zip(name, share): # loop over the lists name and share df.loc[i, "SHARE"] = df.loc[i, "SHARE"] + z # Assign new share value print(df) list_.remove( i ) # Remove NAME that got it's share value amended from the list list_ group_len = {} # Create 3 dictonarys group_weight = {} group_share = {} for s, w in zip(group, weight): group_weight[s] = w # Assign weight to each group df_ = df.reset_index() for r in group: df_ = df_[df_["NAME"] != i] group_len[r] = ( df_["GROUP"] == r ).sum() # Specify how many observations in each group for h in group: group_share[h] = (z * group_weight[h]) / group_len[ h ] # Specify the share value for each group # Lop the dictonary group_share for di in group_share.items(): tmp = df[df["GROUP"] == di[0]] for kulu in list_: try: df.loc[kulu, "SHARE"] = ( tmp.loc[kulu, "SHARE"] - di[1] ) # Assign amended share value to observations in each group except: pass print(df.SHARE.sum()) return df
Advertisement
Answer
Here is what your code outputs:
GROUP SHARE NAME ONE A 0.175000 TWO A 0.094375 THREE A 0.094375 FOUR B 0.095000 FIVE B 0.095000 SIX B 0.095000 SEVEN C 0.094375 EIGHT C 0.094375 NINE D 0.081250 TEN D 0.081250
I suggest a more idiomatic way to get to the same result:
# Redefined variables name_to_change = "ONE" share_change = 0.075 groups = ["A", "B", "C", "D"] weights = {"A": 0.15, "B": 0.2, "C": 0.15, "D": 0.50} def redistr_by_group(df, name_to_change, share_change, groups, weights): """Refactored function. """ df.loc[df["NAME"] == name_to_change, "SHARE"] += share_change mask = df["NAME"] != name_to_change df.loc[mask, "COEFF"] = df.loc[mask, "GROUP"].apply( lambda x: df[mask].groupby("GROUP").count()["NAME"].to_dict()[x] ) df.loc[mask, "WEIGHT_TEMP"] = ( df.loc[mask, "GROUP"].apply(lambda x: weights[x]) / df.loc[mask, "COEFF"] ) df.loc[mask, "SHARE"] = ( df.loc[mask, "SHARE"] - df.loc[mask, "WEIGHT_TEMP"] * share_change ) return df.drop(columns=["COEFF", "WEIGHT_TEMP"]).reindex( columns=["NAME", "GROUP", "SHARE"] )
df = redistr_by_group(df, name_to_change, share_change, groups, weights) print(df) # Output NAME GROUP SHARE 0 ONE A 0.175000 1 TWO A 0.094375 2 THREE A 0.094375 3 FOUR B 0.095000 4 FIVE B 0.095000 5 SIX B 0.095000 6 SEVEN C 0.094375 7 EIGHT C 0.094375 8 NINE D 0.081250 9 TEN D 0.081250 print(df["SHARE"].sum()) # 1