Below I have a dataframe (df) of ten rows; each row has a NAME and belongs to a GROUP. Each row has a SHARE value of 0.1. I want to manipulate the distribution of shares. For example, if I increase the share value for NAME='ONE' from 0.1 to 0.175, I want a function that simultaneously decreases the share values of the other observations. I also want to take GROUP into account: a larger part of the reduction should come from observations in group D than from the other groups.
This is what I try to achieve in the function "redistr_by_group" below.
I doubt my solution is optimal. For example, the nested for loops confused me a little during development. Perhaps there is another way of doing this than using for loops?
Any comments, reflections, criticism, or examples of how things can be improved and made more Pythonic would be much appreciated.
import pandas as pd

# Create dataframe
df = pd.DataFrame(
    data={
        "GROUP": ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"],
        "NAME": [
            "ONE",
            "TWO",
            "THREE",
            "FOUR",
            "FIVE",
            "SIX",
            "SEVEN",
            "EIGHT",
            "NINE",
            "TEN",
        ],
        "SHARE": [0.10] * 10,
    }
)
# Four lists to pass to the function
name_to_change = ["ONE"] # List of NAME
share_change = [0.075] # List of how much share value for 'ONE' should change.
groups = ["A", "B", "C", "D"] # Groups to include in redistribution
weights = [0.15, 0.2, 0.15, 0.50] # Weights for each group
def redistr_by_group(df, name, share, group, weight):
    list_ = list(df.NAME)  # Create list of all the names in df.
    df = df.set_index("NAME")
    for i, z in zip(name, share):  # Loop over the lists name and share
        df.loc[i, "SHARE"] = df.loc[i, "SHARE"] + z  # Assign new share value
        print(df)
        list_.remove(
            i
        )  # Remove the NAME whose share value was amended from the list list_
        group_len = {}  # Create 3 dictionaries
        group_weight = {}
        group_share = {}
        for s, w in zip(group, weight):
            group_weight[s] = w  # Assign weight to each group
        df_ = df.reset_index()
        for r in group:
            df_ = df_[df_["NAME"] != i]
            group_len[r] = (
                df_["GROUP"] == r
            ).sum()  # Count the remaining observations in each group
        for h in group:
            group_share[h] = (z * group_weight[h]) / group_len[
                h
            ]  # Share value to deduct per observation in each group
        # Loop over the dictionary group_share
        for di in group_share.items():
            tmp = df[df["GROUP"] == di[0]]
            for kulu in list_:
                try:
                    df.loc[kulu, "SHARE"] = (
                        tmp.loc[kulu, "SHARE"] - di[1]
                    )  # Assign amended share value to observations in each group
                except:
                    pass
    print(df.SHARE.sum())
    return df
Answer
Here is what your code outputs:
      GROUP     SHARE
NAME
ONE       A  0.175000
TWO       A  0.094375
THREE     A  0.094375
FOUR      B  0.095000
FIVE      B  0.095000
SIX       B  0.095000
SEVEN     C  0.094375
EIGHT     C  0.094375
NINE      D  0.081250
TEN       D  0.081250
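To see where these numbers come from, here is a small sketch of the arithmetic using plain dictionaries. It assumes the rule your loops implement: each remaining row gives up `share_change * weight / rows_in_group`. The names `remaining`, `deduction` and `new_share` are mine, for illustration only.

```python
# Inputs copied from the question.
share_change = 0.075
weights = {"A": 0.15, "B": 0.2, "C": 0.15, "D": 0.50}

# Rows left in each group once "ONE" (group A) is excluded.
remaining = {"A": 2, "B": 3, "C": 2, "D": 2}

# Amount each remaining row gives up, per group.
deduction = {g: share_change * weights[g] / remaining[g] for g in weights}

# Every row started at 0.10, so the new share per group is:
new_share = {g: round(0.10 - d, 6) for g, d in deduction.items()}
print(new_share)

# Total amount removed from the other rows.
total_removed = sum(deduction[g] * remaining[g] for g in weights)
print(round(total_removed, 6))
```

The total removed matches the 0.075 added to ONE only because the weights sum to 1, which is why the shares still add up to 1 afterwards.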
I suggest a more idiomatic way to get to the same result:
# Redefined variables
name_to_change = "ONE"
share_change = 0.075
groups = ["A", "B", "C", "D"]
weights = {"A": 0.15, "B": 0.2, "C": 0.15, "D": 0.50}
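def redistr_by_group(df, name_to_change, share_change, groups, weights):
    """Refactored function."""
    # Note: `groups` is kept in the signature but is not used below.
    # Note: the assignments below modify the df that is passed in.
    # Add the change to the selected row
    df.loc[df["NAME"] == name_to_change, "SHARE"] += share_change
    # All other rows share the corresponding decrease
    mask = df["NAME"] != name_to_change
    # Number of remaining rows in each row's group
    df.loc[mask, "COEFF"] = df.loc[mask, "GROUP"].apply(
        lambda x: df[mask].groupby("GROUP").count()["NAME"].to_dict()[x]
    )
    # Group weight spread evenly over the remaining rows of the group
    df.loc[mask, "WEIGHT_TEMP"] = (
        df.loc[mask, "GROUP"].apply(lambda x: weights[x]) / df.loc[mask, "COEFF"]
    )
    # Deduct each row's part of the change
    df.loc[mask, "SHARE"] = (
        df.loc[mask, "SHARE"] - df.loc[mask, "WEIGHT_TEMP"] * share_change
    )
    # Drop the helper columns and restore the column order
    return df.drop(columns=["COEFF", "WEIGHT_TEMP"]).reindex(
        columns=["NAME", "GROUP", "SHARE"]
    )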
df = redistr_by_group(df, name_to_change, share_change, groups, weights)
print(df)
# Output
    NAME GROUP     SHARE
0    ONE     A  0.175000
1    TWO     A  0.094375
2  THREE     A  0.094375
3   FOUR     B  0.095000
4   FIVE     B  0.095000
5    SIX     B  0.095000
6  SEVEN     C  0.094375
7  EIGHT     C  0.094375
8   NINE     D  0.081250
9    TEN     D  0.081250
print(df["SHARE"].sum())  # 1.0, up to floating-point rounding
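If you want to avoid the temporary columns and the per-row lambda, one possible variant is sketched below. It relies on `groupby(...).transform("size")` for the group counts and `Series.map` for the weights, and it works on a copy so the caller's dataframe is left untouched. The function name `redistribute` and its parameter names are my own choice, not part of your code, and the sketch assumes a single name to change and a `weights` dictionary covering every group.

```python
import pandas as pd


def redistribute(df, name, change, weights):
    """Sketch: add `change` to `name` and take it from the other rows by group weight."""
    out = df.copy()  # leave the caller's dataframe untouched
    others = out["NAME"] != name
    # Number of remaining rows in each row's group (the changed row is excluded).
    counts = out.loc[others].groupby("GROUP")["NAME"].transform("size")
    # Per-row deduction: group weight spread evenly over the group's remaining rows.
    deduction = out.loc[others, "GROUP"].map(weights) / counts * change
    out.loc[others, "SHARE"] -= deduction
    out.loc[~others, "SHARE"] += change
    return out


df = pd.DataFrame(
    {
        "GROUP": ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"],
        "NAME": ["ONE", "TWO", "THREE", "FOUR", "FIVE",
                 "SIX", "SEVEN", "EIGHT", "NINE", "TEN"],
        "SHARE": [0.10] * 10,
    }
)
weights = {"A": 0.15, "B": 0.2, "C": 0.15, "D": 0.50}

result = redistribute(df, "ONE", 0.075, weights)
print(result)
print(result["SHARE"].sum())
```

The total stays at 1 only if the weights sum to 1 and every weighted group still has at least one row left after excluding the changed name. A group missing from `weights` would produce NaN in this sketch, so you may want to validate the input first.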