I have a complex mapping problem.
I have a table with groups. Each group has a list of 7 products The product list may contain duplicate products.
MY GOAL : I have to map 5 products in the list of 7 products
- If the value of the column rank is equal to 1, 2, 3 then Product_1, Product_2, Product_3.
- If the color Yellow is not in the Top 3 of the rank, I have to give it a random mapping value, which can be either PRODUCT_4 or PRODUCT_5
- Then I have to map 1 last product (I pick the last product randomly) with last available mapping value, which can be product_4 or product_5 depending on whether I have mapped YELLOW or not.
- The last two products remain with an empty mapping value
ID | PRODUCT | RANK |
---|---|---|
A | RED | 1 |
A | BLUE | 2 |
A | BLACK | 3 |
A | GREEN | 4 |
A | GREEN | 5 |
A | PURPLE | 6 |
A | YELLOW | 7 |
B | PURPLE | 1 |
B | BLACK | 2 |
B | BLACK | 3 |
B | GREEN | 4 |
B | WHITE | 5 |
B | BROWN | 6 |
B | GREEN | 7 |
Expected :
ID | PRODUCT | RANK | MAP |
---|---|---|---|
A | RED | 1 | PRODUCT_1 |
A | BLUE | 2 | PRODUCT_2 |
A | BLACK | 3 | PRODUCT_3 |
A | GREEN | 4 | PRODUCT_5 |
A | GREEN | 5 | “” |
A | PURPLE | 6 | “” |
A | YELLOW | 7 | PRODUCT_4 |
B | PURPLE | 1 | PRODUCT_1 |
B | BLACK | 2 | PRODUCT_2 |
B | YELLOW | 3 | PRODUCT_3 |
B | GREEN | 4 | “” |
B | WHITE | 5 | PRODUCT_4 |
B | BROWN | 6 | PRODUCT_5 |
B | GREEN | 7 | “” |
Advertisement
Answer
With the dataframe you provided:
import pandas as pd df = pd.DataFrame( { "ID": ["A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"], "PRODUCT": [ "RED", "BLUE", "BLACK", "GREEN", "GREEN", "PURPLE", "YELLOW", "PURPLE", "BLACK", "BLACK", "GREEN", "WHITE", "BROWN", "GREEN", ], "RANK": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7], } )
Here is one way to do it using shuffle and choice from Python standard library random module:
import random dfs = [] for col in df["ID"].unique(): # Setup available_ranks = [4, 5] random.shuffle(available_ranks) temp = df.copy().loc[df["ID"] == col, :] # First rule temp["MAP"] = temp.apply( lambda x: f"PRODUCT_{x['RANK']}" if x["RANK"] <= 3 else "", axis=1 ) # Second rule temp["MAP"] = temp.apply( lambda x: f"PRODUCT_{available_ranks.pop()}" if x["PRODUCT"] == "YELLOW" and x["RANK"] > 3 else x["MAP"], axis=1, ) # Third rule temp.loc[ (temp["MAP"] == "") & (temp["RANK"] == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)), "MAP", ] = f"PRODUCT_{available_ranks.pop()}" if available_ranks: temp.loc[ (temp["MAP"] == "") & ( temp["RANK"] == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values) ), "MAP", ] = f"PRODUCT_{available_ranks.pop()}" dfs.append(temp) new_df = pd.concat(dfs)
And so:
print(new_df) # Output ID PRODUCT RANK MAP 0 A RED 1 PRODUCT_1 1 A BLUE 2 PRODUCT_2 2 A BLACK 3 PRODUCT_3 3 A GREEN 4 4 A GREEN 5 PRODUCT_5 5 A PURPLE 6 6 A YELLOW 7 PRODUCT_4 7 B PURPLE 1 PRODUCT_1 8 B BLACK 2 PRODUCT_2 9 B BLACK 3 PRODUCT_3 10 B GREEN 4 11 B WHITE 5 PRODUCT_4 12 B BROWN 6 PRODUCT_5 13 B GREEN 7