I have a complex mapping problem.
I have a table with groups. Each group has a list of 7 products The product list may contain duplicate products.
MY GOAL : I have to map 5 products in the list of 7 products
- If the value of the column rank is equal to 1, 2, 3 then Product_1, Product_2, Product_3.
- If the color Yellow is not in the Top 3 of the rank, I have to give it a random mapping value, which can be either PRODUCT_4 or PRODUCT_5
- Then I have to map 1 last product (I pick the last product randomly) with last available mapping value, which can be product_4 or product_5 depending on whether I have mapped YELLOW or not.
- The last two products remain with an empty mapping value
| ID | PRODUCT | RANK |
|---|---|---|
| A | RED | 1 |
| A | BLUE | 2 |
| A | BLACK | 3 |
| A | GREEN | 4 |
| A | GREEN | 5 |
| A | PURPLE | 6 |
| A | YELLOW | 7 |
| B | PURPLE | 1 |
| B | BLACK | 2 |
| B | BLACK | 3 |
| B | GREEN | 4 |
| B | WHITE | 5 |
| B | BROWN | 6 |
| B | GREEN | 7 |
Expected :
| ID | PRODUCT | RANK | MAP |
|---|---|---|---|
| A | RED | 1 | PRODUCT_1 |
| A | BLUE | 2 | PRODUCT_2 |
| A | BLACK | 3 | PRODUCT_3 |
| A | GREEN | 4 | PRODUCT_5 |
| A | GREEN | 5 | “” |
| A | PURPLE | 6 | “” |
| A | YELLOW | 7 | PRODUCT_4 |
| B | PURPLE | 1 | PRODUCT_1 |
| B | BLACK | 2 | PRODUCT_2 |
| B | YELLOW | 3 | PRODUCT_3 |
| B | GREEN | 4 | “” |
| B | WHITE | 5 | PRODUCT_4 |
| B | BROWN | 6 | PRODUCT_5 |
| B | GREEN | 7 | “” |
Advertisement
Answer
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"ID": ["A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
"PRODUCT": [
"RED",
"BLUE",
"BLACK",
"GREEN",
"GREEN",
"PURPLE",
"YELLOW",
"PURPLE",
"BLACK",
"BLACK",
"GREEN",
"WHITE",
"BROWN",
"GREEN",
],
"RANK": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
}
)
Here is one way to do it using shuffle and choice from Python standard library random module:
import random
dfs = []
for col in df["ID"].unique():
# Setup
available_ranks = [4, 5]
random.shuffle(available_ranks)
temp = df.copy().loc[df["ID"] == col, :]
# First rule
temp["MAP"] = temp.apply(
lambda x: f"PRODUCT_{x['RANK']}" if x["RANK"] <= 3 else "", axis=1
)
# Second rule
temp["MAP"] = temp.apply(
lambda x: f"PRODUCT_{available_ranks.pop()}"
if x["PRODUCT"] == "YELLOW" and x["RANK"] > 3
else x["MAP"],
axis=1,
)
# Third rule
temp.loc[
(temp["MAP"] == "")
& (temp["RANK"] == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)),
"MAP",
] = f"PRODUCT_{available_ranks.pop()}"
if available_ranks:
temp.loc[
(temp["MAP"] == "")
& (
temp["RANK"]
== random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)
),
"MAP",
] = f"PRODUCT_{available_ranks.pop()}"
dfs.append(temp)
new_df = pd.concat(dfs)
And so:
print(new_df) # Output ID PRODUCT RANK MAP 0 A RED 1 PRODUCT_1 1 A BLUE 2 PRODUCT_2 2 A BLACK 3 PRODUCT_3 3 A GREEN 4 4 A GREEN 5 PRODUCT_5 5 A PURPLE 6 6 A YELLOW 7 PRODUCT_4 7 B PURPLE 1 PRODUCT_1 8 B BLACK 2 PRODUCT_2 9 B BLACK 3 PRODUCT_3 10 B GREEN 4 11 B WHITE 5 PRODUCT_4 12 B BROWN 6 PRODUCT_5 13 B GREEN 7