I have a complex mapping problem.
I have a table with groups. Each group has a list of 7 products The product list may contain duplicate products.
MY GOAL : I have to map 5 products in the list of 7 products
- If the value of the column rank is equal to 1, 2, 3 then Product_1, Product_2, Product_3.
- If the color Yellow is not in the Top 3 of the rank, I have to give it a random mapping value, which can be either PRODUCT_4 or PRODUCT_5
- Then I have to map 1 last product (I pick the last product randomly) with last available mapping value, which can be product_4 or product_5 depending on whether I have mapped YELLOW or not.
- The last two products remain with an empty mapping value
| ID | PRODUCT | RANK | 
|---|---|---|
| A | RED | 1 | 
| A | BLUE | 2 | 
| A | BLACK | 3 | 
| A | GREEN | 4 | 
| A | GREEN | 5 | 
| A | PURPLE | 6 | 
| A | YELLOW | 7 | 
| B | PURPLE | 1 | 
| B | BLACK | 2 | 
| B | BLACK | 3 | 
| B | GREEN | 4 | 
| B | WHITE | 5 | 
| B | BROWN | 6 | 
| B | GREEN | 7 | 
Expected :
| ID | PRODUCT | RANK | MAP | 
|---|---|---|---|
| A | RED | 1 | PRODUCT_1 | 
| A | BLUE | 2 | PRODUCT_2 | 
| A | BLACK | 3 | PRODUCT_3 | 
| A | GREEN | 4 | PRODUCT_5 | 
| A | GREEN | 5 | “” | 
| A | PURPLE | 6 | “” | 
| A | YELLOW | 7 | PRODUCT_4 | 
| B | PURPLE | 1 | PRODUCT_1 | 
| B | BLACK | 2 | PRODUCT_2 | 
| B | YELLOW | 3 | PRODUCT_3 | 
| B | GREEN | 4 | “” | 
| B | WHITE | 5 | PRODUCT_4 | 
| B | BROWN | 6 | PRODUCT_5 | 
| B | GREEN | 7 | “” | 
Advertisement
Answer
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
    {
        "ID": ["A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
        "PRODUCT": [
            "RED",
            "BLUE",
            "BLACK",
            "GREEN",
            "GREEN",
            "PURPLE",
            "YELLOW",
            "PURPLE",
            "BLACK",
            "BLACK",
            "GREEN",
            "WHITE",
            "BROWN",
            "GREEN",
        ],
        "RANK": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
    }
)
Here is one way to do it using shuffle and choice from Python standard library random module:
import random
dfs = []
for col in df["ID"].unique():
    # Setup
    available_ranks = [4, 5]
    random.shuffle(available_ranks)
    temp = df.copy().loc[df["ID"] == col, :]
    # First rule
    temp["MAP"] = temp.apply(
        lambda x: f"PRODUCT_{x['RANK']}" if x["RANK"] <= 3 else "", axis=1
    )
    # Second rule
    temp["MAP"] = temp.apply(
        lambda x: f"PRODUCT_{available_ranks.pop()}"
        if x["PRODUCT"] == "YELLOW" and x["RANK"] > 3
        else x["MAP"],
        axis=1,
    )
    # Third rule
    temp.loc[
        (temp["MAP"] == "")
        & (temp["RANK"] == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)),
        "MAP",
    ] = f"PRODUCT_{available_ranks.pop()}"
    if available_ranks:
        temp.loc[
            (temp["MAP"] == "")
            & (
                temp["RANK"]
                == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)
            ),
            "MAP",
        ] = f"PRODUCT_{available_ranks.pop()}"
    dfs.append(temp)
new_df = pd.concat(dfs)
And so:
print(new_df) # Output ID PRODUCT RANK MAP 0 A RED 1 PRODUCT_1 1 A BLUE 2 PRODUCT_2 2 A BLACK 3 PRODUCT_3 3 A GREEN 4 4 A GREEN 5 PRODUCT_5 5 A PURPLE 6 6 A YELLOW 7 PRODUCT_4 7 B PURPLE 1 PRODUCT_1 8 B BLACK 2 PRODUCT_2 9 B BLACK 3 PRODUCT_3 10 B GREEN 4 11 B WHITE 5 PRODUCT_4 12 B BROWN 6 PRODUCT_5 13 B GREEN 7
