Skip to content
Advertisement

How to set values in a dataframe according to multiple conditions and different groups?

I have a complex mapping problem.

I have a table with groups. Each group has a list of 7 products The product list may contain duplicate products.

MY GOAL : I have to map 5 products in the list of 7 products

  1. If the value of the column rank is equal to 1, 2, 3 then Product_1, Product_2, Product_3.
  2. If the color Yellow is not in the Top 3 of the rank, I have to give it a random mapping value, which can be either PRODUCT_4 or PRODUCT_5
  3. Then I have to map 1 last product (I pick the last product randomly) with last available mapping value, which can be product_4 or product_5 depending on whether I have mapped YELLOW or not.
  4. The last two products remain with an empty mapping value
ID PRODUCT RANK
A RED 1
A BLUE 2
A BLACK 3
A GREEN 4
A GREEN 5
A PURPLE 6
A YELLOW 7
B PURPLE 1
B BLACK 2
B BLACK 3
B GREEN 4
B WHITE 5
B BROWN 6
B GREEN 7

Expected :

ID PRODUCT RANK MAP
A RED 1 PRODUCT_1
A BLUE 2 PRODUCT_2
A BLACK 3 PRODUCT_3
A GREEN 4 PRODUCT_5
A GREEN 5 “”
A PURPLE 6 “”
A YELLOW 7 PRODUCT_4
B PURPLE 1 PRODUCT_1
B BLACK 2 PRODUCT_2
B YELLOW 3 PRODUCT_3
B GREEN 4 “”
B WHITE 5 PRODUCT_4
B BROWN 6 PRODUCT_5
B GREEN 7 “”

Advertisement

Answer

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "ID": ["A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
        "PRODUCT": [
            "RED",
            "BLUE",
            "BLACK",
            "GREEN",
            "GREEN",
            "PURPLE",
            "YELLOW",
            "PURPLE",
            "BLACK",
            "BLACK",
            "GREEN",
            "WHITE",
            "BROWN",
            "GREEN",
        ],
        "RANK": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
    }
)

Here is one way to do it using shuffle and choice from Python standard library random module:

import random

dfs = []

for col in df["ID"].unique():
    # Setup
    available_ranks = [4, 5]
    random.shuffle(available_ranks)
    temp = df.copy().loc[df["ID"] == col, :]

    # First rule
    temp["MAP"] = temp.apply(
        lambda x: f"PRODUCT_{x['RANK']}" if x["RANK"] <= 3 else "", axis=1
    )

    # Second rule
    temp["MAP"] = temp.apply(
        lambda x: f"PRODUCT_{available_ranks.pop()}"
        if x["PRODUCT"] == "YELLOW" and x["RANK"] > 3
        else x["MAP"],
        axis=1,
    )

    # Third rule
    temp.loc[
        (temp["MAP"] == "")
        & (temp["RANK"] == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)),
        "MAP",
    ] = f"PRODUCT_{available_ranks.pop()}"
    if available_ranks:
        temp.loc[
            (temp["MAP"] == "")
            & (
                temp["RANK"]
                == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)
            ),
            "MAP",
        ] = f"PRODUCT_{available_ranks.pop()}"

    dfs.append(temp)

new_df = pd.concat(dfs)

And so:

print(new_df)
# Output
   ID PRODUCT  RANK        MAP
0   A     RED     1  PRODUCT_1
1   A    BLUE     2  PRODUCT_2
2   A   BLACK     3  PRODUCT_3
3   A   GREEN     4
4   A   GREEN     5  PRODUCT_5
5   A  PURPLE     6
6   A  YELLOW     7  PRODUCT_4
7   B  PURPLE     1  PRODUCT_1
8   B   BLACK     2  PRODUCT_2
9   B   BLACK     3  PRODUCT_3
10  B   GREEN     4
11  B   WHITE     5  PRODUCT_4
12  B   BROWN     6  PRODUCT_5
13  B   GREEN     7
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement