I have a complex mapping problem.
I have a table with groups. Each group has a list of 7 products The product list may contain duplicate products.
MY GOAL : I have to map 5 products in the list of 7 products
- If the value of the column rank is equal to 1, 2, 3 then Product_1, Product_2, Product_3.
- If the color Yellow is not in the Top 3 of the rank, I have to give it a random mapping value, which can be either PRODUCT_4 or PRODUCT_5
- Then I have to map 1 last product (I pick the last product randomly) with last available mapping value, which can be product_4 or product_5 depending on whether I have mapped YELLOW or not.
- The last two products remain with an empty mapping value
ID | PRODUCT | RANK |
---|---|---|
A | RED | 1 |
A | BLUE | 2 |
A | BLACK | 3 |
A | GREEN | 4 |
A | GREEN | 5 |
A | PURPLE | 6 |
A | YELLOW | 7 |
B | PURPLE | 1 |
B | BLACK | 2 |
B | BLACK | 3 |
B | GREEN | 4 |
B | WHITE | 5 |
B | BROWN | 6 |
B | GREEN | 7 |
Expected :
ID | PRODUCT | RANK | MAP |
---|---|---|---|
A | RED | 1 | PRODUCT_1 |
A | BLUE | 2 | PRODUCT_2 |
A | BLACK | 3 | PRODUCT_3 |
A | GREEN | 4 | PRODUCT_5 |
A | GREEN | 5 | “” |
A | PURPLE | 6 | “” |
A | YELLOW | 7 | PRODUCT_4 |
B | PURPLE | 1 | PRODUCT_1 |
B | BLACK | 2 | PRODUCT_2 |
B | YELLOW | 3 | PRODUCT_3 |
B | GREEN | 4 | “” |
B | WHITE | 5 | PRODUCT_4 |
B | BROWN | 6 | PRODUCT_5 |
B | GREEN | 7 | “” |
Advertisement
Answer
With the dataframe you provided:
JavaScript
x
25
25
1
import pandas as pd
2
3
df = pd.DataFrame(
4
{
5
"ID": ["A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"],
6
"PRODUCT": [
7
"RED",
8
"BLUE",
9
"BLACK",
10
"GREEN",
11
"GREEN",
12
"PURPLE",
13
"YELLOW",
14
"PURPLE",
15
"BLACK",
16
"BLACK",
17
"GREEN",
18
"WHITE",
19
"BROWN",
20
"GREEN",
21
],
22
"RANK": [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
23
}
24
)
25
Here is one way to do it using shuffle and choice from Python standard library random module:
JavaScript
1
43
43
1
import random
2
3
dfs = []
4
5
for col in df["ID"].unique():
6
# Setup
7
available_ranks = [4, 5]
8
random.shuffle(available_ranks)
9
temp = df.copy().loc[df["ID"] == col, :]
10
11
# First rule
12
temp["MAP"] = temp.apply(
13
lambda x: f"PRODUCT_{x['RANK']}" if x["RANK"] <= 3 else "", axis=1
14
)
15
16
# Second rule
17
temp["MAP"] = temp.apply(
18
lambda x: f"PRODUCT_{available_ranks.pop()}"
19
if x["PRODUCT"] == "YELLOW" and x["RANK"] > 3
20
else x["MAP"],
21
axis=1,
22
)
23
24
# Third rule
25
temp.loc[
26
(temp["MAP"] == "")
27
& (temp["RANK"] == random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)),
28
"MAP",
29
] = f"PRODUCT_{available_ranks.pop()}"
30
if available_ranks:
31
temp.loc[
32
(temp["MAP"] == "")
33
& (
34
temp["RANK"]
35
== random.choice(temp.loc[temp["MAP"] == "", "RANK"].values)
36
),
37
"MAP",
38
] = f"PRODUCT_{available_ranks.pop()}"
39
40
dfs.append(temp)
41
42
new_df = pd.concat(dfs)
43
And so:
JavaScript
1
18
18
1
print(new_df)
2
# Output
3
ID PRODUCT RANK MAP
4
0 A RED 1 PRODUCT_1
5
1 A BLUE 2 PRODUCT_2
6
2 A BLACK 3 PRODUCT_3
7
3 A GREEN 4
8
4 A GREEN 5 PRODUCT_5
9
5 A PURPLE 6
10
6 A YELLOW 7 PRODUCT_4
11
7 B PURPLE 1 PRODUCT_1
12
8 B BLACK 2 PRODUCT_2
13
9 B BLACK 3 PRODUCT_3
14
10 B GREEN 4
15
11 B WHITE 5 PRODUCT_4
16
12 B BROWN 6 PRODUCT_5
17
13 B GREEN 7
18