How are you?
I have a dataset where some rows contain more than one product, separated by commas, as in the example below (there are other columns, but to keep it simple I kept only these three).
| id | produdct | value |
|---|---|---|
| 47 | product1, product 2 | 12000.0 |
| 48 | product3 | 48000.0 |
| 49 | product4, product1, product2 | 28800.0 |
| 50 | product1 | 2000.0 |
| 51 | product5, product2 | 32000.0 |
| 53 | product3 | 128000.0 |
| 54 | product2 | 15000.0 |
| 55 | product4, product2, product5 | 96000.0 |
I need to separate the products so that each one gets its own copy of the row. I tried functions such as explode and json_normalize, and I tried creating a list of lists, but nothing worked. Can you help me?
Answer
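Use str.split to turn each comma-separated string into a list, then explode to give each list element its own row. explode only expands list-like values, so calling it directly on the raw strings leaves them unchanged: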
df['produdct'] = df['produdct'].str.split(', ')
new_df = df.explode('produdct')
id produdct value
0 47 product1 12000.0
0 47 product 2 12000.0
1 48 product3 48000.0
2 49 product4 28800.0
2 49 product1 28800.0
2 49 product2 28800.0
3 50 product1 2000.0
4 51 product5 32000.0
4 51 product2 32000.0
5 53 product3 128000.0
6 54 product2 15000.0
7 55 product4 96000.0
7 55 product2 96000.0
7 55 product5 96000.0
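If the spacing around the commas is not always exactly `, `, or you want a fresh index instead of the repeated one shown above, a slightly more defensive variant may help. This is only a sketch: the helper name `explode_products` is made up for illustration, and the sample frame just reuses the first three rows from the question.

```python
import pandas as pd

# Sample data copied from the first three rows of the question.
df = pd.DataFrame({
    "id": [47, 48, 49],
    "produdct": ["product1, product 2", "product3", "product4, product1, product2"],
    "value": [12000.0, 48000.0, 28800.0],
})


def explode_products(frame, column="produdct"):
    """Hypothetical helper: one output row per comma-separated product."""
    out = frame.copy()                       # leave the caller's frame untouched
    out[column] = out[column].str.split(",")  # string -> list of strings
    out = out.explode(column)                # one row per list element
    out = out.reset_index(drop=True)         # replace the repeated index with 0..n-1
    out[column] = out[column].str.strip()    # drop whitespace left around each name
    return out


new_df = explode_products(df)
print(new_df)
```

Splitting on a bare comma and stripping afterwards means `"a,b"`, `"a, b"` and `"a ,b"` all give the same result. Spaces inside a name (such as `product 2`) are kept as they are.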