How are you?
I have a database where some rows contain more than one product, separated by commas, as in the example below (there are other columns, but to keep it simple I only included these three).
| id | produdct | value | 
|---|---|---|
| 47 | product1, product 2 | 12000.0 | 
| 48 | product3 | 48000.0 | 
| 49 | product4, product1, product2 | 28800.0 | 
| 50 | product1 | 2000.0 | 
| 51 | product5, product2 | 32000.0 | 
| 53 | product3 | 128000.0 | 
| 54 | product2 | 15000.0 | 
| 55 | product4, product2, product5 | 96000.0 | 
I need to separate the products so that each one gets its own row, with the rest of the row copied for each. I tried functions such as explode and json_normalize, and I tried building a list of lists, but nothing worked. Can you help me?
Answer
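explode only expands list-like values, so on a column of plain strings it does nothing. Use str.split first to turn each comma-separated string into a list, then explode: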
df['produdct'] = df['produdct'].str.split(', ')
new_df = df.explode('produdct')
   id   produdct     value
0  47   product1   12000.0
0  47  product 2   12000.0
1  48   product3   48000.0
2  49   product4   28800.0
2  49   product1   28800.0
2  49   product2   28800.0
3  50   product1    2000.0
4  51   product5   32000.0
4  51   product2   32000.0
5  53   product3  128000.0
6  54   product2   15000.0
7  55   product4   96000.0
7  55   product2   96000.0
7  55   product5   96000.0
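
Note that the result keeps the original index, so rows that came from the same line share an index label (0, 0, 1, 2, 2, 2, ...). Also, splitting on the exact string ', ' only works if every separator is a comma followed by one space. If your real data might be less tidy, here is a minimal sketch that splits on the bare comma, strips the whitespace afterwards, and renumbers the index. The small DataFrame and its irregular spacing are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample with inconsistent spacing around the commas.
df = pd.DataFrame({
    "id": [47, 48, 49],
    "produdct": ["product1, product 2", "product3", "product4,product1 , product2"],
    "value": [12000.0, 48000.0, 28800.0],
})

new_df = (
    df.assign(produdct=df["produdct"].str.split(","))  # string -> list of pieces
      .explode("produdct")                             # one row per list element
      .reset_index(drop=True)                          # replace repeated labels with 0..n-1
)

# Remove leading/trailing whitespace left over from the split.
new_df["produdct"] = new_df["produdct"].str.strip()

print(new_df)
```

Using assign leaves the original df untouched, which helps if you still need the comma-separated form elsewhere.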
