I am trying to find a way to take the values in one column of a pandas DataFrame, turn each unique value into a new column, and use the corresponding score as the value in that new column. For example, starting from:
| Index | Product | Test | Score |
|---|---|---|---|
| 0 | A | Protection | 5 |
| 1 | A | Comfort | 6 |
| 2 | B | Protection | 6 |
| 3 | B | Comfort | 7 |
I would like the end result to look something like this:
| Index | Product | Protection | Comfort | Test_C | Test_D |
|---|---|---|---|---|---|
| 0 | A | 5 | 6 | 2 | 1 |
| 1 | B | 6 | 7 | 3 | 8 |
I am doing this to prepare my data for machine learning. Test_C and Test_D are included to show that there are more than two types of test, and which tests are carried out differs from product to product.
I have tried doing this with `pandas.get_dummies`, but I was wondering whether there is a cleaner way.
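For reference, a `get_dummies`-based version might look like the sketch below. The question doesn't show the code that was tried, so this is an assumed reconstruction: it one-hot encodes `Test`, scales each indicator column by `Score`, and sums the rows per product.

```python
import pandas as pd

# Sample data from the question (only the two tests shown there)
df = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Test": ["Protection", "Comfort", "Protection", "Comfort"],
    "Score": [5, 6, 6, 7],
})

# One indicator column per unique test, multiplied row-wise by that row's score
dummies = pd.get_dummies(df["Test"]).mul(df["Score"], axis=0)

# Collapse to one row per product by summing the scaled indicators
wide = dummies.groupby(df["Product"]).sum()
print(wide)
```

One drawback of this route is that a test a product never had comes out as 0, which is indistinguishable from a genuine score of 0, and repeated (Product, Test) rows are silently added together.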
Answer
Use pivot():
```python
df.pivot(index='Product', columns='Test', values='Score')
```
Returns:
| Product | Comfort | Protection |
|---|---|---|
| A | 6 | 5 |
| B | 7 | 6 |
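A self-contained version of the same call, rebuilding the sample data from the question (the `df` construction is an assumption about how that data gets loaded):

```python
import pandas as pd

# Long format: one row per (Product, Test) measurement
df = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Test": ["Protection", "Comfort", "Protection", "Comfort"],
    "Score": [5, 6, 6, 7],
})

# Each unique value in 'Test' becomes a column; 'Score' fills the cells
wide = df.pivot(index="Product", columns="Test", values="Score")
print(wide)
```

Note that `pivot()` raises a `ValueError` if any (Product, Test) pair appears more than once, because it cannot decide which score to put in the cell.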
If you want a numerical index, with ‘Product’ kept as a regular column rather than the index, add `reset_index()`:
```python
df.pivot(index='Product', columns='Test', values='Score').reset_index()
```
Returns:
| Index | Product | Comfort | Protection |
|---|---|---|---|
| 0 | A | 6 | 5 |
| 1 | B | 7 | 6 |
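When products have different sets of tests (the Test_C and Test_D case in the question), the missing combinations come out as `NaN`. The sketch below uses made-up Test_C/Test_D scores purely for illustration. It switches to `pivot_table()` with `aggfunc="mean"` so a repeated (Product, Test) pair is averaged instead of raising an error (averaging is an assumed choice), and clears the leftover `Test` axis name with `rename_axis(None, axis=1)` so the result is a plain feature table:

```python
import pandas as pd

# Hypothetical data: A has no Test_C, B has no Test_D, and A's Comfort test was run twice
df = pd.DataFrame({
    "Product": ["A", "A", "A", "A", "B", "B", "B"],
    "Test": ["Protection", "Comfort", "Comfort", "Test_D",
             "Protection", "Comfort", "Test_C"],
    "Score": [5, 6, 8, 1, 6, 7, 3],
})

wide = (
    df.pivot_table(index="Product", columns="Test", values="Score", aggfunc="mean")
    .reset_index()             # 'Product' becomes an ordinary column, index is 0..n-1
    .rename_axis(None, axis=1) # drop the 'Test' name left on the column axis
)
print(wide)
```

Whether to leave the `NaN` cells as they are, fill them (for example with `fillna()`), or add a "was tested" indicator depends on the model you plan to train.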