I am trying to take one column of a pandas DataFrame and turn each of its unique values into a new column, with the matching score as the value in that column. For example:
Index | Product | Test | Score |
---|---|---|---|
0 | A | Protection | 5 |
1 | A | Comfort | 6 |
2 | B | Protection | 6 |
3 | B | Comfort | 7 |
The end result should look something like this:
Index | Product | Protection | Comfort | Test_C | Test_D |
---|---|---|---|---|---|
0 | A | 5 | 6 | 2 | 1 |
1 | B | 6 | 7 | 3 | 8 |
I am doing this to clean my data for machine learning. Test_C and Test_D are included to show that there are more than two types of test, and that which tests are carried out differs from product to product.
I have tried doing this with pandas.get_dummies, but was wondering whether there is a cleaner way.
Answer
Use pivot():
df.pivot(index='Product', columns='Test', values='Score')
Returns:
Product | Comfort | Protection |
---|---|---|
A | 6 | 5 |
B | 7 | 6 |
If you want a numerical index, or want to keep 'Product' as a column instead of the index, add reset_index():
df.pivot(index='Product', columns='Test', values='Score').reset_index()
Returns:
Index | Product | Comfort | Protection |
---|---|---|---|
0 | A | 6 | 5 |
1 | B | 7 | 6 |
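For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the four sample rows from the question. The variable name `wide` and the final step that clears the column-axis label are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, in long format:
# one row per (Product, Test) pair.
df = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Test": ["Protection", "Comfort", "Protection", "Comfort"],
    "Score": [5, 6, 6, 7],
})

# Reshape to wide format: one row per product, one column per distinct test.
# reset_index() turns 'Product' back into an ordinary column.
wide = df.pivot(index="Product", columns="Test", values="Score").reset_index()

# Optional (assumption: you do not want the leftover 'Test' label
# that pivot() leaves on the column axis).
wide.columns.name = None

print(wide)
```

Two things to keep in mind with real data. If a product was never given a particular test, its cell in that column will be NaN, so you will probably need to fill or drop those values before modelling. If the same Product/Test pair appears more than once, pivot() raises a ValueError. In that case pivot_table() with an aggregation function such as 'mean' may be a better fit.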