How to convert rows to columns in a Pandas groupby?

I have a table containing price data for a set of products over 6 months. Each product has a unique id (sku_id) and can be from size 6-12. We measured the price each day, and generated a table similar to the example below. Source indicates what website the price was on (can be 1-4).

| sku_id | size | price | timestamp | source |

|    1   | 6.0  | 115.0 | 2021-01-10|   1    |
|    2   | 8.0  | 149.0 | 2021-01-10|   1    |
|    1   | 6.0  | 168.0 | 2021-01-10|   2    |
|    1   | 9.0  | 152.0 | 2021-01-10|   1    |
 ... 
|    1   | 9.0  | 152.0 | 2021-07-10|   1    |

JavaScript
​x
 
| sku_id | size | price | timestamp | source |
​
|    1   | 6.0  | 115.0 | 2021-01-10|   1    |
|    2   | 8.0  | 149.0 | 2021-01-10|   1    |
|    1   | 6.0  | 168.0 | 2021-01-10|   2    |
|    1   | 9.0  | 152.0 | 2021-01-10|   1    |
 ... 
|    1   | 9.0  | 152.0 | 2021-07-10|   1    |
​

Now, I want to perform some analytics/modelling on the above dataset, but I’m having trouble formatting it as needed. I want one row for each product from each site (i.e. the key will be [sku_id, size, source]), where we get the price for each day from January 10th, to July 1st in a new column. This is represented in the table below.

| sku_id | size | source |price on 2021-01-10|price on 2021-01-11|price on 2021-01-12| ... |

|    1   | 6.0  |   1    |        149.0      |     151.2         |     158.2         | ... |
|    2   | 6.0  |   1    |        142.0      |     221.8         |     312.9         | ... |
 ...

JavaScript
 
| sku_id | size | source |price on 2021-01-10|price on 2021-01-11|price on 2021-01-12| ... |
​
|    1   | 6.0  |   1    |        149.0      |     151.2         |     158.2         | ... |
|    2   | 6.0  |   1    |        142.0      |     221.8         |     312.9         | ... |
 ...
​

I’m trying to do this in Pandas using a group by (grouping by sku_id, size, and source), but I’m not sure how to instruct Pandas to tack a new column on for each day. Does anyone have instructions/advice on how to build this second table from the first one? I am not quite sure how to get the group by to work, and I don’t know what other strategies could work either.

Answer

Use pivot to transform your dataframe:

# pivot args: 1st -> index, 2nd -> columns, 3rd -> values
>>> df.pivot(['sku_id', 'size', 'source'], 'timestamp', 'price') 
      .rename_axis(columns=None).add_prefix('price on ').reset_index()

   sku_id  size  source  price on 2021-01-10  price on 2021-07-10
0       1   6.0       1                115.0                  NaN
1       1   6.0       2                168.0                  NaN
2       1   9.0       1                152.0                152.0
3       2   8.0       1                149.0                  NaN

JavaScript
 
# pivot args: 1st -> index, 2nd -> columns, 3rd -> values
>>> df.pivot(['sku_id', 'size', 'source'], 'timestamp', 'price') 
      .rename_axis(columns=None).add_prefix('price on ').reset_index()
​
   sku_id  size  source  price on 2021-01-10  price on 2021-07-10
0       1   6.0       1                115.0                  NaN
1       1   6.0       2                168.0                  NaN
2       1   9.0       1                152.0                152.0
3       2   8.0       1                149.0                  NaN
​

Advertisement

Answer