Skip to content
Advertisement

Clean Data using Pandas

I have the data in this format, and I want to turn it around in an horizontal shape, and in the same time accumulate the count of one product, when the date and hour data are the same.

I put below the new desired data frame. Is this feasible using pandas? Or any other python library?

id,  date,   hour,  name,  count
1,   01-20,  6,     car,   4
2,   01-20,  6,     car,   3
3,   01-20,  7,     car,   4
4,   01-20,  7,     car,   2
5,   01-21,  6,     car,   1
6,   01-21,  6,     car,   1
7,   01-21,  7,     boat,  7
8,   01-21,  7,     boat,  8
9,   01-22,  6,     car,   10
10,  01-22,  7,     boat,  11



      01-20(6)  01-20(7)  01-21(6)  01-21(7)  01-22(6)  01-22(7) 
car   4+3=7     4+2=6     1+1=2     0         10        0
boat  0         0         0         7+8=15    0         11

Advertisement

Answer

You could groupby+sum, then unstack, finally rework the column names:

df2 = (df.groupby(['date', 'hour', 'name'])
          ['count'].sum()
         .unstack(['date', 'hour'], fill_value=0)
       )
df2.columns = df2.columns.map(lambda x: f'{x[0]}({x[1]})')

Output:

      01-20(6)  01-20(7)  01-21(6)  01-21(7)  01-22(6)  01-22(7)
name                                                            
boat         0         0         0        15         0        11
car          7         6         2         0        10         0
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement