I have a df like this one:
df = pd.DataFrame([["coffee","soda","coffee","water","soda","soda"],["paper","glass","glass","paper","paper","glass"], list('smlssm')]).T df.columns = ['item','cup','size']
df:
item cup size 0 coffee paper s 1 soda glass m 2 coffee glass l 3 water paper s 4 soda paper s 5 soda glass m
I want to transform this into a df that looks like this
item cup size freq 0 coffee paper s 1 1 coffee paper m 0 2 coffee paper l 0 3 coffee glass s 0 4 coffee glass m 0 5 coffee glass l 1 6 soda paper s 1 7 soda paper m 0 8 soda paper l 0 9 soda glass s 0 10 soda glass m 2 11 soda glass l 0 . . . . . . . . . . . . . . .
So for every item i want a row with the possible combinations of cup and size and an additional row with the frequency.
What is the proper way to do this using pandas?
Advertisement
Answer
Let’s try:
Add a frequency column to the dataframe to indicate individual rows are worth 1 each.
groupby sum
to get the current count in the DataFrame.Create a MultiIndex from the
unique
values in each column.Use the new
midx
toreindex
with afill_value=0
so that freq gets filled with 0 when created by the new index.reset_index
to convert the index back into columns.
# Columns to Reindex idx_cols = ['item', 'cup', 'size'] # Create MultIndex With Unique Values midx = pd.MultiIndex.from_product( [df[c].unique() for c in idx_cols], names=idx_cols ) df = ( df.assign(freq=1) # Add Freq Column initialzed to 1 .groupby(idx_cols)['freq'].sum() # Groupby and Sum freq .reindex(midx, fill_value=0) # reindex .reset_index() # reset_index )
df
:
item cup size freq 0 coffee paper s 1 1 coffee paper m 0 2 coffee paper l 0 3 coffee glass s 0 4 coffee glass m 0 5 coffee glass l 1 6 soda paper s 1 7 soda paper m 0 8 soda paper l 0 9 soda glass s 0 10 soda glass m 2 11 soda glass l 0 12 water paper s 1 13 water paper m 0 14 water paper l 0 15 water glass s 0 16 water glass m 0 17 water glass l 0