I have a dataframe that shows order listing. How do I use it to find the number of orders that contain spicy food?
Order Item Name Spicy 0 1000 Calamari Rings No 1 1001 Cheesecake No 2 1001 Spicy Chicken Yes 3 1002 Spicy Lamb Yes
Using this code gives me 2 Yes and 2 No, but it should actually be 2 Yes and 1 No as order 1001 is duplicated. Thank you.
dd = df_orders.groupby("Item Name")["Order"].count()
I would like to get an output that shows the number of orders that contains spicy food.
Spicy 0 Yes 2 1 No 1
Advertisement
Answer
There’s perhaps a simpler way, but this works.
First group by order and spicy counts, to get the count of spicy for each order. Then sort by spicy and drop duplicates by order number (removes the ‘No’ in spicy column if a yes exists for that order). Then group by Spicy again and count to get the counts.
df_orders = pd.DataFrame({'Order' : [1000, 1001, 1001, 1002], 'Item Name' : ['Calamari Rings', 'Cheesecake', 'Spicy Chicken', 'Spicy Lamb'], 'Spicy' : ['No', 'No', 'Yes', 'Yes']}) df_grouped = df_orders.groupby(['Order', 'Spicy']).count().reset_index() df_grouped = df_grouped.sort_values(by='Spicy').drop_duplicates(subset='Order', keep='last') df_grouped = df_grouped.groupby('Spicy').count()['Order'].reset_index()
Output:
Spicy Order 0 No 1 1 Yes 2