I’m trying to loop through a table that contains covid-19 data. My table has 4 columns: month, day, location, and cases. The values of each column in the table is stored in its own list, so each list has the same length. (Ie. there is a month list, day list, location list, and cases list). There are 12 months, with up to 31 days in a month. Cases are recorded for many locations around the world. I would like to figure out what day of the year had the most total combined global cases. I’m not sure how to structure my loops appropriately. An oversimplified sample version of the table represented by the lists is shown below.
In this small example, the result would be month 1, day 3 with 709 cases (257 + 452).
Month | Day | Location | Cases |
---|---|---|---|
1 | 1 | CAN | 124 |
1 | 1 | USA | 563 |
1 | 2 | CAN | 242 |
1 | 2 | USA | 156 |
1 | 3 | CAN | 257 |
1 | 3 | USA | 452 |
. | . | … | … |
12 | 31 | … | … |
Advertisement
Answer
You group your dataframe by month and day. Then iterate through the groups to find the group in which the sum of cases in all locations was max as shown below:
import pandas as pd df = pd.DataFrame({'Month':[1,1,1,1,1,1], 'Day':[1,1,2,2,3,3], 'Location':['CAN', 'USA', 'CAN', 'USA','CAN', 'USA'], 'Cases':[124,563,242,156,257,452]}) grouped = df.groupby(['Month', 'Day']) max_sum = 0 max_day = None for idx, group in grouped: if group['Cases'].sum() > max_sum: max_sum = group['Cases'].sum() max_day = group month = max_day['Month'].iloc[1] day = max_day['Day'].iloc[1] print(f'Maximum cases of {max_sum} occurred on {month}/{day}.') #prints: Maximum cases of 709 occurred on 1/3
If you don’t want to use Pandas, this is how you do it:
months = [1,1,1,1,1,1] days = [1,1,2,2,3,3] locations = ['CAN', 'USA', 'CAN', 'USA','CAN', 'USA'] cases = [124,563,242,156,257,452] dic = {} target_day = 0 count = 0 for i in range(len(days)): if days[i] != target_day: target_day = days[i] count = cases[i] else: count += cases[i] dic[f'{months[i]}/{days[i]}'] = count max_cases = max(dic.values()) worst_day = list(dic.keys())[list(dic.values()).index(max_cases)] print(f'Maximum cases of {max_cases} occurred on {worst_day}.') #Prints: Maximum cases of 709 occurred on 1/3.