Skip to content
Advertisement

All cells getting updated in pandas df using loc

So I create an empty pandas df, where I initialize all the cell values to empty lists, except the diagonals, which are set to math.inf

The indexes are the start position, and the column headers are the end position

I want to get the start and end positions, and the difference between the days to get from start to end, and put that value in df.loc[start, end] by using append. But for some reason, every single cell in the df is getting updated, and i dont know why

My code is shown below

self.status_dict = {'nvc': 'At NVC',
                            'issued': 'Issued',
                            'ready': 'Ready',
                            'ar_ds260': 'Action required: Complete Form DS-260',
                            'transit': 'In Transit',
                            'refused': 'Refused',
                            'ar_doc': 'Action required: Submit requested documents',
                            'admin_process': 'Administrative Processing', 
                            'expire_soon': 'Expiring Soon',
                            'app_received': 'Application Received',
                            'ar_pay': 'Action required: Pay fees',
                            'return_nvc': 'Returned to NVC',
                            'transfer': 'Transfer in Progress',
                            'expired': 'Expired',
                            'ar_pay_miss': 'Action required: Pay missing fees', 
                            'no_action': 'No action required: Review in process', 
                            'no_status': 'No Status',
                            'ar_choose': 'Action required: Choose an agent'
                           }
self.status_dict_lookup = {'At NVC': 'nvc', 
                                   'Issued': 'issued', 
                                   'Ready': 'ready', 
                                   'Action required: Complete Form DS-260': 'ar_ds260', 
                                   'In Transit': 'transit', 
                                   'Refused': 'refused', 
                                   'Action required: Submit requested documents': 'ar_doc', 
                                   'Administrative Processing': 'admin_process', 
                                   'Expiring Soon': 'expire_soon', 
                                   'Application Received': 'app_received', 
                                   'Action required: Pay fees': 'ar_pay', 
                                   'Returned to NVC': 'return_nvc', 
                                   'Transfer in Progress': 'transfer', 
                                   'Expired': 'expired', 
                                   'Action required: Pay missing fees': 'ar_pay_miss', 
                                   'No action required: Review in process': 'no_action', 
                                   'No Status': 'no_status', 
                                   'Action required: Choose an agent': 'ar_choose'
                                  }

shape = len(self.status_dict_lookup)
        const_arr = [[]] * shape
        keys = self.status_dict.keys()
        df_dict = dict()
        for key in keys:
            df_dict[key] = const_arr
        df = pd.DataFrame(df_dict)
        df = df.set_index(pd.Index(keys))
        for key in keys:
            df.loc[key, key] = math.inf
        
#         cases = self.cases
        cases = {1044: [['Action required: Submit requested documents', '2021-12-18'], 
                        ['At NVC', '2022-02-03'], ['In Transit', '2022-02-14'], 
                        ['Ready', '2022-02-15'], ['Refused', '2022-03-10'], 
                        ['Administrative Processing', '2022-03-12'], ['Issued', '2022-03-14']]}
        for _, val in cases.items():
            print(val[0], val[1])
            print(val[0][1], val[1][1])
            for i in range(len(val) - 1):
                temp = []
                start = val[i][0]
                end   = val[i + 1][0]
                
                start_time = datetime.strptime(val[i][1], '%Y-%m-%d')
                end_time = datetime.strptime(val[i + 1][1], '%Y-%m-%d')
                diff = end_time - start_time
                
                temp = df[self.status_dict_lookup[start]][self.status_dict_lookup[end]]
                print(temp)
                temp.append(diff.days)
                df.loc[self.status_dict_lookup[start], self.status_dict_lookup[end]] = temp

part of the output of the df is shown below:

nvc                 issued  
nvc                              inf  [47, 11, 1, 23, 2, 2]   
issued         [47, 11, 1, 23, 2, 2]                    inf   
ready          [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_ds260       [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
transit        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
refused        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_doc         [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
admin_process  [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
expire_soon    [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
app_received   [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_pay         [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
return_nvc     [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
transfer       [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
expired        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_pay_miss    [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
no_action      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
no_status      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_choose      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2] 

So for the first example,

start = Action required: Submit requested documents

end = At NVC

diff = 47

So i want it to store just 47 as a list in df[ar_doc][nvc]. But it is storing the difference of all days in all the cells

Why does this happen and how to fix it?

Advertisement

Answer

All your pandas data are referencing the same list. You should change how you initialize the DataFrame. You should create a new list in each cell.

Try:

df =  pd.DataFrame({k: [list() for _ in range(len(status_dict))] for k in status_dict}, 
                   index=status_dict.keys())
for key in keys:
    df.at[key, key] = math.inf

Separately, since you’re already using pandas, you don’t need to use datetime to parse dates. You can reduce your loop to the following:

for _, val in cases.items():
    for i in range(len(val)-1):
        diff = pd.to_datetime(val[i][1], format='%Y-%m-%d') - pd.to_datetime(val[i + 1][1], format='%Y-%m-%d')
        df.at[status_dict_lookup[val[i][0]], status_dict_lookup[val[i+1][0]]] += [diff.days]
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement