
pandas change dataset value of a specific row using loc

I’m new to programming and I’m working on a Python project using pandas. I want to change the values of each row of a dataset using .loc, but it doesn’t seem to work. The idea is to make a row take the EOL value if the row is equal to 0. The code doesn’t raise an error, but my dataset is unchanged after the iterations. Here is the code:

for machines in telemetry_days['machineID']:
    EOL = 365
    i = 0

    for row in telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)]:

        if (row != 0):
            EOL = row

        elif (row == 0):
            telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)].iloc[i] = EOL
        i = i + 1

I think it’s because I’m using .iloc, so it won’t change the value of ‘failure_comp1’ in the dataset. But I can’t figure out how to get a specific row from .loc without using .iloc. If anyone has any suggestions I’d be very grateful, thanks. Here is the structure of the whole dataset (don’t mind the NaNs): [screenshot of the dataframe]. Here is what I have, for example, for one machine:

index failure_comp1
67    0
254   150
568   0
850   0
998   345

I want it to become this:

index failure_comp1
67    365
254   150
568   150
850   150
998   345

It’s a time series dataset and I want to label each component of each machine with its End Of Life time (number of days). I’ve already got it labeled at the date where it fails, but I want to have it labeled for each row of that specific component.


Answer

I wouldn’t iterate through the rows (although you could if you want; I’ll show that solution too). What I would do instead is use a .groupby('machineID'): 1) convert all the 0s to NaN, 2) forward-fill the NaNs, 3) this leaves a leading 0 in each group as NaN, so finally fillna with 365.

Given this as a sample data set:

import pandas as pd

telemetry_days = pd.DataFrame({
    'machineID':['11','22','33','44','11','22','33','44','11','22','33','44','11','22','33','44','11','22','33','44'],
    'failure_comp1':[0,2,45,0, 
                     150,150,232,0, 
                     0, 0, 0, 0, 
                     0, 12, 0, 0,
                     345, 12, 0, 0]})

Code:

import pandas as pd
import numpy as np


# Treat 0 as "no failure recorded yet" so it can be filled from the previous failure value
telemetry_days['failure_comp1'] = telemetry_days['failure_comp1'].replace(0, np.nan)

# Forward-fill within each machine, then fill any remaining leading NaNs with the default EOL of 365
telemetry_days['failure_comp1'] = telemetry_days.groupby('machineID', as_index=False)['failure_comp1'].ffill().fillna(365)
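
To sanity-check the result, you can print one machine’s rows (a quick check against the sample frame defined above; the column becomes float because of the intermediate NaNs):

# machine '11' starts as [0, 150, 0, 0, 345] and should become [365.0, 150.0, 150.0, 150.0, 345.0]
print(telemetry_days.loc[telemetry_days['machineID'] == '11', 'failure_comp1'].tolist())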

If you want to use .loc or .iloc:

Here’s how I would do it. I would loop through each unique machineID, filter the dataframe to get just that machine, then iterate through that sub-group. I also would not hard-code the i (index): .items() (called .iteritems() in older pandas versions) or .iterrows() will return the index value for you, so just use that. Also assign through .loc on the original dataframe; the chained .loc[...].iloc[i] assignment in the question operates on a copy, which is why the dataset never changed.

for machines in telemetry_days['machineID'].unique():
    EOL = 365

    # .items() yields (index label, value) pairs for this machine's rows
    for i, row in telemetry_days.loc[telemetry_days['machineID'] == machines, 'failure_comp1'].items():

        if row != 0:
            # remember the most recent failure value
            EOL = row

        else:
            # assign through .loc on the original dataframe so the change actually sticks
            telemetry_days.loc[i, 'failure_comp1'] = EOL
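
As a quick sanity check (assuming you rebuild the sample dataframe from above before running the loop, since the vectorized version already modified it), machine '11' should again come out as 365, 150, 150, 150, 345:

# with the loop version the column stays integer, so no floats here
print(telemetry_days.loc[telemetry_days['machineID'] == '11', 'failure_comp1'].tolist())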