I’m new to programming and I’m working on a Python project using pandas. I wanted to change values in each row of a dataset using .loc, but it doesn’t seem to work. The idea is to make a row take the EOL value if the row is equal to 0. The code doesn’t raise an error, but my dataset is unchanged after the iterations. Here is the code:
for machines in telemetry_days['machineID']:
    EOL = 365
    i = 0
    for row in telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)]:
        if (row != 0):
            EOL = row
        elif (row == 0):
            telemetry_days['failure_comp1'].loc[(telemetry_days['machineID'] == machines)].iloc[i] = EOL
        i = i + 1
I think it’s because I’m using .iloc, so it won’t change the value of ‘failure_comp1’ in the dataset, but I can’t figure out how to get a specific row from .loc without using .iloc. If anyone has any suggestions I’d be very grateful, thanks.
Here is the structure of the whole dataset (don’t mind the NaNs): [image: structure of the whole dataset]
Here is what I have, for example (for one ‘machine’):
index  failure_comp1
67     0
254    150
568    0
850    0
998    345
I want it to become this:
index  failure_comp1
67     365
254    150
568    150
850    150
998    345
It’s a time series dataset and I want to label each component of each machine with its End Of Life time (number of days). I’ve already got it labeled at the date where it fails, but I want to have it labeled for each row of that specific component.
Answer
I wouldn’t iterate through the rows (although you could if you want; I’ll show that solution too). What I would do is use .groupby('machineID'):
1) Convert all the 0s to NaN.
2) Forward fill the NaNs within each machine.
3) This will leave any 0s before a machine’s first non-zero value as NaN, so finally fillna with 365.
Given as a sample data set:
import pandas as pd

telemetry_days = pd.DataFrame({
    'machineID': ['11', '22', '33', '44', '11', '22', '33', '44', '11', '22',
                  '33', '44', '11', '22', '33', '44', '11', '22', '33', '44'],
    'failure_comp1': [0, 2, 45, 0, 150, 150, 232, 0, 0, 0,
                      0, 0, 0, 12, 0, 0, 345, 12, 0, 0]})
Code:
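import pandas as pd
import numpy as np

# 1) Turn the 0s into NaN so they count as missing values
telemetry_days['failure_comp1'] = telemetry_days['failure_comp1'].replace(0, np.nan)
# 2) Forward fill within each machine, 3) then fill the remaining leading NaNs with 365
telemetry_days['failure_comp1'] = telemetry_days.groupby('machineID', as_index=False)['failure_comp1'].ffill().fillna(365)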
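As a quick check, the same three steps can be run on the single-machine example from the question. This is only a minimal sketch: the helper name fill_eol and the machine ID '1' are made up for illustration, and it casts back to int at the end, which assumes the column holds whole numbers of days.

```python
import numpy as np
import pandas as pd


def fill_eol(df, default=365):
    """Return a copy where each 0 in failure_comp1 takes the last non-zero value of its machine."""
    out = df.copy()
    # Treat 0 as "missing" so that ffill can carry the last real value forward.
    values = out['failure_comp1'].replace(0, np.nan)
    # Forward fill within each machine; rows before the first non-zero value stay NaN.
    filled = values.groupby(out['machineID']).ffill()
    # Fill those leading rows with the default and go back to integers.
    out['failure_comp1'] = filled.fillna(default).astype(int)
    return out


# The example rows from the question, with a hypothetical machine ID of '1'.
example = pd.DataFrame(
    {'machineID': ['1'] * 5, 'failure_comp1': [0, 150, 0, 0, 345]},
    index=[67, 254, 568, 850, 998],
)
result = fill_eol(example)
print(result['failure_comp1'].tolist())  # [365, 150, 150, 150, 345]
```

Working on a copy keeps the original frame untouched, which makes it easier to compare before and after.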
If you want to use the .loc or .iloc:
Here’s how I would do it. I would loop through each unique machineID, filter the dataframe to get just the rows for that machine, then iterate through that sub-group. I also would not hard-code the i (index): .items() (.iteritems() in older code; it was removed in pandas 2.0) or .iterrows() will return the index value for you, so just use that.
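for machines in telemetry_days['machineID'].unique():
    EOL = 365
    # .items() yields (index label, value) pairs; it replaces .iteritems(), which was removed in pandas 2.0
    for i, row in telemetry_days[telemetry_days['machineID'] == machines]['failure_comp1'].items():
        if row != 0:
            EOL = row
        elif row == 0:
            # i is an index label, so write with a single .loc call instead of a chained .iloc
            telemetry_days.loc[i, 'failure_comp1'] = EOL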
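As for why the code in the question ran without an error but changed nothing: selecting with a boolean mask returns a new object, so the chained .loc[...].iloc[i] = EOL assigns into a temporary Series rather than into the dataframe. A small sketch of the difference, using a made-up three-row frame:

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the behaviour.
df = pd.DataFrame(
    {'machineID': ['1', '1', '2'], 'failure_comp1': [0, 150, 0]},
    index=[67, 254, 568],
)
mask = df['machineID'] == '1'

# Chained indexing: .loc[mask] builds a new Series, so assigning through .iloc
# changes that temporary object and leaves df as it was.
temp = df['failure_comp1'].loc[mask]
temp.iloc[0] = 365
unchanged = df['failure_comp1'].tolist()

# A single .loc call with a row label and a column name writes to df itself.
first_label = df[mask].index[0]
df.loc[first_label, 'failure_comp1'] = 365
changed = df['failure_comp1'].tolist()

print(unchanged)  # [0, 150, 0]
print(changed)    # [365, 150, 0]
```

So the rule of thumb is to do the row selection and the column selection in one .loc[rows, column] call whenever you assign.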