Replacing values greater than a number in pandas dataframe

I have a large DataFrame that looks like this:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [33, 34, 39]
2017-01-01 03:00:00    [3, 43, 9]

I want to replace each element greater than 9 with 11.

So the desired output for the example above is:

df1['A'].iloc[1:3]
2017-01-01 02:00:00    [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]

Edit:

My actual DataFrame has about 20,000 rows, and each row holds a list of 2,000 elements.

Is there a way to apply the numpy.minimum function to each row? I assume it would be faster than the list comprehension approach.

Answer

You can use apply with a list comprehension:

df1['A'] = df1['A'].apply(lambda x: [y if y <= 9 else 11 for y in x])
print(df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
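
If the lists do not all have the same length, a per-row NumPy variant may still be an option. The following is only a sketch: the sample frame is rebuilt from the two rows shown in the question, and cap_row with its threshold and replacement parameters is a hypothetical helper name, not part of pandas or NumPy. Note that numpy.minimum(arr, 11) would cap large values at 11 but leave 10 unchanged, so it does not express "anything greater than 9 becomes 11"; numpy.where states that rule directly.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the two rows shown in the question (assumed layout).
idx = pd.to_datetime(["2017-01-01 02:00:00", "2017-01-01 03:00:00"])
df1 = pd.DataFrame({"A": [[33, 34, 39], [3, 43, 9]]}, index=idx)


def cap_row(values, threshold=9, replacement=11):
    """Hypothetical helper: replace each element above threshold with replacement."""
    arr = np.asarray(values)
    # Keep elements <= threshold, swap in the replacement everywhere else.
    return np.where(arr > threshold, replacement, arr).tolist()


# Apply the helper to every list in the column, one row at a time.
df1["A"] = df1["A"].apply(cap_row)
result = df1["A"].tolist()
print(result)
```

This still loops over rows in Python, so it is likely slower than converting the whole column to a single array when every list has the same length.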

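A faster solution is to first convert the column to a NumPy array and then use numpy.where. This requires every list to have the same length, so that the result is a regular 2-D array: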

import numpy as np

a = np.array(df1['A'].values.tolist())
print(a)
[[33 34 39]
 [ 3 43  9]]

df1['A'] = np.where(a > 9, 11, a).tolist()
print(df1)
                                A
2017-01-01 02:00:00  [11, 11, 11]
2017-01-01 03:00:00    [3, 11, 9]
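
To check that both approaches agree on a larger input, a self-contained sketch along these lines could be used. The data here is random and scaled down (200 rows of 50 elements rather than 20,000 rows of 2,000), a default integer index replaces the timestamps, and the variable names are illustrative assumptions rather than anything from the question.

```python
import numpy as np
import pandas as pd

# Random stand-in data; sizes are scaled down from the question's 20,000 x 2,000.
rng = np.random.default_rng(0)
n_rows, n_cols = 200, 50
data = rng.integers(0, 50, size=(n_rows, n_cols))
df = pd.DataFrame({"A": data.tolist()})

# Reference result from the list comprehension approach.
expected = df["A"].apply(lambda x: [y if y <= 9 else 11 for y in x]).tolist()

# Vectorized approach: one 2-D array, one numpy.where call.
a = np.array(df["A"].tolist())
df["A"] = np.where(a > 9, 11, a).tolist()

# Both approaches should produce identical lists.
same = df["A"].tolist() == expected
print(same)
```

At the full size from the question, the intermediate 2-D integer array holds 40 million elements, so memory use is worth keeping in mind.
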
User contributions licensed under: CC BY-SA