Changing a cell value in a column based on a query from two other columns

I have a DataFrame with four columns: “date”, “time_gap”, “count” and “average_speed”.

I’d like to set values in the “count” column when conditions on the “date” and “time_gap” columns are met.

So, for example, if I’m running this query:

random_row = df.query("date == '2018-12-07' & time_gap == 86")

It’s returning this as output:

      date         time_gap   count   average_speed
282   2018-12-07   86         0       0

Let’s say I want to change the value in the “count” column to 12. How can I do that?

I’ve tried this:

random_row = df.query("date == '2018-12-07' & time_gap == 86")["count"].replace(0, 12)

Which returns this:

282    12
Name: count, dtype: int64

But when I look at the df:

df.iloc[282]

I still have my row where the “count” is equal to 0:

date             2018-12-07 00:00:00
time_gap                          86
count                              0
average_speed                      0
Name: 282, dtype: object

How can I do it?

Answer

You can do it with loc, if you don’t want to use NumPy:

df.loc[ (df.date.eq('07/12/2018')) & (df.time_gap.eq(86)), 'count' ] = 12

prints:

         date  time_gap  count  average_speed
0  07/12/2018        86     12              0
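
For reference, here is a minimal, self-contained sketch of the loc approach. The sample rows are made up for illustration (only the column names come from the question), and it assumes “date” is a datetime column, as the df.iloc[282] output in the question suggests. It also shows why the original attempt had no effect: replace returns a new Series and leaves df alone.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-12-06", "2018-12-07", "2018-12-07"]),
    "time_gap": [86, 85, 86],
    "count": [0, 0, 0],
    "average_speed": [0, 0, 0],
})

# Boolean mask: True only for rows matching both conditions.
mask = (df["date"] == pd.Timestamp("2018-12-07")) & (df["time_gap"] == 86)

# replace() returns a new Series; df itself is not modified.
replaced = df.loc[mask, "count"].replace(0, 12)
count_after_replace = df["count"].tolist()  # still [0, 0, 0]

# Assigning through .loc writes into df directly.
df.loc[mask, "count"] = 12
count_after_loc = df["count"].tolist()  # now [0, 0, 12]
```

Only the last row matches both conditions, so it is the only one whose count changes.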

If you would rather reuse the query string, you can, but you have to use eval, which takes the expression you would pass to query, evaluates it, and returns a boolean mask:

qr = "date == '07/12/2018' & time_gap == 86"
df.loc[df.eval(qr), 'count'] = 12

prints:

         date  time_gap  count  average_speed
0  07/12/2018        86     12              0
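
A runnable sketch of the eval approach, under the assumption that “date” holds plain strings as in the answer’s output (the sample rows are invented for illustration). The second variant, which goes through the index returned by query, is an alternative not mentioned in the answer, and it assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical sample data; dates are stored as strings here (an assumption).
df = pd.DataFrame({
    "date": ["06/12/2018", "07/12/2018", "07/12/2018"],
    "time_gap": [86, 85, 86],
    "count": [0, 0, 0],
    "average_speed": [0, 0, 0],
})

qr = "date == '07/12/2018' & time_gap == 86"

# eval() evaluates the expression row by row and returns a boolean Series.
mask = df.eval(qr)
df.loc[mask, "count"] = 12

# Alternative: select by the index labels of the rows query() returns.
# This relies on the index labels being unique.
df_alt = df.copy()
df_alt["count"] = 0
df_alt.loc[df_alt.query(qr).index, "count"] = 12
```

Both variants change only the row where the two conditions hold at the same time.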

You can see practical applications of eval here.