I cannot change the values of a column based on a specific condition

The table looks like the following:

text dummy1 days
op123ac 1 2000-01-01
op123ac 0 2000-01-04
op123ac 0 2000-01-07
op123ac 0 2000-01-10
op1248ab 0 2000-01-17
op1248ab 1 2000-01-20
op1248ab 1 2000-01-23
op1248ab 1 2000-01-26

Each unique “text” value appears four times, once for each of four unique “days” values, and the “days” are consecutive within each “text”. The problem is that each “text” must have a single “dummy1” value, either 1 or 0, so that:

df.groupby("text")['dummy1'].sum() # Each "text" should sum to either 0 or 4; no other value is acceptable.
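
To see which groups currently break this rule, one option is to compare each group's sum with its size. This is only a minimal sketch: it rebuilds the sample table above as `df`, and the names `grouped`, `sums`, `sizes` and `bad` are illustrative, not part of the original question.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "text": ["op123ac"] * 4 + ["op1248ab"] * 4,
    "dummy1": [1, 0, 0, 0, 0, 1, 1, 1],
    "days": ["2000-01-01", "2000-01-04", "2000-01-07", "2000-01-10",
             "2000-01-17", "2000-01-20", "2000-01-23", "2000-01-26"],
})

grouped = df.groupby("text")["dummy1"]
sums = grouped.sum()    # number of 1s per "text"
sizes = grouped.size()  # number of rows per "text" (4 in this sample)

# A consistent group sums to 0 (all zeros) or to its own size (all ones).
bad = sums[(sums != 0) & (sums != sizes)]
print(bad)
```

With the sample data, both “text” values show up in `bad`, since their sums are 1 and 3 rather than 0 or 4.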

I found that, within each “text”, the wrong “dummy1” values are always fewer than the correct ones. For example, “op123ac” has both 1 and 0 in the “dummy1” column; there are three 0s and only one 1, so the correct “dummy1” is 0. “op1248ab” has more 1s than 0s, so its correct value is 1. The corrected table should look like this:

text dummy1 days
op123ac 0 2000-01-01
op123ac 0 2000-01-04
op123ac 0 2000-01-07
op123ac 0 2000-01-10
op1248ab 1 2000-01-17
op1248ab 1 2000-01-20
op1248ab 1 2000-01-23
op1248ab 1 2000-01-26

The number of 0s never equals the number of 1s for any “text” value, so there are no ties.

Answer

 df['dummy1'] = df.groupby('text')['dummy1'].transform(lambda x: x.mode().iat[0])

Output

       text  dummy1        days
0   op123ac       0  2000-01-01
1   op123ac       0  2000-01-04
2   op123ac       0  2000-01-07
3   op123ac       0  2000-01-10
4  op1248ab       1  2000-01-17
5  op1248ab       1  2000-01-20
6  op1248ab       1  2000-01-23
7  op1248ab       1  2000-01-26

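The mode of a set of values is the value that appears most often. Here, x.mode() returns the most frequent value(s) of each group, .iat[0] takes the first of them, and transform writes that value back to every row of the group.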
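
For anyone who wants to reproduce the output, here is a self-contained sketch. It assumes the sample table from the question; the final check and the name `sums` are additions for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "text": ["op123ac"] * 4 + ["op1248ab"] * 4,
    "dummy1": [1, 0, 0, 0, 0, 1, 1, 1],
    "days": ["2000-01-01", "2000-01-04", "2000-01-07", "2000-01-10",
             "2000-01-17", "2000-01-20", "2000-01-23", "2000-01-26"],
})

# Replace every value with the most frequent value of its "text" group.
df["dummy1"] = df.groupby("text")["dummy1"].transform(lambda x: x.mode().iat[0])

# Check the rule from the question: each group sums to 0 or to its size.
sums = df.groupby("text")["dummy1"].sum()
print(df)
print(sums)
```

If a group ever did contain a tie, mode() would return both values in sorted order and .iat[0] would pick the smaller one (0). The question states that ties cannot occur, so that case is not handled here.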

User contributions licensed under: CC BY-SA