Cannot change the values of a column based on a specific condition

The table looks like the following:

text	dummy1	days
op123ac	1	2000-01-01
op123ac	0	2000-01-04
op123ac	0	2000-01-07
op123ac	0	2000-01-10
op1248ab	0	2000-01-17
op1248ab	1	2000-01-20
op1248ab	1	2000-01-23
op1248ab	1	2000-01-26

Each unique “text” value is repeated four times, once for each of four unique “days”. The “days” are consecutive for each “text”. The problem is that each “text” must have a single “dummy1” value, either 1 or 0, so that:

df.groupby("text")['dummy1'].sum()  # Each "text" should give either 0 or 4; no other value is acceptable.

I figured that, within each “text”, the erroneous “dummy1” values are fewer than the correct ones. For example, “op123ac” has both 1 and 0 in the “dummy1” column; there are three 0s and only one 1, so the correct “dummy1” should be 0. “op1248ab” has more 1s than 0s, so its correct value should be 1. The correct table should look like the following:

text	dummy1	days
op123ac	0	2000-01-01
op123ac	0	2000-01-04
op123ac	0	2000-01-07
op123ac	0	2000-01-10
op1248ab	1	2000-01-17
op1248ab	1	2000-01-20
op1248ab	1	2000-01-23
op1248ab	1	2000-01-26

The number of 0s can never equal the number of 1s for any “text” value.

Answer

 df['dummy1'] = df.groupby('text')['dummy1'].transform(lambda x: x.mode().iat[0])

Output

       text  dummy1        days
0   op123ac       0  2000-01-01
1   op123ac       0  2000-01-04
2   op123ac       0  2000-01-07
3   op123ac       0  2000-01-10
4  op1248ab       1  2000-01-17
5  op1248ab       1  2000-01-20
6  op1248ab       1  2000-01-23
7  op1248ab       1  2000-01-26
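
As a rough sanity check, the condition from the question (each “text” sums to either 0 or its group size) can be tested on the corrected data. This is only a sketch: the names `fixed`, `sums`, `sizes` and `ok` are made up for illustration, and the frame is rebuilt by hand from the output above.

```python
import pandas as pd

# Corrected data, rebuilt by hand from the output above (illustrative only).
fixed = pd.DataFrame({
    "text": ["op123ac"] * 4 + ["op1248ab"] * 4,
    "dummy1": [0, 0, 0, 0, 1, 1, 1, 1],
})

grouped = fixed.groupby("text")["dummy1"]
sums = grouped.sum()    # number of 1s per "text"
sizes = grouped.size()  # number of rows per "text" (4 here)

# A group is consistent when it is all 0s or all 1s.
ok = (sums == 0) | (sums == sizes)
print(ok.all())
```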

The mode of a set of values is the value that appears most often.
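
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is typed in by hand from the table in the question, so its construction is an assumption about how the real data is stored (for example, “days” is kept as plain strings here).

```python
import pandas as pd

# Sample data copied from the question; "days" is left as strings.
df = pd.DataFrame({
    "text": ["op123ac"] * 4 + ["op1248ab"] * 4,
    "dummy1": [1, 0, 0, 0, 0, 1, 1, 1],
    "days": ["2000-01-01", "2000-01-04", "2000-01-07", "2000-01-10",
             "2000-01-17", "2000-01-20", "2000-01-23", "2000-01-26"],
})

# Series.mode() returns the most frequent value(s) in ascending order;
# .iat[0] takes the first one, and transform() broadcasts it to every
# row of the group.
df["dummy1"] = df.groupby("text")["dummy1"].transform(lambda x: x.mode().iat[0])

print(df)
```

Since `mode()` returns its values sorted, a tie between 0 and 1 would resolve to 0 with `.iat[0]`. The question states that ties cannot happen, so that case should not arise here.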
