I have a problem with pandas and replacing values. I have a table with animals and their alleles looks that: And I need to change values of Top1 and Top2 by a specific Key for each row. For example: if values in same rows will be = C & A, I will replace it to A & B, if row

How to replace values in pandas data frame by dictionary?

I have a problem with pandas and replacing values. I have a table with animals and their alleles looks that:

Name   User A1_Top  A2_Top
stefan1 721    A    C   
stefan2 721    A    G
stefan3 331    T    T   
stefan4 331    C    G
stefan5 331    A    A
stefan6 721    G    G

JavaScript
​x
 
Name   User A1_Top  A2_Top
stefan1 721    A    C   
stefan2 721    A    G
stefan3 331    T    T   
stefan4 331    C    G
stefan5 331    A    A
stefan6 721    G    G   
​

And I need to change values of Top1 and Top2 by a specific Key for each row.

For example: if values in same rows will be = C & A, I will replace it to A & B, if row == TT will be BB etc. (key is in if/else below).

I got an answer in another post how to do it by dictionary but I can’t handle with that double condition (if it will be one condition for ex. if A in first row, replace to B, it will be ok). So I just put it into if/else loop and it works… I mean worked, it worked until files were large. Now it soo slow. 300mb file can be processing 30 min on a standard desktop.

That’s how my code look now:

def ATCG_to_AB(df):
x = 0
for i in range(lenFor):
    if df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='C':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='G':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='T':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'C' and df['A2_TOP'].iloc[i] =='G':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='A':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='A'
    elif df['A1_TOP'].iloc[i] == 'C' and df['A2_TOP'].iloc[i] =='C':
        df['A1_TOP'].iloc[i] = 'B'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'G' and df['A2_TOP'].iloc[i] =='G':
        df['A1_TOP'].iloc[i] = 'B'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'T' and df['A2_TOP'].iloc[i] =='T':
        df['A1_TOP'].iloc[i] = 'B'
        df['A2_TOP'].iloc[i] ='B'
    else:
        print(x,". Something is wrong in line: ", i)
        x+=1

JavaScript
 
def ATCG_to_AB(df):
x = 0
for i in range(lenFor):
    if df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='C':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='G':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='T':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'C' and df['A2_TOP'].iloc[i] =='G':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'A' and df['A2_TOP'].iloc[i] =='A':
        df['A1_TOP'].iloc[i] = 'A'
        df['A2_TOP'].iloc[i] ='A'
    elif df['A1_TOP'].iloc[i] == 'C' and df['A2_TOP'].iloc[i] =='C':
        df['A1_TOP'].iloc[i] = 'B'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'G' and df['A2_TOP'].iloc[i] =='G':
        df['A1_TOP'].iloc[i] = 'B'
        df['A2_TOP'].iloc[i] ='B'
    elif df['A1_TOP'].iloc[i] == 'T' and df['A2_TOP'].iloc[i] =='T':
        df['A1_TOP'].iloc[i] = 'B'
        df['A2_TOP'].iloc[i] ='B'
    else:
        print(x,". Something is wrong in line: ", i)
        x+=1
​

X – counting errors. And I know, that code is ugly, so I tried to put the dictionary method. My attempt:

L = [('A', 'C', 'A', 'B'),('A', 'G', 'A', 'B'),('A', 'T', 'A', 'B'),
 ('C', 'G', 'A', 'B'),('A', 'A', 'A', 'A'),('C', 'C', 'B', 'B'),
 ('G', 'G', 'B', 'B'),('T', 'T', 'B', 'B')]


for x in L:
    a.loc[(df[2] == x[0]) & (df[3] == x[1]), [2,3]] = [x[2], x[3]]

JavaScript
 
L = [('A', 'C', 'A', 'B'),('A', 'G', 'A', 'B'),('A', 'T', 'A', 'B'),
 ('C', 'G', 'A', 'B'),('A', 'A', 'A', 'A'),('C', 'C', 'B', 'B'),
 ('G', 'G', 'B', 'B'),('T', 'T', 'B', 'B')]
​
​
for x in L:
    a.loc[(df[2] == x[0]) & (df[3] == x[1]), [2,3]] = [x[2], x[3]]
​

But I got a bad output. The only A1_top is changed, and usually it a bad symbol. Can someone help me translate my ugly code to the dictionary and explain it? And do I think properly that will be a faster solution?

For sure, expecting output (there are no headers in output, below for clarity)

name   User A1_Top  A2_Top
    stefan1 721    A    B   
    stefan2 721    A    B   
    stefan3 331    B    B   
    stefan4 331    A    B
    stefan5 331    A    A
    stefan6 721    B    B

JavaScript
 
name   User A1_Top  A2_Top
    stefan1 721    A    B   
    stefan2 721    A    B   
    stefan3 331    B    B   
    stefan4 331    A    B
    stefan5 331    A    A
    stefan6 721    B    B   
​

Answer

A simple trick, not to say the best, but it works: [Create a dummy column to do mapping or df[col].apply ]

df['combined'] = df['A1_Top']+"|"+df['A2_Top']

  A1_Top A2_Top combined
0      A      C      A|C
1      A      G      A|G
2      T      T      T|T
3      C      G      C|G
4      A      A      A|A
5      G      G      G|G

JavaScript
 
df['combined'] = df['A1_Top']+"|"+df['A2_Top']
​
  A1_Top A2_Top combined
0      A      C      A|C
1      A      G      A|G
2      T      T      T|T
3      C      G      C|G
4      A      A      A|A
5      G      G      G|G
​

Create dictionary, mapping all your requirements: I give 1 here

map_dict = {}
map_dict['A|C'] = 'B|C'
.
.
.

df['new_values'] = df['combined'].apply(lambda x:map_dict[x] if x in map_dict.keys() else x)

  A1_Top A2_Top combined new_values
0      A      C      A|C        B|C
1      A      G      A|G        A|G
2      T      T      T|T        T|T
3      C      G      C|G        C|G
4      A      A      A|A        A|A
5      G      G      G|G        G|G

df['new_a1_top'] = df['new_values'].apply(lambda x: x.split('|')[0])
df['new_a2_top'] = df['new_values'].apply(lambda x: x.split('|')[1])

  A1_Top A2_Top combined new_values new_a1_top new_a2_top
0      A      C      A|C        B|C          B          C
1      A      G      A|G        A|G          A          G
2      T      T      T|T        T|T          T          T
3      C      G      C|G        C|G          C          G
4      A      A      A|A        A|A          A          A
5      G      G      G|G        G|G          G          G

JavaScript
 
map_dict = {}
map_dict['A|C'] = 'B|C'
.
.
.
​
df['new_values'] = df['combined'].apply(lambda x:map_dict[x] if x in map_dict.keys() else x)
​
  A1_Top A2_Top combined new_values
0      A      C      A|C        B|C
1      A      G      A|G        A|G
2      T      T      T|T        T|T
3      C      G      C|G        C|G
4      A      A      A|A        A|A
5      G      G      G|G        G|G
​
df['new_a1_top'] = df['new_values'].apply(lambda x: x.split('|')[0])
df['new_a2_top'] = df['new_values'].apply(lambda x: x.split('|')[1])
​
  A1_Top A2_Top combined new_values new_a1_top new_a2_top
0      A      C      A|C        B|C          B          C
1      A      G      A|G        A|G          A          G
2      T      T      T|T        T|T          T          T
3      C      G      C|G        C|G          C          G
4      A      A      A|A        A|A          A          A
5      G      G      G|G        G|G          G          G
​

Advertisement

Answer