Remove rows in which a string contains letters other than A, C, T, G, N

Question

I'm fairly new to NumPy and pandas. Let's say that I have a 2D NumPy array and I need to delete all rows in which the second value contains letters other than 'A', 'C', 'T', 'G' and 'N', so that after filtering I can get this. I wanted to do 3 for …
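Since the question starts from a 2D NumPy array, here is a minimal sketch of the same filtering done with NumPy and Python's `re` module alone. It assumes the first row is a header and the sequence sits in the second column (index 1). The sample data is borrowed from the accepted answer, and the helper name `keep_valid_rows` is hypothetical.

```python
import re
import numpy as np

# Sample data as in the accepted answer: a header row plus three records.
data = np.array([['id', 'genome'],
                 ['0', 'ATGTTTGTTTTT'],
                 ['1', 'ATGTTTGTXXXX'],
                 ['2', 'ATGDD2GTTTTT']])

# Pattern for a string made only of A, C, T, G, N (at least one character);
# used with fullmatch, so the whole string must match.
VALID = re.compile(r'[ACTGN]+')

def keep_valid_rows(arr, col=1):
    """Hypothetical helper: return the rows of arr whose column `col`
    contains only the letters A, C, T, G and N."""
    mask = np.array([VALID.fullmatch(s) is not None for s in arr[:, col]],
                    dtype=bool)
    return arr[mask]  # boolean indexing keeps only the matching rows

rows = data[1:]  # skip the header row
filtered = keep_valid_rows(rows)
print(filtered)
```

The check is case-sensitive, so lowercase sequences such as `acgt` are treated as invalid; pass `re.IGNORECASE` to `re.compile` if they should be kept.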

Accepted Answer

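Use Series.str.contains with a regex anchored by ^ (start of string) and $ (end of string), so that the whole value must consist of the letters A, C, T, G and N:

    import pandas as pd

    file = [['id', 'genome'],
            ['0', 'ATGTTTGTTTTT'],
            ['1', 'ATGTTTGTXXXX'],
            ['2', 'ATGDD2GTTTTT']]
    df = pd.DataFrame(file[1:], columns=file[0])
    print(df)

    # Keep only rows whose genome is made entirely of A, C, T, G, N
    df = df[df['genome'].str.contains('^[ACTGN]+$')]
    print(df)

After filtering, only the first row remains:

      id        genome
    0  0  ATGTTTGTTTTT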
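Two details about the filter above, shown in a minimal sketch with the same sample data. First, `str.contains` returns NaN for missing values, and a mask containing NaN cannot be used for indexing, so `na=False` treats such rows as invalid. Second, on pandas 1.1 or later, `Series.str.fullmatch` expresses the whole-string match without the `^`/`$` anchors. The extra row with `None` is an assumption added only to show the missing-value case.

```python
import pandas as pd

file = [['id', 'genome'],
        ['0', 'ATGTTTGTTTTT'],
        ['1', 'ATGTTTGTXXXX'],
        ['2', 'ATGDD2GTTTTT'],
        ['3', None]]  # hypothetical missing value, not in the original data
df = pd.DataFrame(file[1:], columns=file[0])

# Anchored regex: ^ and $ force the whole string to be A, C, T, G, N only.
# na=False turns missing genomes into False instead of NaN in the mask.
by_contains = df[df['genome'].str.contains('^[ACTGN]+$', na=False)]

# Same filter with fullmatch (pandas >= 1.1), which needs no anchors.
by_fullmatch = df[df['genome'].str.fullmatch('[ACTGN]+', na=False)]

print(by_contains)
print(by_fullmatch)
```

The two are not quite identical: in Python regexes `$` also matches just before a trailing newline, so `fullmatch` is slightly stricter for values like `'ACGT\n'`.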


Answer