Find string in data frame and store new values in a new column

Question

I am creating a script that takes a CSV file whose column layout and column names are unknown. However, I know that only one of the columns contains values in which the strings 'rs' and 'del' appear. I need to create an extra column (called 'Type') and store 'd…

Accepted Answer

You can do it without the loop. Here's one approach: use applymap to search every column.

```python
import pandas as pd

data = {'Number': ['Mukul', 'Rohan', 'Mayank', 'Shubham', 'Aakash'],
        'Location': ['Saharsanpur', 'MERrs', 'rsAdela', 'aaaadelaa', 'aaa'],
        'Pay': [25000, 30000, 35000, 40000, 45000]}
df = pd.DataFrame(data)

# Cast every cell to str so non-text columns can be searched too,
# then flag each row where any cell contains the substring.
df['rs'] = df.astype(str).applymap(lambda x: 'rs' in x).any(axis=1)
df['del'] = df.astype(str).applymap(lambda x: 'del' in x).any(axis=1)

df['type'] = ''
df.loc[df['rs'], 'type'] = 'dbsnp'
df.loc[df['del'], 'type'] = 'deletion'  # runs last, so it overwrites 'dbsnp'

# Remove the helper flag columns
df = df.drop(columns=['rs', 'del'])
print(df)
```

In this data, rsAdela contains both rs and del. Because rs is applied first and del second, the del assignment overwrites the earlier one and the row is labelled deletion. Swap the two df.loc lines if you would rather keep dbsnp for such rows.

Because every cell is cast with astype(str) before the search, the code processes all columns irrespective of dtype. (On pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which takes the same function.)

The output for the data above is:

```
    Number     Location    Pay      type
0    Mukul  Saharsanpur  25000     dbsnp
1    Rohan        MERrs  30000     dbsnp
2   Mayank      rsAdela  35000  deletion
3  Shubham    aaaadelaa  40000  deletion
4   Aakash          aaa  45000
```
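If you prefer the priority between the two labels to be explicit rather than depending on statement order, one alternative (a sketch, not part of the accepted answer) is numpy.select, which returns the choice for the first condition that is True. The helper name add_type_column is hypothetical, and this version swaps applymap for a per-column Series.str.contains(..., regex=False), which sidesteps the applymap deprecation. It assumes the same sample data and gives 'deletion' priority to reproduce the output above.

```python
import numpy as np
import pandas as pd


def add_type_column(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label rows by literal substring matches in any column."""
    text = df.astype(str)  # cast so numeric columns are searched as text too

    def has(sub: str) -> pd.Series:
        # True for each row where at least one cell contains `sub` (no regex)
        return text.apply(lambda col: col.str.contains(sub, regex=False)).any(axis=1)

    # np.select takes the FIRST matching condition, so list order is the priority.
    # 'deletion' comes first to match the accepted answer's result.
    conditions = [has('del'), has('rs')]
    choices = ['deletion', 'dbsnp']

    out = df.copy()
    out['type'] = np.select(conditions, choices, default='')
    return out


data = {'Number': ['Mukul', 'Rohan', 'Mayank', 'Shubham', 'Aakash'],
        'Location': ['Saharsanpur', 'MERrs', 'rsAdela', 'aaaadelaa', 'aaa'],
        'Pay': [25000, 30000, 35000, 40000, 45000]}
print(add_type_column(pd.DataFrame(data)))
```

To prefer dbsnp on rows that match both substrings, reorder both lists so 'rs'/'dbsnp' comes first.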


Answer