
Find a value in a string using the items from a list in Python

I have been working on an Excel sheet using Python, where I have to extract only a specific value from a column, using a list of candidate strings.

I need to check each value in the column against the list. If there is a match, the matched value should be returned into the dataframe, where it can be used for further analysis.

Input Data :

text_value

19 Freezeland Lane, United Kingdom BD23 0UN
44 Bishopthorpe Road, United States LL55 1EU
Worthy Lane Denmark, LN11 9LP
88 Carriers Road, Mexico , DG3 1LB
HongKong

Expected Output:

text_value

United Kingdom
United States
Denmark
Mexico
HongKong

Code Snippet:

import pandas as pd
import re
countries=['United Kingdom','Denmark','India','United States','Mexico','HongKong']

df['text_value'] = re.findall(countries, df.text_value)

But it didn't work. I also tried:

if re.compile('|'.join(countries),re.IGNORECASE).search(df['text_value']): 
    df['text_value']


Answer

You can use

df['country_list'] = df['text_value'].str.findall(r'(?i)\b(?:{})\b'.format('|'.join(countries)))

Here, Series.str.findall returns all matches found in each cell of the text_value column and stores them in the country_list column. The pattern, which looks like (?i)\b(?:Country1|Country2|...)\b, matches

  • (?i) – case insensitive inline modifier option
  • \b – a word boundary
  • (?:Country1|Country2|...) – a list of countries
  • \b – a word boundary
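
As a minimal end-to-end sketch, the pattern above can be run against the sample data from the question. Two things here are assumptions that go beyond the original answer: the country names are passed through re.escape in case a name ever contains a regex metacharacter, and a second column named country (a hypothetical name) is filled with Series.str.extract, which returns only the first match as a plain string and is closer to the expected output than a list.

```python
import re

import pandas as pd

countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']

# Sample rows copied from the question
df = pd.DataFrame({'text_value': [
    '19 Freezeland Lane, United Kingdom BD23 0UN',
    '44 Bishopthorpe Road, United States LL55 1EU',
    'Worthy Lane Denmark, LN11 9LP',
    '88 Carriers Road, Mexico , DG3 1LB',
    'HongKong',
]})

# Assumption: escape each name so regex metacharacters are treated literally
alternation = '|'.join(map(re.escape, countries))

# All matches per row, as a list (the approach from the answer)
df['country_list'] = df['text_value'].str.findall(r'(?i)\b(?:{})\b'.format(alternation))

# First match per row, as a string; str.extract requires a capturing group
df['country'] = df['text_value'].str.extract(r'(?i)\b({})\b'.format(alternation), expand=False)

print(df[['country_list', 'country']])
```

Rows with no listed country get an empty list in country_list and NaN in country, so they are easy to filter out before further analysis.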