I have been working on an Excel sheet in Python, where I have to extract only specific values from a column using a list of strings (country names).
I need to check each value in the column against the list; if an entry from the list is found, the matched value should be returned into the DataFrame so it can be used for further analysis.
Input data:

    text_value
    19 Freezeland Lane, United Kingdom BD23 0UN
    44 Bishopthorpe Road, United States LL55 1EU
    Worthy Lane Denmark, LN11 9LP
    88 Carriers Road, Mexico , DG3 1LB
    HongKong
Expected output:

    text_value
    United Kingdom
    United States
    Denmark
    Mexico
    HongKong
Code snippet:

    import pandas as pd
    import re

    countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']
    df['text_value'] = re.findall(countries, df.text_value)
But it didn't work. I also tried:
    if re.compile('|'.join(countries), re.IGNORECASE).search(df['text_value']):
        df['text_value']
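Neither attempt can work as written: re.findall expects a single pattern string rather than a list, and both re.findall and Pattern.search operate on one string at a time, not on a whole Series. A minimal sketch of the per-cell idea behind the second attempt, assuming the input holds one address per row and that only the first country in each cell is wanted: compile one alternation pattern and apply it to each cell with Series.apply. The helper name first_country and the use of re.escape are my own additions, not part of the original code.

```python
import re

import pandas as pd

countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']

# Sample data: assumed split of the flattened input into one address per row
df = pd.DataFrame({'text_value': [
    '19 Freezeland Lane, United Kingdom BD23 0UN',
    '44 Bishopthorpe Road, United States LL55 1EU',
    'Worthy Lane Denmark, LN11 9LP',
    '88 Carriers Road, Mexico , DG3 1LB',
    'HongKong',
]})

# One alternation pattern; re.escape guards against regex metacharacters in names
pattern = re.compile('|'.join(map(re.escape, countries)), re.IGNORECASE)


def first_country(text):
    """Hypothetical helper: return the first country found in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# search() works on a single string, so apply it cell by cell
df['country'] = df['text_value'].apply(first_country)
print(df['country'].tolist())
# ['United Kingdom', 'United States', 'Denmark', 'Mexico', 'HongKong']
```

This returns a plain string (or None) per row, which is convenient when each address is expected to name at most one country.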
Answer
You can use

    df['country_list'] = df['text_value'].str.findall(r'(?i)\b(?:{})\b'.format('|'.join(countries)))
Here, Series.str.findall returns a list of all matches found in each cell and stores it in the country_list column. The pattern, which looks like (?i)\b(?:Country1|Country2|...)\b, matches:

- (?i): the case-insensitive inline modifier option
- \b: a word boundary
- (?:Country1|Country2|...): any one of the countries in the list
- \b: a word boundary
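A self-contained sketch of the answer's approach end to end, assuming one address per row. Because str.findall yields a list per cell, the last step joins each list into a single string to match the expected output column; the country column name and the ', ' separator are assumptions, not part of the answer. The final lines illustrate what (?i) and \b contribute.

```python
import re

import pandas as pd

countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']

# Sample data: one address per row (assumed layout of the input data)
df = pd.DataFrame({'text_value': [
    '19 Freezeland Lane, United Kingdom BD23 0UN',
    '44 Bishopthorpe Road, United States LL55 1EU',
    'Worthy Lane Denmark, LN11 9LP',
    '88 Carriers Road, Mexico , DG3 1LB',
    'HongKong',
]})

# Builds (?i)\b(?:United Kingdom|Denmark|India|United States|Mexico|HongKong)\b
pattern = r'(?i)\b(?:{})\b'.format('|'.join(countries))

# str.findall gives one list of matches per cell
df['country_list'] = df['text_value'].str.findall(pattern)

# Join each list into one string (the ', ' separator is an arbitrary choice)
df['country'] = df['country_list'].str.join(', ')
print(df['country'].tolist())
# ['United Kingdom', 'United States', 'Denmark', 'Mexico', 'HongKong']

# (?i): lowercase text still matches
print(re.findall(pattern, '12 High Street, united kingdom'))  # ['united kingdom']
# \b: 'India' inside the longer word 'Indiana' is not a whole word, so no match
print(re.findall(pattern, 'Gary, Indiana'))  # []
```

If a country name could contain regex metacharacters (such as a dot or parentheses), wrapping each entry in re.escape before joining would keep the alternation literal.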