I have been working on an Excel sheet using Python, where I have to extract only specific values from a column using a list of strings (country names).
I need to check the text in every cell of the column against the list and, if it matches, return the matched value into the DataFrame so it can be used for further analysis.
Input data:

```
text_value
19 Freezeland Lane, United Kingdom BD23 0UN
44 Bishopthorpe Road, United States LL55 1EU
Worthy Lane Denmark, LN11 9LP
88 Carriers Road, Mexico , DG3 1LB
HongKong
```
Expected output:

```
text_value
United Kingdom
United States
Denmark
Mexico
HongKong
```
Code snippet:

```python
import pandas as pd
import re

countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']

df['text_value'] = re.findall(countries, df.text_value)
```
But it didn't work. I also tried:
```python
if re.compile('|'.join(countries), re.IGNORECASE).search(df['text_value']):
    df['text_value']
```
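Both attempts fail for the same underlying reason: the `re` functions work on one pattern and one subject string at a time. `re.findall` needs a pattern string (or compiled pattern) as its first argument, not a list, and `Pattern.search` needs a single string, not a whole pandas Series, so both calls raise a `TypeError`. The sketch below reproduces both errors on a hypothetical two-row DataFrame taken from the input data, then shows a row-by-row version using `Series.apply` with a hypothetical helper `first_country` (not from the question or answer) that does work.

```python
import re

import pandas as pd

countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']
# Hypothetical two-row sample built from the question's input data.
df = pd.DataFrame({'text_value': ['19 Freezeland Lane, United Kingdom BD23 0UN', 'HongKong']})

# Attempt 1: re.findall expects a pattern string, not a list -> TypeError.
try:
    re.findall(countries, df.text_value)
except TypeError as exc:
    print('findall failed:', exc)

# Attempt 2: Pattern.search expects one string, not a Series -> TypeError.
pattern = re.compile('|'.join(countries), re.IGNORECASE)
try:
    pattern.search(df['text_value'])
except TypeError as exc:
    print('search failed:', exc)


def first_country(text):
    """Hypothetical helper: return the first country found in one cell, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Working row-by-row version: run the compiled pattern on each cell separately.
df['country'] = df['text_value'].apply(first_country)
print(df['country'].tolist())
```

This loop-style fix works but calls Python once per row; the vectorized `str` accessor in the answer avoids writing the helper at all.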
Answer
You can use
```python
df['country_list'] = df['text_value'].str.findall(r'(?i)\b(?:{})\b'.format('|'.join(countries)))
```
Here, `Series.str.findall` returns all matches found in each cell as a list and stores them in the `country_list` column. The pattern, which looks like `(?i)\b(?:Country1|Country2|...)\b`, matches:

- `(?i)` – case-insensitive inline modifier option
- `\b` – a word boundary
- `(?:Country1|Country2|...)` – a non-capturing group listing the countries
- `\b` – a word boundary
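As an end-to-end check, the sketch below applies the answer's pattern to the question's input data. Assumptions not in the original: the DataFrame is built by hand instead of being read from the Excel file, country names are passed through `re.escape` so any regex metacharacters in them would be matched literally, and the per-row lists are joined with `Series.str.join` to get the flat string column shown in the expected output.

```python
import re

import pandas as pd

countries = ['United Kingdom', 'Denmark', 'India', 'United States', 'Mexico', 'HongKong']

# The question's input data, rebuilt by hand instead of loaded from Excel.
df = pd.DataFrame({'text_value': [
    '19 Freezeland Lane, United Kingdom BD23 0UN',
    '44 Bishopthorpe Road, United States LL55 1EU',
    'Worthy Lane Denmark, LN11 9LP',
    '88 Carriers Road, Mexico , DG3 1LB',
    'HongKong',
]})

# Same pattern as the answer; re.escape is an added safeguard for special characters.
pattern = r'(?i)\b(?:{})\b'.format('|'.join(map(re.escape, countries)))

# One list of matches per row, as in the answer.
df['country_list'] = df['text_value'].str.findall(pattern)

# Join each list into one string to match the flat column in the expected output.
df['country'] = df['country_list'].str.join(', ')
print(df[['country_list', 'country']])
```

Keeping `str.findall` preserves every match when a cell mentions more than one country. If only the first match is wanted, `Series.str.extract` with a capturing group, e.g. `r'(?i)\b({})\b'` and `expand=False`, returns a plain string column directly.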