I converted a column of a pandas DataFrame to a list:
subsectors = df['subsectors'].tolist()
I want to split strings like 'BuyMeADrink' into 'Buy Me A Drink'.
So I tried each of the following:
[' '.join(re.findall('[A-Z][^A-Z]*', s)) for s in subsectors]
or
li = re.compile(r'(?<=[a-z])(?=[A-Z])')
strings = [li.sub(' ', subsectors) for string in subsectors]
or
output = []
for i in subsectors:
    output.append(" ".join(re.findall('[A-Z][^A-Z]*', i)))
All of the above returned this:
TypeError: expected string or bytes-like object
I understand that findall() needs a string, not a list, but here I am iterating over the list, which should yield strings. Why do I get this error?
Thank you.
Answer
Let's try str.replace:
import pandas as pd

df = pd.DataFrame({'subsectors': ['BuyMeADrink']})

# Put a space before each capitalised word, then strip the leading space
df['subsectors'].str.replace(r'([A-Z][a-z]*)', r' \1', regex=True).str.strip()
Output:
0    Buy Me A Drink
Name: subsectors, dtype: object
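As for the TypeError itself: re.findall and Pattern.sub only accept str (or bytes) arguments. A likely cause, assuming the column has missing values, is that tolist() yields float NaN entries alongside the strings; the .str methods pass those through, while re raises. (The second snippet in the question also passes the whole list subsectors to li.sub instead of string.) A minimal sketch with made-up data and a hypothetical helper split_camel:

```python
import re

# Made-up data: a missing value shows up as a float NaN after .tolist()
subsectors = ['BuyMeADrink', float('nan'), 'SellMeACar']

pattern = re.compile(r'[A-Z][^A-Z]*')

def split_camel(value):
    """Hypothetical helper: split CamelCase strings, skip non-strings."""
    if not isinstance(value, str):
        return value  # leave NaN (or any other non-string) untouched
    return ' '.join(pattern.findall(value))

result = [split_camel(s) for s in subsectors]
print(result)
```

Checking set(type(s) for s in subsectors) on the real list is a quick way to confirm whether non-string entries are actually present.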
However, your problem is inherently ambiguous: for example, how should 'ElectionInTheUSA' be split? The pattern above would give 'Election In The U S A'.
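If runs of capitals such as 'USA' should stay together, one possible rule is to split only where a lowercase letter is followed by a capital, or where a capital is followed by a capital plus a lowercase letter. This is just a sketch under that assumed rule (the names boundary and split_words are made up here), and it does not remove the ambiguity: 'BuyMeADVD' would come out as 'Buy Me ADVD'.

```python
import re

# Assumed rule: insert a space
#   - between a lowercase letter and a capital, or
#   - before the last capital of a run when a lowercase letter follows it
boundary = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')

def split_words(text):
    """Hypothetical helper: insert spaces at the assumed word boundaries."""
    return boundary.sub(' ', text)

for s in ['BuyMeADrink', 'ElectionInTheUSA', 'USAElection']:
    print(s, '->', split_words(s))
```

The same pattern string should also work on the column directly via df['subsectors'].str.replace(..., ' ', regex=True).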