
TypeError from re.findall() when iterating over a list built from a pandas DataFrame column

I converted a column of a pandas DataFrame to a list:

    subsectors = df['subsectors'].tolist()

I want to split strings like 'BuyMeADrink' into 'Buy Me A Drink'.

So I tried each of the following:

    [' '.join(re.findall('[A-Z][^A-Z]*', s)) for s in subsectors]

or

    li = re.compile(r'(?<=[a-z])(?=[A-Z])')
    strings = [li.sub(' ', string) for string in subsectors]

or

    output=[]
    for i in subsectors:
        output.append(" ".join(re.findall('[A-Z][^A-Z]*', i)))

All of the above raised this error:

    TypeError: expected string or bytes-like object

I understand that findall() needs a string, not a list, but I am iterating over the list, so each item should be a string. Why do I get this error?

Thank you.
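For the first and third snippets, the error means that at least one item in the list is not a string. A likely culprit, assumed here because the post does not show the data, is a missing value: pandas stores it as the float NaN, and tolist() passes that float straight to re.findall(). The sketch below reproduces the error on a made-up list and skips non-strings with a hypothetical helper, split_camel:

```python
import re

# Hypothetical sample: a missing value in the column becomes float('nan') after tolist()
subsectors = ['BuyMeADrink', float('nan'), 'SellMeABike']


def split_camel(s):
    """Split 'BuyMeADrink' into 'Buy Me A Drink'; return non-strings unchanged."""
    if not isinstance(s, str):
        return s
    return ' '.join(re.findall(r'[A-Z][^A-Z]*', s))


# Reproduces the question's error: re.findall() receives a float
try:
    [' '.join(re.findall(r'[A-Z][^A-Z]*', s)) for s in subsectors]
except TypeError as exc:
    print('TypeError:', exc)

# Guarded version: only strings ever reach re.findall()
output = [split_camel(s) for s in subsectors]
print(output)  # ['Buy Me A Drink', nan, 'Sell Me A Bike']
```

Whether to keep, drop, or fill the missing entries is a separate choice; returning them unchanged keeps the output aligned with the original rows.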


Answer

Let's try the vectorized Series.str.replace with a regular expression:

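    import pandas as pd

    df = pd.DataFrame({'subsectors': ['BuyMeADrink']})

    # Put a space before every capitalized word, then trim the leading space
    df['subsectors'].str.replace(r'([A-Z][a-z]*)', r' \1', regex=True).str.strip()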

Output:

    0    Buy Me A Drink
    Name: subsectors, dtype: object
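Working on the Series through the .str accessor also avoids the TypeError from the question: missing values such as NaN are passed through as NaN instead of being handed to the re module. A minimal sketch, assuming the column contains one missing value (the sample data is made up):

```python
import pandas as pd

# Hypothetical data: the second row is missing
df = pd.DataFrame({'subsectors': ['BuyMeADrink', float('nan')]})

# Same pattern as above; .str methods return NaN for missing entries instead of raising
result = df['subsectors'].str.replace(r'([A-Z][a-z]*)', r' \1', regex=True).str.strip()
print(result.tolist())  # ['Buy Me A Drink', nan]
```

Note that regex=True is passed explicitly because pandas 2.0 changed the default of str.replace to treat the pattern as a literal string.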

However, your problem is inherently ambiguous. For example, how should 'ElectionInTheUSA' be split?
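To make the ambiguity concrete, the sketch below compares the answer's pattern with a second, hypothetical lookaround pattern (not part of the original answer) that keeps runs of capitals such as 'USA' together. The helper names split_simple and split_keep_acronyms are illustrative only.

```python
import re

# The answer's pattern: every capital letter starts a new word
SIMPLE = re.compile(r'([A-Z][a-z]*)')

# Hypothetical alternative: split between a lowercase and an uppercase letter, and
# between two capitals when the second one is followed by a lowercase letter
KEEP_ACRONYMS = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')


def split_simple(s):
    return SIMPLE.sub(r' \1', s).strip()


def split_keep_acronyms(s):
    # The pattern only matches empty positions, so sub() inserts a space at each one
    return KEEP_ACRONYMS.sub(' ', s)


for word in ['BuyMeADrink', 'ElectionInTheUSA', 'USAElection']:
    print(word, '->', split_simple(word), '|', split_keep_acronyms(word))
# BuyMeADrink -> Buy Me A Drink | Buy Me A Drink
# ElectionInTheUSA -> Election In The U S A | Election In The USA
# USAElection -> U S A Election | USA Election
```

Neither rule is right for every input; which one fits depends on how acronyms appear in the actual data.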

User contributions licensed under: CC BY-SA