I asked a similar question here a few days ago and the answers were a great help! My new challenge is to develop the regex pattern further so that it looks for specific formats. I thought I had solved it by building and testing the pattern on regex101, but when I applied it in Python I received ‘pattern contain no group’. Below is a test df, an image of what the results should look like, and the code provided via StackOverflow that worked with digits only.
df = pd.DataFrame([["{1} | | Had a Greeter welcome clients {1.0} | | Take measures to ensure a safe and organized distribution {1.000} | | Protected confidentiality of clients (on social media, pictures, in conversation, own congregation members receiving assistance, etc.)", "{1.00} | | Chairs for clients to sit in while waiting {1.0000} | | Take measures to ensure a safe and organized distribution"], ["{1 } | Financial literacy/budgeting {1 } | | Monetary/Bill Support {1} | | Mental Health Services/Counseling", "{1}| | Clothing Assistance {1 } | | Healthcare {1} | | Mental Health Services/Counseling {1} | | Spiritual Support {1} | | Job Skills Training"] ] , columns = ['CF1', 'CF2'])
Here is the code that worked with digits only. I replaced its search pattern with my new regex pattern and it did not work.
Original code: (df.stack().str.extractall(r'(\d+)')[0].groupby(level=[0,1]).sum().unstack())
New code (failed to recognize pattern): (df.stack().str.extractall(r'(?<={)[\d+. ]+(?=})')[0].astype(int).groupby(level=[0,1]).sum().unstack())
In the test df, I only want to capture the numbers between “{}”. The number I want to capture and sum may be followed by a mixture of decimals and spaces. The new pattern did not work in application, so any help would be great!
Answer
Your (?<={)[\d+. ]+(?=}) regex contains no capturing groups, while Series.str.extractall requires at least one capturing group to output a value.
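To illustrate the point about capturing groups, here is a minimal sketch. The two-row Series `s` is made-up sample data, not the asker's df, and the wrapped pattern is only there to show the difference a pair of parentheses makes; it is not the recommended fix.

```python
import pandas as pd

# Hypothetical sample data in the same "{number} | | text" style as the question
s = pd.Series(["{1} | | a {1.0} | | b", "{1 } | c"])

# Lookarounds only, no capturing group: extractall refuses the pattern
try:
    s.str.extractall(r'(?<={)[\d. ]+(?=})')
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Same pattern with the character class wrapped in a capturing group
wrapped = s.str.extractall(r'(?<={)([\d. ]+)(?=})')
print(wrapped[0].tolist())
```

Note that the wrapped pattern still captures trailing spaces (the last value is '1 ' with a space), which is one reason to match the braces and whitespace outside the group instead.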
You need to use
(df.stack().str.extractall(r'{\s*(\d+(?:\.\d+)?)\s*}')[0].astype(float).groupby(level=[0,1]).sum().unstack())
Output:
   CF1  CF2
0  3.0  2.0
1  3.0  5.0
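The one-liner can be easier to follow when split into steps. The sketch below does that on a small made-up df (shorter than the test df in the question); the intermediate names `stacked`, `matches`, and `numbers` are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical, shortened data in the same shape as the question's df
df = pd.DataFrame(
    [["{1} | | a {1.0} | | b", "{1.00} | | c"],
     ["{1 } | d", "{1}| | e {1 } | | f"]],
    columns=["CF1", "CF2"],
)

pattern = r'{\s*(\d+(?:\.\d+)?)\s*}'

# 1. One long Series indexed by (row label, column name)
stacked = df.stack()

# 2. One row per match; a third index level named 'match' is added
matches = stacked.str.extractall(pattern)

# 3. Column 0 holds Group 1; convert the captured strings to numbers
numbers = matches[0].astype(float)

# 4. Sum per (row, column) pair, then move the column names back to columns
result = numbers.groupby(level=[0, 1]).sum().unstack()
print(result)
```

Grouping on the first two index levels is what collapses the per-match rows back to one value per original cell.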
The {\s*(\d+(?:\.\d+)?)\s*} regex matches
{ – a { char
\s* – zero or more whitespaces
(\d+(?:\.\d+)?) – Group 1 (note this group's captured value will be the output of the extractall method, which requires at least one capturing group): one or more digits, and then an optional occurrence of a . and one or more digits
\s* – zero or more whitespaces
} – a } char.
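The breakdown above can be checked outside pandas with the standard re module. This is a small sketch on a made-up string; the `{ 2.5 }` and `{x}` entries are added here only to exercise the whitespace handling and a non-match.

```python
import re

pattern = re.compile(r'{\s*(\d+(?:\.\d+)?)\s*}')

# Hypothetical test string mixing the formats from the question
sample = "{1} | | A {1.0} | | B {1 } | C {1.000}| D { 2.5 } {x}"

# With exactly one capturing group, findall returns the Group 1 strings;
# the (?:...) part is non-capturing and does not add a second group
values = pattern.findall(sample)
print(values)

total = sum(float(v) for v in values)
print(total)
```

Braces and surrounding spaces are consumed by the match but left out of the captured value, so the strings convert cleanly to float.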
See the regex demo.