What I was trying to achieve was something like this:

    >>> camel_case_split("CamelCaseXYZ")
    ['Camel', 'Case', 'XYZ']
    >>> camel_case_split("XYZCamelCase")
    ['XYZ', 'Camel', 'Case']
So I searched and found this perfect regular expression:
(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
As the next logical step I tried:
    >>> re.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "CamelCaseXYZ")
    ['CamelCaseXYZ']

Why does this not work, and how do I achieve the result from the linked question in Python?
Edit: Solution summary
I tested all provided solutions with a few test cases:
string: ''
    AplusKminus: ['']
    casimir_et_hippolyte: []
    two_hundred_success: []
    kalefranz: string index out of range  # with modification: either [] or ['']

string: ' '
    AplusKminus: [' ']
    casimir_et_hippolyte: []
    two_hundred_success: [' ']
    kalefranz: [' ']

string: 'lower'
    all algorithms: ['lower']

string: 'UPPER'
    all algorithms: ['UPPER']

string: 'Initial'
    all algorithms: ['Initial']

string: 'dromedaryCase'
    AplusKminus: ['dromedary', 'Case']
    casimir_et_hippolyte: ['dromedary', 'Case']
    two_hundred_success: ['dromedary', 'Case']
    kalefranz: ['Dromedary', 'Case']  # with modification: ['dromedary', 'Case']

string: 'CamelCase'
    all algorithms: ['Camel', 'Case']

string: 'ABCWordDEF'
    AplusKminus: ['ABC', 'Word', 'DEF']
    casimir_et_hippolyte: ['ABC', 'Word', 'DEF']
    two_hundred_success: ['ABC', 'Word', 'DEF']
    kalefranz: ['ABCWord', 'DEF']
In summary, the solution by @kalefranz does not match the question (see the last case), and the solution by @casimir et hippolyte eats a single space, thereby violating the idea that a split should not change the individual parts. The only difference between the remaining two alternatives is that, for an empty input string, my solution returns a list containing the empty string while the solution by @200_success returns an empty list. I don't know where the Python community stands on that issue, so I am fine with either one. And since 200_success's solution is simpler, I accepted it as the correct answer.
Answer
As @AplusKminus has explained, re.split() never splits on an empty pattern match (true for Python versions before 3.7; since 3.7, re.split() does split on empty matches). Therefore, instead of splitting, you should try finding the components you are interested in.
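To make "empty pattern match" concrete, here is a minimal sketch that lists where the lookaround pattern from the question matches. The name boundary_positions is a made-up helper for illustration; only re.compile() and finditer() from the standard library are used.

```python
import re

# The lookaround-only pattern from the question; it consumes no characters.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def boundary_positions(identifier):
    """Return the (start, end) span of every boundary match (hypothetical helper)."""
    return [m.span() for m in BOUNDARY.finditer(identifier)]

spans = boundary_positions("CamelCaseXYZ")
print(spans)
# Every span has start == end: the pattern finds positions, not text.
print(all(start == end for start, end in spans))
```

For "CamelCaseXYZ" this prints [(5, 5), (9, 9)] and then True: the pattern locates the two word boundaries, but each match is zero characters wide.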
Here is a solution using re.finditer() that emulates splitting:
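
    from re import finditer

    def camel_case_split(identifier):
        # Lazily match at least one character up to the next word boundary
        # (lower-to-upper, or the last capital of an acronym) or the end of the string.
        matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
        return [m.group(0) for m in matches]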
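
As a quick check, the function can be run against the test strings from the question's summary. This is only a sketch: the CASES table is my own arrangement, with the expected values taken from the two_hundred_success rows of that summary plus the two examples at the top of the question.

```python
from re import finditer

def camel_case_split(identifier):
    # Same function as in the answer, repeated so this snippet runs on its own.
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]

# Input string -> expected list of parts.
CASES = {
    '': [],
    ' ': [' '],
    'lower': ['lower'],
    'UPPER': ['UPPER'],
    'Initial': ['Initial'],
    'dromedaryCase': ['dromedary', 'Case'],
    'CamelCase': ['Camel', 'Case'],
    'ABCWordDEF': ['ABC', 'Word', 'DEF'],
    'CamelCaseXYZ': ['Camel', 'Case', 'XYZ'],
    'XYZCamelCase': ['XYZ', 'Camel', 'Case'],
}

for text, expected in CASES.items():
    result = camel_case_split(text)
    assert result == expected, (text, result)
    print(repr(text), '->', result)
```

Note that the empty string yields an empty list, because .+? needs at least one character to match.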