Hi, I am new to Python and regex. I have a string that I want to reformat with a substitution:
string = '1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität
3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National
Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan,
5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7
University of Colorado School of Medicine/Aurora/United States of America'
I tried this:
re.sub('(, \d+()?)', r'\1=', string).strip()
Expected output:
string = '1=John Radcliffe Hospital/Oxford/United Kingdom, 11=Ruhr-Universität
3/Bochum/Bochum/Germany, 3=University of British Columbia/Vancouver/Canada, 4=National
Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan,
5=University of Catania/Catania/Italy, 6=F. Hoffmann-La Roche Ltd/Basel/Switzerland,
7=University of Colorado School of Medicine/Aurora/United States of America'
Answer
You can match either the start of the string or a comma followed by a space, using a non-capturing group, then match one or more digits and assert that they are not directly followed by a forward slash.
(?:^|, )\d+(?!/)
The pattern matches:
- (?:^|, ) Non-capturing group: assert the start of the string, or match a comma followed by a space
- \d+(?!/) Match 1+ digits, asserting that there is no / directly to the right
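To see which parts of a string the pattern picks out, a quick check with re.finditer can help. This is only a minimal sketch: the sample string is a shortened, made-up version of the data in the question, and re.MULTILINE is set so that ^ also matches at the start of each line.

```python
import re

# Shortened sample modelled on the question's data (illustrative only).
sample = ("1John Radcliffe Hospital/Oxford, 11Ruhr-Universität \n"
          "3/Bochum/Germany, 3University of British Columbia")

pattern = re.compile(r'(?:^|, )\d+(?!/)', re.MULTILINE)

# Each match includes the leading ", " when present, because the
# non-capturing group still consumes those characters.
matches = [m.group(0) for m in pattern.finditer(sample)]
print(matches)  # ['1', ', 11', ', 3']
```

The "3" at the start of the second line is skipped because it is directly followed by a slash.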
In the replacement use the full match followed by an equals sign
\g<0>=
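As a small illustration of what \g<0> does, the sketch below compares it with an explicit capturing group on a made-up two-item string. The variable names are only for illustration, and the lookahead is left out to keep the comparison short.

```python
import re

# Made-up sample (illustrative only).
text = "1Alpha, 22Beta"

# \g<0> inserts the whole match, so no capturing group is needed.
whole = re.sub(r'(?:^|, )\d+', r'\g<0>=', text)

# Equivalent form that wraps the whole pattern in a capturing group
# and refers to it with \1.
grouped = re.sub(r'((?:^|, )\d+)', r'\1=', text)

print(whole)    # 1=Alpha, 22=Beta
print(grouped)  # 1=Alpha, 22=Beta
```

Both forms give the same result here; \g<0> just avoids the extra pair of parentheses.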
Example
import re

string = ("1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität \n"
          "3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National \n"
          "Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, \n"
          "5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7 \n"
          "University of Colorado School of Medicine/Aurora/United States of America")

# re.MULTILINE lets ^ match at the start of every line, not only at the start of the string.
# \g<0> in the replacement stands for the whole match.
result = re.sub(r'(?:^|, )\d+(?!/)', r'\g<0>=', string, flags=re.MULTILINE).strip()
print(result)
Output
1=John Radcliffe Hospital/Oxford/United Kingdom, 11=Ruhr-Universität
3/Bochum/Bochum/Germany, 3=University of British Columbia/Vancouver/Canada, 4=National
Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan,
5=University of Catania/Catania/Italy, 6=F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7=
University of Colorado School of Medicine/Aurora/United States of America
Another option is to use a positive lookahead to assert that the digits are followed by an uppercase char [A-Z], optionally preceded by whitespace.
(?:^|, )\d+(?=\s*[A-Z])
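A minimal sketch of this variant is shown below, again on a shortened, made-up sample. The second part uses a hypothetical edge case (a multi-digit number directly before a slash, which does not occur in the question's data) to show one way the two lookaheads can differ.

```python
import re

# Made-up sample modelled on the question's data (illustrative only).
sample = ("1John Radcliffe Hospital/Oxford, 11Ruhr-Universität \n"
          "3/Bochum/Germany, 7 \n"
          "University of Colorado")

# Positive lookahead: the digits must be followed by optional whitespace
# and then an uppercase ASCII letter.
result = re.sub(r'(?:^|, )\d+(?=\s*[A-Z])', r'\g<0>=', sample, flags=re.MULTILINE)
print(result)

# Hypothetical edge case: a multi-digit number directly before a slash.
edge = "x, 11/Bochum"
negative = re.sub(r'(?:^|, )\d+(?!/)', r'\g<0>=', edge)
positive = re.sub(r'(?:^|, )\d+(?=\s*[A-Z])', r'\g<0>=', edge)
print(negative)  # x, 1=1/Bochum
print(positive)  # x, 11/Bochum
```

With the negative lookahead, \d+ can backtrack to a single digit that is no longer followed by a slash, so part of the number is still matched. The positive lookahead leaves that input unchanged, but it only accepts ASCII uppercase letters after the digits.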