Hi, I am new to Python and regex. I have a string which I want to reformat/substitute:
string = '1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität 3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, 5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7 University of Colorado School of Medicine/Aurora/United States of America'
I did try with:
re.sub('(, \d+()?)', r'\1=', string).strip()
Expected output:
string = '1=John Radcliffe Hospital/Oxford/United Kingdom, 11=Ruhr-Universität 3/Bochum/Bochum/Germany, 3=University of British Columbia/Vancouver/Canada, 4=National Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, 5=University of Catania/Catania/Italy, 6=F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7=University of Colorado School of Medicine/Aurora/United States of America'
Answer
You can match either the start of the string or a comma and a space using a non-capturing group, then match one or more digits and assert that they are not directly followed by a /.
(?:^|, )\d+(?!/)
The pattern matches:
(?:^|, )
Non-capturing group, match either the start of the string (or of a line, with re.MULTILINE) or a comma followed by a space
\d+(?!/)
Match 1+ digits, asserting that there is no / directly to the right
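To see which parts of the text the pattern picks up, here is a minimal sketch on a shortened sample. The sample string is a made-up excerpt of the question's data (an assumption, not the original input), with a line break before 3/Bochum.

```python
import re

# Shortened, hypothetical sample; "3/Bochum" starts a new line
sample = ("1John Radcliffe Hospital/Oxford, 11Ruhr-Universität \n"
          "3/Bochum/Germany, 4National Institute")

# re.MULTILINE lets ^ match at the start of every line
pattern = re.compile(r"(?:^|, )\d+(?!/)", re.MULTILINE)

# No capture groups, so findall returns the full matches
matches = pattern.findall(sample)
print(matches)  # ['1', ', 11', ', 4']
```

The 3 at the start of the second line is skipped because a / follows it. Note that a multi-digit number directly followed by / (for example 13/) would still be matched partially after backtracking; using (?![\d/]) instead of (?!/) is one way to rule that out.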
In the replacement use the full match followed by an equals sign
\g<0>=
Example
import re

# The test string contains line breaks, so re.MULTILINE is used to let ^ match at the start of each line
string = ("1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität \n"
          "3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National \n"
          "Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, \n"
          "5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7 \n"
          "University of Colorado School of Medicine/Aurora/United States of America")

# \g<0> refers to the full match, so every matched number gets "=" appended
result = re.sub(r'(?:^|, )\d+(?!/)', r'\g<0>=', string, flags=re.MULTILINE).strip()
print(result)
Output
1=John Radcliffe Hospital/Oxford/United Kingdom, 11=Ruhr-Universität 3/Bochum/Bochum/Germany, 3=University of British Columbia/Vancouver/Canada, 4=National Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, 5=University of Catania/Catania/Italy, 6=F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7= University of Colorado School of Medicine/Aurora/United States of America
Another option could be using a positive lookahead to assert an uppercase char [A-Z], optionally preceded by whitespace, after matching the digits.
(?:^|, )\d+(?=\s*[A-Z])
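A short sketch of this second pattern, again on a shortened, made-up sample rather than the full string from the question. The second call is a variation that is not part of the answer above: it captures the prefix and consumes the whitespace after the number, which would give 7=University as in the expected output.

```python
import re

# Shortened, hypothetical sample based on the question's data
sample = ("1John Radcliffe Hospital/Oxford, 11Ruhr-Universität 3/Bochum, "
          "7 University of Colorado")

# Append "=" only when optional whitespace and an uppercase letter follow
result = re.sub(r"(?:^|, )\d+(?=\s*[A-Z])", r"\g<0>=", sample)
print(result)
# 1=John Radcliffe Hospital/Oxford, 11=Ruhr-Universität 3/Bochum, 7= University of Colorado

# Assumed variation: keep the prefix in group 1 and drop the whitespace after the number
tight = re.sub(r"((?:^|, )\d+)\s*(?=[A-Z])", r"\1=", sample)
print(tight)
# 1=John Radcliffe Hospital/Oxford, 11=Ruhr-Universität 3/Bochum, 7=University of Colorado
```

Keep in mind that [A-Z] only covers ASCII uppercase letters, so a name starting with a letter such as É would not be matched by either version.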