Substitute ‘=’ sign when an integer is encountered using Python

Hi, I am new to Python and regex. I have a string that I want to reformat by inserting an '=' sign after each entry's leading number:

string = '1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität 
3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National 
Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, 
5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7 
University of Colorado School of Medicine/Aurora/United States of America'

I tried:

re.sub('(, \d+()?)', r'\1=', string).strip()
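The following is a minimal sketch of what that attempt does. It uses a shortened, hypothetical sample and assumes the line breaks in the question are real newline characters. Because the pattern requires a leading ", ", numbers at the start of the string or of a line never match:

```python
import re

# Shortened hypothetical sample; assumes the question's line breaks are real "\n" characters.
sample = "1John Radcliffe Hospital, 11Ruhr-Universität \n3/Bochum, 3University\n5University of Catania"

# The attempt from the question: group 1 captures ", " plus the digits,
# and r'\1=' writes that text back followed by "=".
attempt = re.sub(r'(, \d+()?)', r'\1=', sample)
print(attempt)
# ", 11" and ", 3" gain an "=", but "1John" and "5University" are left unchanged.
```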

Expected output:

string = '1=John Radcliffe Hospital/Oxford/United Kingdom, 11=Ruhr-Universität 
3/Bochum/Bochum/Germany, 3=University of British Columbia/Vancouver/Canada, 4=National 
Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, 
5=University of Catania/Catania/Italy, 6=F. Hoffmann-La Roche Ltd/Basel/Switzerland, 
7=University of Colorado School of Medicine/Aurora/United States of America'


Answer

You can match either the start of a line or a comma followed by a space, using a non-capturing group instead of a capture group. Then match one or more digits and assert that they are not directly followed by a /.

(?:^|, )\d+(?!/)

The pattern matches:

  • (?:^|, ) Non-capturing group, matching either the start of a line (with re.MULTILINE) or a comma followed by a space
  • \d+(?!/) Match 1+ digits, asserting that there is no / directly to the right
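The two parts of the pattern can be checked in isolation with re.findall, which returns the whole match for each hit when the pattern has no capture groups. This is a small sketch on a shortened, hypothetical sample; it also shows why re.MULTILINE matters for numbers that start a later line:

```python
import re

# The answer's pattern; re.MULTILINE lets ^ match after every "\n", not only at index 0.
pattern = re.compile(r'(?:^|, )\d+(?!/)', re.MULTILINE)

# Shortened hypothetical sample with real newlines.
sample = "1John, 11Ruhr-Universität \n3/Bochum, 3University\n5University"

# "3/Bochum" is skipped by (?!/); each hit is digits at a line start or ", " plus digits.
with_multiline = pattern.findall(sample)
print(with_multiline)     # ['1', ', 11', ', 3', '5']

# Without re.MULTILINE, ^ only matches at the very start, so "5" is missed.
without_multiline = re.findall(r'(?:^|, )\d+(?!/)', sample)
print(without_multiline)  # ['1', ', 11', ', 3']
```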


In the replacement, use the full match followed by an equals sign:

\g<0>=
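In Python's re.sub, \g<0> in the replacement string refers to the entire match. A minimal sketch on a hypothetical short string, together with an equivalent replacement written as a function that receives the match object:

```python
import re

text = "1John, 11Ruhr"

# \g<0> is the whole match, so every match is written back followed by "=".
with_group = re.sub(r'(?:^|, )\d+', r'\g<0>=', text)
# Equivalent: a callable replacement gets a match object and returns the new text.
with_func = re.sub(r'(?:^|, )\d+', lambda m: m.group(0) + '=', text)

print(with_group)  # 1=John, 11=Ruhr
print(with_func)   # 1=John, 11=Ruhr
```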

Example

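import re

# The strings contain real newlines ("\n"), mirroring the line breaks in the question.
string = ("1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität \n"
          "3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National \n"
          "Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, \n"
          "5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7 \n"
          "University of Colorado School of Medicine/Aurora/United States of America")

# re.MULTILINE makes ^ match at the start of every line, not only at the start of the string.
# flags is passed by keyword; passing count/flags positionally is deprecated in recent Python.
result = re.sub(r'(?:^|, )\d+(?!/)', r'\g<0>=', string, flags=re.MULTILINE).strip()
print(result)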

Output

1=John Radcliffe Hospital/Oxford/United Kingdom, 11=Ruhr-Universität 
3/Bochum/Bochum/Germany, 3=University of British Columbia/Vancouver/Canada, 4=National 
Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, 
5=University of Catania/Catania/Italy, 6=F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7= 
University of Colorado School of Medicine/Aurora/United States of America
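One caveat worth knowing: because \d+ can backtrack, \d+(?!/) can still match a shorter prefix of a multi-digit number that is followed by a /. The sketch below uses a hypothetical input to show this, plus a stricter variant (not part of the answer) that also forbids a digit in the lookahead. In the question's data the only number followed by / is a single digit, so this does not change the output above.

```python
import re

# Hypothetical input where a multi-digit number is directly followed by "/".
text = "12/Bochum, 3University"

# With \d+(?!/), the engine backtracks to "1", which is followed by "2" rather than "/",
# so the negative lookahead succeeds and "=" is inserted inside the number.
loose = re.sub(r'(?:^|, )\d+(?!/)', r'\g<0>=', text, flags=re.MULTILINE)
# Excluding digits in the lookahead too prevents that partial match.
strict = re.sub(r'(?:^|, )\d+(?![\d/])', r'\g<0>=', text, flags=re.MULTILINE)

print(loose)   # 1=2/Bochum, 3=University
print(strict)  # 12/Bochum, 3=University
```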

Another option is a positive lookahead that asserts an uppercase character [A-Z], optionally preceded by whitespace, after the digits.

(?:^|, )\d+(?=\s*[A-Z])

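A sketch that applies the lookahead variant to the sample string from the Example, assuming real newline characters, and compares it with the (?!/) pattern. Because \s* also matches the line break after "7", both give the same result here. Note that [A-Z] only covers ASCII uppercase letters, so an entry starting with a letter such as Ö would not be matched.

```python
import re

string = ("1John Radcliffe Hospital/Oxford/United Kingdom, 11Ruhr-Universität \n"
          "3/Bochum/Bochum/Germany, 3University of British Columbia/Vancouver/Canada, 4National \n"
          "Institute of Neuroscience, National Center of Neurology and Psychiatry/Tokyo/Japan, \n"
          "5University of Catania/Catania/Italy, 6F. Hoffmann-La Roche Ltd/Basel/Switzerland, 7 \n"
          "University of Colorado School of Medicine/Aurora/United States of America")

# Lookahead variant: the digits must be followed by optional whitespace
# (including "\n") and then an uppercase ASCII letter.
upper = re.sub(r'(?:^|, )\d+(?=\s*[A-Z])', r'\g<0>=', string, flags=re.MULTILINE)
# Negative-lookahead pattern from the answer, for comparison.
no_slash = re.sub(r'(?:^|, )\d+(?!/)', r'\g<0>=', string, flags=re.MULTILINE)

print(upper == no_slash)  # True
```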
