Search and replace specific strings with floating point representations in Python

Problem: I’m trying to replace multiple specific sequences in a string with corresponding floating point representations in Python.

I have an array of strings in a JSON file, which I load into a Python script via the json module. The array of strings:

{
  "LinesToReplace": [
    "_ __ ___ ____ _____ ______ _______      ",
    "_._ __._ ___._ ____._ _____._ ______._  ",
    "_._ _.__ _.___ _.____ _._____ _.______  ",
    "_._ __.__ ___.___ ____.____ _____._____ ",
    "_. __. ___. ____. _____. ______.        "
  ]
}

I load the JSON-file via the json-module:

import json

with open("myFile.json") as jsonFile:
  data = json.load(jsonFile)

I’m trying to replace the sequences of _ with specific substrings of floating point representations.

Specification:

The character to find in a string is a single _ or a sequence of multiple _.
The length of the sequence of _ is unknown.
If a single _ or a sequence of multiple _ is followed by a ., which is in turn followed by a single _ or a sequence of multiple _, the . is part of the _-sequence.
The . is used to specify decimals.
If the . isn’t followed by a single _ or a sequence of multiple _, the . is not part of the _-sequence.
The sequence of _ and . is to be replaced by a floating point representation, e.g., %f1.0.
The representation depends on the lengths of the _-sequences before and after the . (digits before the decimal point, then number of decimals).

Examples:

__ is to be replaced by %f2.0.
_.___ is to be replaced by %f1.3.
____.__ is to be replaced by %f4.2.
___. is to be replaced by %f3.0. (the trailing . is kept).

For the above JSON-file, the result should be:

{
  "ReplacedLines": [
    "%f1.0 %f2.0 %f3.0 %f4.0 %f5.0 %f6.0 %f7.0      ",
    "%f1.1 %f2.1 %f3.1 %f4.1 %f5.1 %f6.1  ",
    "%f1.1 %f1.2 %f1.3 %f1.4 %f1.5 %f1.6  ",
    "%f1.1 %f2.2 %f3.3 %f4.4 %f5.5 ",
    "%f1.0. %f2.0. %f3.0. %f4.0. %f5.0. %f6.0.        "
  ]
}

Here is some code that tries to replace a single _ with %f1.0, but it doesn’t work:

with open("myFile.json") as jsonFile:
  data = json.load(jsonFile)
  strToFind = "_"
  
  for line in data["LinesToReplace"]:
    for idl, l in enumerate(line):
      if (line[idl] == strToFind and line[idl+1] != ".") and (line[idl+1] != strToFind and line[idl-1] != strToFind):
        l = l[:idl] + "%f1.0" + l[idl+1:] # replace string

Any ideas on how to do this? I have also thought about using regular expressions.

EDIT

The algorithm should check each character for “_”, so that placeholders embedded in other text are handled as well, i.e. it should be able to format this:

{
  "LinesToReplace": [
    "Ex1:_ Ex2:_. Ex3:._ Ex4:_._ Ex5:_._.    ",
    "Ex6:._._ Ex7:._._. Ex8:__._ Ex9: _.__   ",
    "Ex10: _ Ex11: _. Ex12: ._ Ex13: _._     ",
    "Ex5:._._..Ex6:.._._.Ex7:.._._._._._._._."
  ]
}

Expected result:

{
  "LinesToReplace": [
    "Ex1:%f1.0 Ex2:%f1.0. Ex3:.%f1.0 Ex4:%f1.1 Ex5:%f1.1.    ",
    "Ex6:.%f1.1 Ex7:.%f1.1. Ex8:%f2.1 Ex9: %f1.2   ",
    "Ex10: %f1.0 Ex11: %f1.0. Ex12: .%f1.0 Ex13: %f1.1     ",
    "Ex5:.%f1.1..Ex6:..%f1.1.Ex7:..%f1.1.%f1.1.%f1.1.%f1.0."
  ]
}

I have tried the following algorithm based on the above criteria, but I can’t figure out how to implement it:

def replaceFunc3(lines: list[str]) -> list[str]:
    result = []
    charToFind = '_'
    charMatrix = []

    # Find indices of all "_" in lines
    for line in lines:
        charIndices = [idx for idx, c in enumerate(line) if c == charToFind]
        charMatrix.append(charIndices)

    for (line, char) in zip(lines, charMatrix):
        if not char: # No "_" in current line, append the whole line
            result.append(line)
        else:
            pass
            # result.append(Something)
            # TODO: Insert "%fx.x" on all the placeholders

    return result

Answer

You can use re.sub from the re module together with a replacement function that computes the replacement from the capture groups:

import re

def replace(line):
    return re.sub(
        '(_+)([.]_+)?',
        lambda m: f'%f{len(m.group(1))}.{len(m.group(2) or ".")-1}',
        line,
    )

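# lines_to_replace is the list loaded from the JSON file, i.e. data["LinesToReplace"]
lines_to_replace = data["LinesToReplace"]
lines = [replace(line) for line in lines_to_replace]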
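
To go from the JSON input to the result shown in the question, the function can be applied to every entry and the output stored under a new key. This is only a minimal sketch: the input is an inline JSON string standing in for myFile.json (a shortened sample, not the full file), and the output key "ReplacedLines" is taken from the question.

```python
import json
import re

def replace(line):
    # Same substitution as above: count the underscores before and after the dot.
    return re.sub(
        r'(_+)([.]_+)?',
        lambda m: f'%f{len(m.group(1))}.{len(m.group(2) or ".") - 1}',
        line,
    )

# Stand-in for the contents of myFile.json (assumption: shortened sample).
raw = '{"LinesToReplace": ["_ __ ___", "_._ __.__", "_. __."]}'
data = json.loads(raw)

# Build the output structure with the key used in the question.
result = {"ReplacedLines": [replace(line) for line in data["LinesToReplace"]]}
print(json.dumps(result, indent=2))
```

With a real file, json.load on the opened file would replace json.loads, and json.dump could write the result back out.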

Explanation of regex:

(_+) matches one or more underscores; the () part makes them available as a capture group (the first such group, i.e. m.group(1)).
([.]_+)? optionally matches a dot followed by one or more trailing underscores (made optional by the trailing ?); the dot is part of a character class ([]) because otherwise it would have the special meaning “any character”. The () make this part available as the second capture group (m.group(2)).
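
To see what the two groups capture, and why the `or "."` fallback yields 0 decimals, the matches can be listed directly. This is a small sketch for illustration only; the helper name describe and the sample string are made up and not part of the answer above.

```python
import re

PATTERN = re.compile(r'(_+)([.]_+)?')

def describe(text):
    # Return (group 1, group 2, replacement) for every placeholder in text.
    rows = []
    for m in PATTERN.finditer(text):
        integer_part = m.group(1)
        # group(2) is None when no "._" follows; "." stands in so that
        # len(...) - 1 evaluates to 0 decimals.
        decimal_part = m.group(2) or "."
        rows.append((m.group(1), m.group(2),
                     f'%f{len(integer_part)}.{len(decimal_part) - 1}'))
    return rows

for row in describe("Ex4:_._ Ex2:__. Ex3:.___"):
    print(row)
```

Since matches are found left to right and never overlap, an input such as "_._._" is consumed as "_._" followed by "._", which gives "%f1.1.%f1.0" and agrees with the Ex7 case in the question's edit.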
