Problem: I’m trying to replace multiple specific sequences in a string with corresponding floating point representations in Python.
I have an array of strings in a JSON file, which I load into a Python script via the json module. The array of strings:
{
"LinesToReplace": [
"_ __ ___ ____ _____ ______ _______ ",
"_._ __._ ___._ ____._ _____._ ______._ ",
"_._ _.__ _.___ _.____ _._____ _.______ ",
"_._ __.__ ___.___ ____.____ _____._____ ",
"_. __. ___. ____. _____. ______. "
]
}
I load the JSON file via the json module:
import json

with open("myFile.json") as jsonFile:
    data = json.load(jsonFile)
I’m trying to replace the sequences of "_" with specific substrings of floating point representations.
Specification:
- The character to find in a string is a single "_" or a sequence of multiple "_".
- The length of the "_"-sequence is unknown.
- If a single "_" or a sequence of multiple "_" is followed by a ".", which is again followed by a single "_" or a sequence of multiple "_", the "." is part of the "_"-sequence.
- The "." is used to specify decimals.
- If the "." isn’t followed by a single "_" or a sequence of multiple "_", the "." is not part of the "_"-sequence.
- The sequence of "_" and "." is to be replaced by a floating point representation, e.g. "%f1.0".
- The representations depend on the lengths of the "_"- and "."-sequences.
Examples:
- "__" is to be replaced by "%f2.0".
- "_.___" is to be replaced by "%f1.3".
- "____.__" is to be replaced by "%f4.2".
- "___." is to be replaced by "%f3.0." (the trailing "." is not part of the sequence, so it is kept).
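The mapping in these examples can be sketched with a small helper (the name token_for is hypothetical; it assumes the placeholder sequence has already been isolated and contains only underscores plus at most one interior dot):

```python
# Hypothetical helper: compute the replacement token for one isolated
# placeholder sequence such as "__" or "_.___". Assumes the input
# contains only '_' plus at most one interior '.'.
def token_for(seq: str) -> str:
    int_part, _dot, dec_part = seq.partition('.')
    # integer width = underscores before the dot, decimals = underscores after
    return f'%f{len(int_part)}.{len(dec_part)}'

print(token_for('__'))       # %f2.0
print(token_for('_.___'))    # %f1.3
print(token_for('____.__'))  # %f4.2
```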
For the above JSON-file, the result should be:
{
"ReplacedLines": [
"%f1.0 %f2.0 %f3.0 %f4.0 %f5.0 %f6.0 %f7.0 ",
"%f1.1 %f2.1 %f3.1 %f4.1 %f5.1 %f6.1 ",
"%f1.1 %f1.2 %f1.3 %f1.4 %f1.5 %f1.6 ",
"%f1.1 %f2.2 %f3.3 %f4.4 %f5.5 ",
"%f1.0. %f2.0. %f3.0. %f4.0. %f5.0. %f6.0. "
]
}
Some code that tries to replace a single "_" with "%f1.0" (it doesn’t work…):
with open("myFile.json") as jsonFile:
    data = json.load(jsonFile)

strToFind = "_"
for line in data["LinesToReplace"]:
    for idl, l in enumerate(line):
        if (line[idl] == strToFind and line[idl+1] != ".") and (line[idl+1] != strToFind and line[idl-1] != strToFind):
            l = l[:idl] + "%f1.0" + l[idl+1:]  # replace string
Any ideas on how to do this? I have also thought about using regular expressions.
EDIT
The algorithm should be able to check whether the character is a “_”, i.e. it should be able to format this:
{
"LinesToReplace": [
"Ex1:_ Ex2:_. Ex3:._ Ex4:_._ Ex5:_._. ",
"Ex6:._._ Ex7:._._. Ex8:__._ Ex9: _.__ ",
"Ex10: _ Ex11: _. Ex12: ._ Ex13: _._ ",
"Ex5:._._..Ex6:.._._.Ex7:.._._._._._._._."
]
}
Solution:
{
"LinesToReplace": [
"Ex1:%f1.0 Ex2:%f1.0. Ex3:.%f1.0 Ex4:%f1.1 Ex5:%f1.1. ",
"Ex6:.%f1.1 Ex7:.%f1.1. Ex8:%f2.1 Ex9: %f1.2 ",
"Ex10: %f1.0 Ex11: %f1.0. Ex12: .%f1.0 Ex13: %f1.1 ",
"Ex5:.%f1.1..Ex6:..%f1.1.Ex7:..%f1.1.%f1.1.%f1.1.%f1.0."
]
}
I have started on the following algorithm based on the above criteria, but I can’t figure out how to complete it:
def replaceFunc3(lines: list[str]) -> list[str]:
    result = []
    charToFind = '_'
    charMatrix = []
    # Find indices of all "_" in lines
    for line in lines:
        charIndices = [idx for idx, c in enumerate(line) if c == charToFind]
        charMatrix.append(charIndices)
    for (line, char) in zip(lines, charMatrix):
        if not char:  # No "_" in current line, append the whole line
            result.append(line)
        else:
            pass
            # result.append(Something)
            # TODO: Insert "%fx.x" on all the placeholders
    return result
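For completeness, here is one way the index-based idea could be finished. This is a sketch of an assumed intent: instead of post-processing the collected index matrix, it rebuilds each line with a manual scan that consumes underscore runs and an embedded "." only when underscores follow it, per the specification above:

```python
def replace_func3(lines: list[str]) -> list[str]:
    # Sketch: scan each line character by character and rebuild it,
    # replacing each placeholder sequence with its %f token.
    result = []
    for line in lines:
        out = []
        i, n = 0, len(line)
        while i < n:
            if line[i] == '_':
                # count leading underscores (integer-part width)
                j = i
                while j < n and line[j] == '_':
                    j += 1
                int_len = j - i
                dec_len = 0
                # a "." belongs to the sequence only if underscores follow it
                if j + 1 < n and line[j] == '.' and line[j + 1] == '_':
                    k = j + 1
                    while k < n and line[k] == '_':
                        k += 1
                    dec_len = k - (j + 1)
                    j = k
                out.append(f'%f{int_len}.{dec_len}')
                i = j
            else:
                out.append(line[i])  # copy non-placeholder characters as-is
                i += 1
        result.append(''.join(out))
    return result

print(replace_func3(["_ __._ "]))         # ['%f1.0 %f2.1 ']
print(replace_func3(["Ex3:._ Ex4:_._ "]))  # ['Ex3:.%f1.0 Ex4:%f1.1 ']
```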
Answer
You can use re.sub from the re module together with a replacement function that performs the logic on the capture groups:
import re

def replace(line):
    return re.sub(
        '(_+)([.]_+)?',
        lambda m: f'%f{len(m.group(1))}.{len(m.group(2) or ".")-1}',
        line,
    )
lines = [replace(line) for line in lines_to_replace]
Explanation of the regex:
- "(_+)" matches one or more underscores; the parentheses make them available as a capture group (the first such group, i.e. m.group(1)).
- "([.]_+)?" optionally matches a dot followed by one or more trailing underscores (made optional by the trailing "?"); the dot is placed inside a character class ("[.]") because otherwise it would have the special meaning “any character”. The parentheses make this part available as the second capture group (m.group(2)).
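Applied to a few of the sample sequences from the question, the function produces the expected tokens:

```python
import re

def replace(line):
    return re.sub(
        '(_+)([.]_+)?',
        lambda m: f'%f{len(m.group(1))}.{len(m.group(2) or ".")-1}',
        line,
    )

print(replace('_ __ ___ '))         # %f1.0 %f2.0 %f3.0 
print(replace('_._ _.__ '))         # %f1.1 %f1.2 
print(replace('Ex4:_._ Ex11: _.'))  # Ex4:%f1.1 Ex11: %f1.0.
```

Note how the trailing "." in the last example survives: group 2 only matches a dot that is followed by more underscores, so a dot at the end of a sequence is left in place, exactly as the specification requires.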