
Search and replace specific strings with floating point representations in Python

Problem: I’m trying to replace multiple specific sequences in a string with corresponding floating point representations in Python.

I have an array of strings in a JSON file, which I load into a Python script via the json module. The array of strings:


I load the JSON file via the json module:

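The original array and loading code were not preserved. As a minimal sketch, assuming a hypothetical array of strings (the sample values below are invented for illustration), loading works like this. `json.loads` parses a JSON string; with a real file you would call `json.load` on an open file object instead:

```python
import json

# Hypothetical stand-in for the JSON file's contents (the original array was not preserved).
raw = '["Value: __", "Pi is about _.___", "Total: ____.__ EUR", "Count: ___."]'

# json.loads parses a JSON string into Python objects; a JSON array becomes a list.
strings = json.loads(raw)
print(strings)
```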

I want to replace each sequence of _ with a specific floating point representation.

Specification:

  • The characters to find are either a single _ or a run of multiple _.
  • The length of a _-sequence is not known in advance.
  • If a single _ or a sequence of multiple _ is followed by a ., which is again followed by a single _ or a sequence of multiple _, the . is part of the _-sequence.
  • The . marks the start of the decimal places.
  • If the . isn’t followed by a single _ or a sequence of multiple _, the . is not part of the _-sequence.
  • The sequence of _ and . is to be replaced by a floating point representation, e.g., %f1.0.
  • The representation depends on the lengths of the _-runs: %f<number of _ before the .>.<number of _ after the .>.
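The matching rules above translate fairly directly into one regular expression. A minimal sketch, assuming Python's `re` module (the names `SEQUENCE` and `find_sequences` are hypothetical), that only finds the sequences without replacing them:

```python
import re

# One "sequence": a run of underscores, optionally followed by "." and another run of underscores.
# A "." that is not followed by "_" is left out of the match, as the specification requires.
SEQUENCE = re.compile(r"_+(?:\._+)?")


def find_sequences(text: str) -> list:
    """Return every underscore sequence in text, in order of appearance."""
    # The group is non-capturing, so findall returns the whole match for each hit.
    return SEQUENCE.findall(text)


for sample in ["__", "_.___", "____.__", "___."]:
    print(repr(sample), "->", find_sequences(sample))
```

For `"___."` this yields `["___"]`: the trailing dot is not followed by `_`, so it stays outside the sequence.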

Examples:

  • __ is to be replaced by %f2.0.
  • _.___ is to be replaced by %f1.3.
  • ____.__ is to be replaced by %f4.2.
  • ___. is to be replaced by %f3.0. (the trailing . is not part of the sequence and is kept).
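The examples follow a single rule: the number before the dot is the count of `_` before the `.`, and the number after it is the count of `_` after the `.` (0 if there is none). A small sketch of that mapping for one already-extracted sequence (the function name `to_format` is hypothetical):

```python
def to_format(sequence: str) -> str:
    """Map an underscore sequence such as "____.__" to "%f4.2"."""
    # partition splits at the first "."; decimals is "" when there is no dot.
    integer_part, _dot, decimals = sequence.partition(".")
    return f"%f{len(integer_part)}.{len(decimals)}"


print(to_format("____.__"))  # %f4.2
```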

For the JSON file above, the result should be:


Here is some code that tries to replace a single _ with %f1.0 (it doesn’t work):

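The asker's attempt was not preserved, but a common reason this kind of approach fails is that replacing each `_` individually cannot see how long a run is. A minimal sketch contrasting plain `str.replace` with matching whole runs via `re.sub` (it deliberately ignores the decimal part, so it only covers sequences without a `.`):

```python
import re

text = "Width: __ and __"

# str.replace swaps each "_" individually, so it cannot see how long a run is.
naive = text.replace("_", "%f1.0")
print(naive)  # Width: %f1.0%f1.0 and %f1.0%f1.0

# Matching whole runs with r"_+" lets the replacement use the run length.
by_run = re.sub(r"_+", lambda m: f"%f{len(m.group(0))}.0", text)
print(by_run)  # Width: %f2.0 and %f2.0
```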

Any ideas on how to do this? I have also thought about using regular expressions.

EDIT

The algorithm should be able to check whether the character is a “_”, i.e., it should be able to format this:


Solution:


I have sketched the following algorithm based on the criteria above, but I can’t figure out how to implement it:

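The criteria can also be implemented without regular expressions by scanning the string once, character by character. This is a minimal sketch of that approach, not the asker's algorithm (which was not preserved); the function name `replace_sequences` is hypothetical:

```python
def replace_sequences(text: str) -> str:
    """Scan text once and replace each underscore sequence with its %f format."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "_":
            # Ordinary character: copy it unchanged.
            out.append(text[i])
            i += 1
            continue
        # Count the run of underscores before a possible decimal point.
        start = i
        while i < n and text[i] == "_":
            i += 1
        width = i - start
        decimals = 0
        # The "." belongs to the sequence only if at least one "_" follows it.
        if i + 1 < n and text[i] == "." and text[i + 1] == "_":
            i += 1  # consume the "."
            start = i
            while i < n and text[i] == "_":
                i += 1
            decimals = i - start
        out.append(f"%f{width}.{decimals}")
    return "".join(out)


print(replace_sequences("Total: ____.__ EUR"))
```

A dot that is not followed by `_` is never consumed, so it is copied through as an ordinary character on the next loop iteration.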


Answer

You can use re.sub from the re module together with a replacement function that applies the logic to the capture groups:


Explanation of regex:

  • (_+) matches one or more underscores; the () part makes them available as a capture group (the first such group, i.e. m.group(1)).
  • ([.]_+)? optionally matches a dot followed by one or more trailing underscores (made optional by the trailing ?); the dot is part of a character class ([]) because otherwise it would have the special meaning “any character”. The () make this part available as the second capture group (m.group(2)).
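The answer's code was not preserved. A minimal sketch built from the regex explained above, assuming the goal is to convert every string in the list loaded from JSON (the names `PATTERN`, `to_float_format` and `convert` are hypothetical):

```python
import re

# (_+)      -> group 1: the underscores before the decimal point
# ([.]_+)?  -> group 2 (optional): a literal dot followed by the decimal underscores
PATTERN = re.compile(r"(_+)([.]_+)?")


def to_float_format(m: re.Match) -> str:
    """Build the %f representation from the two capture groups."""
    width = len(m.group(1))
    # group(2) is None when there is no "._" part; subtract 1 for the dot itself.
    decimals = len(m.group(2)) - 1 if m.group(2) else 0
    return f"%f{width}.{decimals}"


def convert(strings):
    """Apply the replacement to every string in the list."""
    return [PATTERN.sub(to_float_format, s) for s in strings]


print(convert(["__", "_.___", "____.__", "___."]))
```

When `re.sub` receives a function instead of a replacement string, it calls that function once per match with the match object and inserts the returned string, which is what lets the replacement depend on the group lengths.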
User contributions licensed under: CC BY-SA