Context
I am using ruamel.yaml (0.17.21) to automatically inject or update nested objects in a collection of existing YAML documents.
All these documents have a maximum line length of 120 characters, enforced by a linter.
I was expecting to be able to retain this formatting rule by setting the width attribute on the YAML instance. However, in practice, unbreakable words such as URLs end up overflowing the 120-character limit when dumped back to the output stream.
For example, the following code reformats the input as shown in the diff below, although I didn’t perform any modification to it:
    from ruamel.yaml import YAML
    import sys

    yaml = YAML()
    yaml.width = 120
    input = yaml.load('''
    arn:
      description: ARN of the Log Group to source data from. The expected format is documented at
        https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
    ''')
    yaml.dump(input, sys.stdout)
     arn:
    -  description: ARN of the Log Group to source data from. The expected format is documented at
    -    https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
    +  description: ARN of the Log Group to source data from. The expected format is documented at https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
Question
Is there a way I could influence the result of dump() without implementing my own Emitter, such as manually verifying that a generated line doesn’t overflow the desired maximum line length, and wrapping it myself if that’s the case?
Answer
There is something strange going on with the emitter for plain scalars, and that is old (inherited) code, so it might take some time to fix (without breaking other things).
I think you can programmatically correct these with an instance of the following WrapToLong class passed as the transform argument. I use a class here so you don’t need to use some global variable for getting the width to the routine doing the actual work:
    from ruamel.yaml import YAML
    import sys

    yaml = YAML()
    yaml.width = 120
    input = yaml.load('''
    arn:
      description: ARN of the Log Group to source data from. The expected format is documented at
        https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
    ''')

    class WrapToLong:
        def __init__(self, width, indent=2):
            self._width = width
            self._indent = indent

        def __call__(self, s):
            # s is the complete dumped document as one string
            res = []
            for line in s.splitlines():
                if len(line) > self._width and ' ' in line:
                    # count the leading spaces of the overflowing line
                    idx = 0
                    while line[idx] == ' ':
                        idx += 1
                    # move the last word to its own, further indented line
                    line, rest = line.rsplit(' ', 1)
                    res.append(line)
                    res.append(' ' * (idx + self._indent) + rest)
                    continue
                res.append(line)
            return '\n'.join(res) + '\n'

    yaml.dump(input, sys.stdout, transform=WrapToLong(yaml.width))
which gives:
    arn:
      description: ARN of the Log Group to source data from. The expected format is documented at
        https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazoncloudwatchlogs.html#amazoncloudwatchlogs-resources-for-iam-policies
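WrapToLong splits an overflowing line once, at its last space, which is enough when only the final word sticks out. If a line could still be too long after that single split, a variant that keeps splitting might look like the sketch below. The function name wrap_repeatedly is hypothetical and not part of ruamel.yaml. It works on plain strings, so it can be tried without the library, and it shares the assumption that every overflowing line belongs to a plain scalar that may be continued on a further indented line.

```python
def wrap_repeatedly(text, width, indent=2):
    """Wrap over-long lines at the last space that fits, repeating as needed.

    Hypothetical helper: assumes each over-long line is part of a plain
    scalar, so a continuation line with extra indentation is equivalent.
    """
    res = []
    for line in text.splitlines():
        # continuation lines get the original indentation plus `indent`
        lead = len(line) - len(line.lstrip(' '))
        prefix = ' ' * (lead + indent)
        while len(line) > width:
            start = len(line) - len(line.lstrip(' '))
            # last space that keeps the first part within `width`
            cut = line.rfind(' ', start, width + 1)
            if cut == -1:
                # no such space: fall back to the first space after the indentation
                cut = line.find(' ', start)
            if cut == -1:
                break  # unbreakable word, leave it as it is
            res.append(line[:cut].rstrip(' '))
            line = prefix + line[cut + 1:].lstrip(' ')
        res.append(line)
    return '\n'.join(res) + '\n'


if __name__ == '__main__':
    print(wrap_repeatedly('key: aaa bbb ccc ddd\n', 10), end='')
```

Because it takes a string and returns a string, it could be wrapped in a small class or a lambda and handed to transform in the same way as WrapToLong.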
You could also use folded scalars (using >-). Those keep the newlines where they were in ruamel.yaml, but you would need to update all your YAML files (programmatically), and you could not easily update the loaded string if the text before the URL changes, because that can change the positions where the string was folded.
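Whichever approach is used, the dumped text can be checked for remaining overflows before it is written. The URL in the example is itself longer than 120 characters, so no wrapping can bring that line under the limit. The sketch below therefore assumes the linter tolerates over-long lines that contain no space after their indentation and reports only lines that could still be wrapped. The name breakable_overflows is hypothetical.

```python
def breakable_overflows(text, width):
    """Return (line_number, length) for over-long lines that contain a space.

    Assumption: a line without any space after its indentation (for example
    a single long URL) cannot be wrapped and is accepted by the linter.
    """
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > width and ' ' in line.strip():
            found.append((number, len(line)))
    return found


if __name__ == '__main__':
    sample = 'arn:\n  description: short text ' + 'x' * 30 + '\n    ' + 'y' * 40 + '\n'
    # reports line 2 only; line 3 is over-long but has no space to break at
    print(breakable_overflows(sample, 30))
```

Calling this on the string a transform callable receives, or on the final output in a test, gives a way to fail early when a dump would violate the line-length rule.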