
PyYAML dump does not produce anchors for the same objects

I was experimenting a bit with PyYAML and I wanted to have a reference to a value appearing earlier in the YAML. To give an example:

import yaml
a=25
dict_to_dump={'a':a,'b':a}
yaml.dump(dict_to_dump)

From what I understood from the specification, PyYAML should add an anchor to each object that has already been encountered. In my case, I would expect to have in the YAML file:

a:&id 25
b:*id

as the objects passed are exactly the same, but instead I find:

a:25
b:25

How can I obtain the desired behaviour?


Answer

First of all your expectation is incorrect. What you could expect is

a: &id 25
b: *id

with a space after the value indicator (:).

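You will also need to call yaml.dump(dict_to_dump, sys.stdout) to get any output from your program (without a stream argument, yaml.dump() returns the YAML as a string, which a script silently discards), and what you indicate is not what you get: it again is missing the spaces after the value indicator.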
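The two call forms differ only in where the YAML ends up. A short sketch (the buffer and variable names are just for illustration):

```python
import io
import sys

import yaml

dict_to_dump = {'a': 25, 'b': 25}

# Without a stream argument, yaml.dump() returns the document as a str.
text = yaml.dump(dict_to_dump)

# With a stream argument, the document is written to it and yaml.dump() returns None.
buffer = io.StringIO()
result = yaml.dump(dict_to_dump, buffer)

# Both hold the same text; note the space after each ':'.
sys.stdout.write(text)
```

In a script, calling yaml.dump(dict_to_dump) on its own line therefore produces the string and throws it away.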


You normally only get an alias if you have two references a and b to the same object, i.e. with the same value for id(a) and id(b). Simple immutable objects like small integers and strings can be reused from a pool (CPython caches small integers and shares identical constants), so they have the same id() even if assigned in different places in the source. Mutable structures like a dict or list, or instances of Python classes, do not usually have the same id().

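PyYAML knows about this and deliberately never emits an alias for some types of objects (None, bool, int, float, str and bytes among them), even if their id() is the same. Other objects, such as the datetime.date instances below, are aliased only when they really are the same object: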

import sys
import yaml
import datetime

a = 25
b = 25
c = 'some string specified twice in the source'
d = 'some string specified twice in the source'
e = datetime.date(2023, 1, 11)
f = datetime.date(2023, 1, 11)

print('a-b', id(a) == id(b))
print('c-d', id(c) == id(d))
print('e-f', id(e) == id(f))
print('=====')

dict_to_dump = dict(e=e, x=e, f=f)
yaml.dump(dict_to_dump, sys.stdout)

which gives:

a-b True
c-d True
e-f False
=====
e: &id001 2023-01-11
f: 2023-01-11
x: *id001
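The per-type exception lives in PyYAML's representer: SafeRepresenter.ignore_aliases() returns True for None, the empty tuple and any str, bytes, bool, int or float, and such values are written out in full every time. The minimal sketch below calls that method directly on a bare SafeRepresenter (done here only for illustration) and contrasts a shared int with a shared list:

```python
import yaml
from yaml.representer import SafeRepresenter

rep = SafeRepresenter()

# Values of these types are never anchored/aliased by PyYAML.
print(rep.ignore_aliases(25))      # True
print(rep.ignore_aliases('text'))  # True

shared_int = 25
shared_list = [1, 2]

# The same int object referenced twice: both values are written out in full.
print(yaml.safe_dump({'a': shared_int, 'b': shared_int}), end='')

# The same list object referenced twice: anchored on first use, aliased on the second.
print(yaml.safe_dump({'a': shared_list, 'b': shared_list}), end='')
```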

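If you want to get the expected output, you have to make a Python class Int that behaves like an integer, so that a = Int(25) creates one object which both keys then reference, and the dumper has to be willing to emit aliases for instances of that class. Then you will get your anchor and alias.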
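In plain PyYAML an int subclass on its own is not enough, because SafeRepresenter.ignore_aliases() returns True for every int instance, subclasses included. A minimal sketch, assuming a recent PyYAML (5.1 or later), with a dumper subclass that lifts that restriction for the subclass only and a representer that writes it as an ordinary YAML integer; Int, AliasingDumper and represent_int_subclass are hypothetical names, not part of PyYAML:

```python
import yaml


class Int(int):
    """Hypothetical int subclass: each Int(...) call creates a separate object."""


class AliasingDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper variant that allows anchors/aliases for Int."""

    def ignore_aliases(self, data):
        # SafeRepresenter.ignore_aliases() returns True for every int,
        # subclasses included; let Int instances be tracked by id() instead.
        if isinstance(data, Int):
            return False
        return super().ignore_aliases(data)


def represent_int_subclass(dumper, data):
    # Write the value as a plain YAML integer rather than a Python-specific tag.
    return dumper.represent_int(int(data))


# add_representer() registers on AliasingDumper only; SafeDumper is unaffected.
AliasingDumper.add_representer(Int, represent_int_subclass)

a = Int(25)
dict_to_dump = {'a': a, 'b': a}
print(yaml.dump(dict_to_dump, Dumper=AliasingDumper), end='')
```

Two separate Int(25) objects would still be dumped in full, since only references to the same object share an id().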

This is what my library ruamel.yaml does: when loading in the default round-trip mode, it also preserves the actual anchor/alias names used:

import sys
import ruamel.yaml

yaml_str = """
a: &my_special_id 25
b: *my_special_id
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
print(f'{data["a"] * 4  =}')
print(f'{data["b"] + 75 =}')
print('=====')
yaml.dump(data, sys.stdout)

which gives:

data["a"] * 4  =100
data["b"] + 75 =100
=====
a: &my_special_id 25
b: *my_special_id

Creating the data from scratch is also possible:

import sys
import ruamel.yaml

Int = ruamel.yaml.scalarint.ScalarInt

a = Int(25, anchor='id')
data = dict(a=a, b=a)

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which gives what you expected in the first place:

a: &id 25
b: *id
User contributions licensed under: CC BY-SA