I was experimenting with PyYAML and wanted to reference a value that appears earlier in the YAML document. For example:
    import yaml

    a = 25
    dict_to_dump = {'a': a, 'b': a}
    yaml.dump(dict_to_dump)
From what I understood from the specification, PyYAML should add an anchor to each object that has already been encountered. In my case, I would expect the YAML file to contain:
    a:&id 25
    b:*id
since the objects passed are exactly the same, but instead I find:
    a:25
    b:25
How can I obtain the desired behaviour?
Answer
First of all, your expectation is incorrect. What you could expect is:

    a: &id 25
    b: *id

with a space after the value indicator (:).
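You will also need to call yaml.dump(dict_to_dump, sys.stdout) to get any output from your program (without a stream argument, yaml.dump() returns the YAML as a string instead of printing it). And what you show is not what you actually get: it is again missing the space after the value indicator.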
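A minimal sketch of the two calling styles, assuming a reasonably recent PyYAML (5.1 or later, where block style and sorted keys are the defaults): without a stream the document comes back as a str, with a stream it is written there and the call returns None.

```python
import io

import yaml

data = {'a': 25, 'b': 25}

# No stream argument: yaml.dump() builds the document and returns it as a str.
text = yaml.dump(data)

# With a stream argument: the document is written to the stream, the call returns None.
buf = io.StringIO()
result = yaml.dump(data, buf)

print(repr(text))   # 'a: 25\nb: 25\n'
print(result)       # None
```

In a script you would pass sys.stdout (or an open file) as the stream, or print the returned string yourself.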
You normally only get an alias if you have two objects a and b with the same value for id(a) and id(b), i.e. when the very same object is referenced twice.
Simple objects like small integers and strings (which CPython may reuse from a pool) can have the same id() even when assigned in different places in the source. Structures like a dict or list, or instances of Python classes, usually do not have the same id() when they are created separately, even if their contents are equal.
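A small sketch of identity versus equality for a mutable structure, assuming PyYAML 5.1 or later (default block style): the same dict object referenced twice gets an anchor and an alias, while two equal but separately created dicts are simply written out twice.

```python
import yaml

shared = {'k': 1}
copy_a = {'k': 1}
copy_b = {'k': 1}

# Equal contents, but two distinct objects (different id()).
print(copy_a == copy_b, copy_a is copy_b)  # True False

# One dict object referenced twice: anchored on first use, aliased on second.
same = yaml.dump({'a': shared, 'b': shared})

# Two equal but distinct dicts: no anchor, no alias.
distinct = yaml.dump({'a': copy_a, 'b': copy_b})

print(same)
print(distinct)
```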
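PyYAML knows about this and handles some types of objects differently, never emitting an alias for them even if the id() is the same (its representer's ignore_aliases() returns True for, among others, str, int, float and bool).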
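To see that rule in action, the sketch below (assuming PyYAML 5.1 or later) dumps the same int object twice and the same list object twice, then asks a SafeDumper instance directly via its ignore_aliases() method. The io.StringIO passed to the dumper is only there because the constructor requires a stream.

```python
import io

import yaml

n = 25
items = [1, 2]

# The same int object appears twice, yet no anchor is emitted.
ints = yaml.dump({'a': n, 'b': n})

# The same list object appears twice and does get an anchor and an alias.
lists = yaml.dump({'a': items, 'b': items})

# Ask the representer directly which values it refuses to alias.
dumper = yaml.SafeDumper(io.StringIO())
print(bool(dumper.ignore_aliases(n)))      # True
print(bool(dumper.ignore_aliases(items)))  # False

print(ints)
print(lists)
```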
    import sys
    import yaml
    import datetime

    a = 25
    b = 25
    c = 'some string specified twice in the source'
    d = 'some string specified twice in the source'
    e = datetime.date(2023, 1, 11)
    f = datetime.date(2023, 1, 11)

    print('a-b', id(a) == id(b))
    print('c-d', id(c) == id(d))
    print('e-f', id(e) == id(f))
    print('=====')
    dict_to_dump = dict(e=e, x=e, f=f)
    yaml.dump(dict_to_dump, sys.stdout)
which gives:
    a-b True
    c-d True
    e-f False
    =====
    e: &id001 2023-01-11
    f: 2023-01-11
    x: *id001
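If you want to get the expected output, you have to make a Python class Int that behaves like an integer. Then, when you do a = Int(25), you will get your anchor and alias. With PyYAML you would additionally have to register a representer for Int, and if Int subclasses int, make sure ignore_aliases() does not skip it, because the default check is isinstance(data, int).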
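A minimal sketch of that approach with plain PyYAML (5.1 or later). The names Int and AliasingDumper are hypothetical, made up for this example. The dumper lets Int instances be aliased and reuses SafeDumper's own integer representer so they are written as ordinary YAML integers. Note that PyYAML generates the anchor names itself (id001, id002, ...), so you cannot choose &id this way.

```python
import yaml


class Int(int):
    """Hypothetical int subclass whose repeated instances may be aliased."""


class AliasingDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper that allows anchors/aliases for Int objects."""

    def ignore_aliases(self, data):
        if isinstance(data, Int):
            return False  # repeated Int objects become anchor + alias
        return super().ignore_aliases(data)


# SafeDumper only has a representer for the exact int type, so register one
# for Int that emits it as a plain YAML integer.
AliasingDumper.add_representer(Int, yaml.SafeDumper.represent_int)

a = Int(25)
out = yaml.dump({'a': a, 'b': a}, Dumper=AliasingDumper)
print(out, end='')
# a: &id001 25
# b: *id001

# Two separate Int objects with equal values are still written out twice.
separate = yaml.dump({'a': Int(25), 'b': Int(25)}, Dumper=AliasingDumper)
```

Loading the output back with yaml.safe_load() gives plain ints again; the anchor only survives in the text.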
My library ruamel.yaml takes this approach of an integer class that can carry an anchor. When loading in the default round-trip mode, it also preserves the actual anchor/alias names used:
    import sys
    import ruamel.yaml

    yaml_str = """
    a: &my_special_id 25
    b: *my_special_id
    """

    yaml = ruamel.yaml.YAML()
    data = yaml.load(yaml_str)
    print(f'{data["a"] * 4 =}')
    print(f'{data["b"] + 75 =}')
    print('=====')
    yaml.dump(data, sys.stdout)
which gives:
    data["a"] * 4 =100
    data["b"] + 75 =100
    =====
    a: &my_special_id 25
    b: *my_special_id
Creating the data from scratch is also possible:
    import sys
    import ruamel.yaml

    Int = ruamel.yaml.scalarint.ScalarInt
    a = Int(25, anchor='id')
    data = dict(a=a, b=a)
    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)
which gives what you expected in the first place:
    a: &id 25
    b: *id