
Controlling Yaml Serialization Order in Python

How do you control the order in which PyYAML outputs key/value pairs when serializing a Python dictionary?

I’m using YAML as a simple serialization format in a Python script. My YAML-serialized objects represent a sort of “document”, so for maximum user-friendliness, I’d like each object’s “name” field to appear first in the file. However, since the value returned by my object’s __getstate__ is a dictionary, and Python dictionaries are unordered, the “name” field ends up at an arbitrary position in the output.

For example:

>>> import yaml
>>> class Document(object):
...     def __init__(self, name):
...         self.name = name
...         self.otherstuff = 'blah'
...     def __getstate__(self):
...         return self.__dict__.copy()
... 
>>> doc = Document('obj-20111227')
>>> print(yaml.dump(doc, indent=4))
!!python/object:__main__.Document
otherstuff: blah
name: obj-20111227


Answer

It took me a few hours of digging through PyYAML docs and tickets, but I eventually found a comment that lays out proof-of-concept code for serializing an OrderedDict as a normal YAML map while preserving its order.
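
Stripped down to the general idea, the trick is to register a representer that hands represent_mapping a sequence of (key, value) pairs instead of the mapping itself: PyYAML only sorts keys when what it receives has an .items() method. The following is a minimal sketch of that idea, not the comment's actual code. It assumes PyYAML is installed, and the "author" key is hypothetical, added only so that insertion order differs from alphabetical order:

```python
import yaml
from collections import OrderedDict

def represent_ordereddict(dumper, data):
    # A list of (key, value) pairs has no .items(), so represent_mapping
    # emits the pairs in the given order instead of sorting them
    return dumper.represent_mapping('tag:yaml.org,2002:map', list(data.items()))

# Register the representer on the default Dumper that yaml.dump uses
yaml.add_representer(OrderedDict, represent_ordereddict)

doc = OrderedDict()
doc['name'] = 'obj-20111227'
doc['otherstuff'] = 'blah'
doc['author'] = 'someone'  # hypothetical key that would otherwise sort first

out = yaml.dump(doc, default_flow_style=False)
print(out)
```

The OrderedDict is written as a plain, untagged YAML map, with "name" first and "author" last.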

Applied to my original code, the solution looks something like this:

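>>> import yaml
>>> from collections import OrderedDict
>>> def dump_anydict_as_map(anydict):
...     # Register _represent_dictorder for anydict on yaml's default Dumper
...     yaml.add_representer(anydict, _represent_dictorder)
... 
>>> def _represent_dictorder(self, data):
...     # Handing represent_mapping a sequence of (key, value) pairs instead of
...     # a mapping keeps PyYAML from sorting the keys
...     if isinstance(data, Document):
...         # Use the same !!python/object tag PyYAML would emit for a Document
...         tag = 'tag:yaml.org,2002:python/object:%s.%s' % (
...             type(data).__module__, type(data).__name__)
...         return self.represent_mapping(tag, data.__getstate__().items())
...     else:
...         # Any other dict type is written as a plain YAML map, in its own order
...         return self.represent_mapping('tag:yaml.org,2002:map', data.items())
... 
>>> class Document(object):
...     def __init__(self, name):
...         self.name = name
...         self.otherstuff = 'blah'
...     def __getstate__(self):
...         # The OrderedDict fixes the order in which the fields are emitted
...         d = OrderedDict()
...         d['name'] = self.name
...         d['otherstuff'] = self.otherstuff
...         return d
... 
>>> dump_anydict_as_map(Document)
>>> doc = Document('obj-20111227')
>>> print(yaml.dump(doc, indent=4))
!!python/object:__main__.Document
name: obj-20111227
otherstuff: blah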
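
On newer setups a custom representer may not be needed at all. The sketch below assumes Python 3.7+ (where plain dicts keep insertion order) and PyYAML 5.1+ (which added the sort_keys argument to yaml.dump). The "author" field is hypothetical, added only because it sorts alphabetically before "name":

```python
import yaml

class Document(object):
    def __init__(self, name):
        self.name = name
        self.otherstuff = 'blah'
        # Hypothetical extra field, added only because it sorts before "name"
        self.author = 'someone'

    def __getstate__(self):
        # A plain dict keeps insertion order on Python 3.7+
        return {'name': self.name, 'otherstuff': self.otherstuff, 'author': self.author}

doc = Document('obj-20111227')
# sort_keys=False (PyYAML 5.1+) tells yaml.dump not to sort mapping keys
out = yaml.dump(doc, indent=4, sort_keys=False)
print(out)
```

With the default sort_keys=True, the same call would list "author" first.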
User contributions licensed under: CC BY-SA