
Python save arbitrarily nested list to CSV

I have a list that is composed of strings, integers, and floats, and nested lists of strings, integers, and floats. Here is an example:

data = [
        1.0,
        'One',
        [1, 'Two'],
        [1, 'Two', ['Three', 4.5]],
        ['One', 2, [3.4, ['Five', 6]]]
    ]

I want each item of the list written to a line in a CSV file. So, given the above data, the file would look like this:

1.0
One
1,Two
1,Two,Three,4.5
One,2,3.4,Five,6

There are lots of resources about how to write a list to a file, but I have not seen any that work regardless of how deeply the list is nested. I'm sure I could come up with something involving many loops, but does anyone have a more elegant solution?

EDIT: The best thing I have come up with is to convert each item in the list to a string, then remove the extra characters ("[", "]", etc.). Then you join the item strings and write the result to a file:

string = ''
for i in data:
    line = str(i).replace("[","")
    line = line.replace("]","")
    line = line.replace("'","")
    line = line.replace(" ","")
    string += line + '\n'

# write string to file...

This just feels kludgey, and it is potentially harmful as it assumes the strings do not contain the brackets, quotes, or spaces. I’m looking for a better solution!
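Running that logic on strings that contain a space or an apostrophe shows the problem concretely. A minimal sketch, wrapping the loop body in a hypothetical `kludge` function for illustration:

```python
def kludge(item):
    """Strip list punctuation from str(item), exactly as the loop above does."""
    line = str(item).replace("[", "")
    line = line.replace("]", "")
    line = line.replace("'", "")
    line = line.replace(" ", "")
    return line

print(kludge([1, 'Two']))        # 1,Two
print(kludge(['a b', "it's"]))   # ab,"its"
```

The space inside `'a b'` is deleted, the apostrophe in `"it's"` is deleted, and because Python's `repr` switches to double quotes for a string containing an apostrophe, the output gains quote characters the loop never removes.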


Answer

What you ask is more-or-less impossible.

CSV is a flat, tabular storage format. The hierarchical nature of “arbitrarily nested lists” simply does not map onto a tabular structure well.

You can definitely flatten the nested list so that each first-level element of your nested list appears on a single line of the output file. But that isn't CSV, strictly speaking: RFC 4180 expects every record in a file to have the same number of fields. Lenient readers will accept rows of varying length, while readers that expect a fixed column count may reject or misalign them. And once the data is flattened as in your example, you can never reconstruct the original list by reading the file.
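To see what a lenient reader makes of such a file, here is a small sketch using Python's built-in `csv` module: it writes the flattened rows and reads them back. The `flatten` helper mirrors the one at the end of this answer, and an in-memory `io.StringIO` buffer stands in for a real file (an assumption for brevity).

```python
import csv
import io

def flatten(items):
    """Yield the non-list leaves of a nested list, depth first."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

data = [1.0, 'One', [1, 'Two'], [1, 'Two', ['Three', 4.5]]]

# Write one row per top-level element; scalars become one-field rows.
buf = io.StringIO()
writer = csv.writer(buf)
for item in data:
    row = list(flatten(item)) if isinstance(item, (list, tuple)) else [item]
    writer.writerow(row)

# Read it back: rows have different lengths and every field is a string.
buf.seek(0)
rows = list(csv.reader(buf))
print(rows)
# [['1.0'], ['One'], ['1', 'Two'], ['1', 'Two', 'Three', '4.5']]
```

Both the structure and the types are gone: `4.5` comes back as the string `'4.5'`, and nothing records that `'Three'` and `'4.5'` were once in a sublist.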

Demonstration:

[1, ["Two", "Three"], 4.0]

and

[1, ["Two", ["Three"]], 4.0]

both will emit:

1
Two,Three
4.0

So on reading that file, the reader/parser won’t know which of the original lists to return: the first, two-level list, or the second, three-level list. (I can make that counter-example arbitrarily complex and ugly.)
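The collision is easy to check mechanically. This sketch renders each top-level element as one comma-separated line (the helper names `flatten` and `to_line` are illustrative) and compares the output for the two lists above:

```python
def flatten(items):
    """Yield the non-list leaves of a nested list, depth first."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def to_line(item):
    """Render one top-level element as a comma-separated line."""
    if isinstance(item, (list, tuple)):
        return ",".join(str(x) for x in flatten(item))
    return str(item)

a = [1, ["Two", "Three"], 4.0]
b = [1, ["Two", ["Three"]], 4.0]

text_a = "\n".join(to_line(x) for x in a)
text_b = "\n".join(to_line(x) for x in b)

print(text_a == text_b)  # True: the extra nesting level leaves no trace
print(text_a)
```

Since two different inputs produce the same text, no reader can invert the mapping.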

In general, nested / hierarchical structures and flat / tabular structures are just not easily or completely compatible.

If you want an easy storage format for an arbitrarily nested list, consider JSON or YAML. They provide easy, high-quality storage for nested data. E.g.:

import json

outpath = 'out.json'
with open(outpath, "w") as f:
    json.dump(data, f)

would write your data to a file. To read it back in:

with open(outpath) as f:
    data = json.load(f)
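A quick round-trip check confirms that, unlike the flattened text, JSON keeps both the nesting and the numeric types. This sketch assumes the `data` list from the question and uses a temporary file in place of `out.json`; note that a tuple would come back as a list, since JSON has only arrays.

```python
import json
import os
import tempfile

data = [
    1.0,
    'One',
    [1, 'Two'],
    [1, 'Two', ['Three', 4.5]],
    ['One', 2, [3.4, ['Five', 6]]],
]

# Write to a temporary file rather than a fixed path, then clean it up.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
try:
    with open(path, "w") as f:
        json.dump(data, f)
    with open(path) as f:
        restored = json.load(f)
finally:
    os.remove(path)

print(restored == data)  # True
print(type(restored[0]).__name__, type(restored[2][0]).__name__)  # float int
```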

But if you really want CSV-ish text:

def flatten(l):
    """
    Flatten a nested list.
    """
    for i in l:
        if isinstance(i, (list, tuple)):
            yield from flatten(i)
        else:
            yield i

def list2csv(l):
    """
    Return CSV-ish text for a nested list.
    """
    lines = []
    for row in l:
        if isinstance(row, (list, tuple)):
            lines.append(",".join(str(i) for i in flatten(row)))
        else:
            lines.append(str(row))
    return "\n".join(lines)

print(list2csv(data))

Yields:

1.0
One
1,Two
1,Two,Three,4.5
One,2,3.4,Five,6
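One caveat with joining on `","`: a string field that itself contains a comma or a double quote is written unquoted, so it silently splits into extra fields. A possible refinement, sketched here rather than taken from the answer above, hands the flattened rows to the standard library's `csv.writer`, which applies CSV quoting. The name `list2csv_quoted` is hypothetical.

```python
import csv
import io

def flatten(items):
    """Yield the non-list leaves of a nested list, depth first."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def list2csv_quoted(rows):
    """Return CSV text, one line per top-level element, quoting as needed."""
    buf = io.StringIO()
    # lineterminator="\n" matches list2csv's line endings (csv defaults to "\r\n").
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        if isinstance(row, (list, tuple)):
            writer.writerow(list(flatten(row)))
        else:
            writer.writerow([row])
    return buf.getvalue()

print(list2csv_quoted([1.0, 'One', ['a, b', 'say "hi"', [2]]]), end="")
# 1.0
# One
# "a, b","say ""hi""",2
```

For the question's `data`, which has no commas or quotes inside its strings, this produces the same lines as `list2csv` plus a trailing newline. It does not fix the structural problem; it only keeps each row's field boundaries intact.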