How to convert a string of a list of lists to a list?

I have a file that is the output of a MapReduce job, so each line is in key-value format:

'null\t[0, [[0, 21], [1, 4], [2, 5]]]\n'
'null\t[1, [[0, 3], [1, 1], [2, 2]]]\n'

I want to remove every character except the second element of the value list:

[[0, 21], [1, 4], [2, 5]]
[[0, 3], [1, 1], [2, 2]]

Finally, I want to collect them in a single list:

[[[0, 21], [1, 4], [2, 5]], [[0, 3], [1, 1], [2, 2]]]

This is my attempt so far:

import re

with open(FILENAME) as f:
    content = f.readlines()

corpus = []
for line in content:
    # Match all the chars up to "[[", then replace the matched chars with "["
    clean_line = re.sub(r'^.*?\[\[', '[', line)
    # Remove "\n" and the last two "]]" of the string
    clean_line = re.sub('[\n]', '', clean_line)[:-2]
    corpus.append(clean_line)

Output:

['[0, 21], [1, 4], [2, 5]', '[0, 3], [1, 1], [2, 2]']

As you can see, each element is still a str. How can I convert them to lists?

Answer

Treat each line as JSON: replace the parts of the line that are not valid JSON so that the whole line becomes a JSON document, then parse it and take the element you need.

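import json
# Turn 'null\t<value>\n' into '{"a":<value>}', parse it, and keep the second element of the value
corpus = [json.loads(line.replace('null\t', '{"a":').replace("\n", "}"))["a"][1] for line in content]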
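
A variant that may be a little more robust is to split each line on the separator and parse only the value. This is a minimal sketch, assuming the key and value are separated by a single tab character and that the value is always valid JSON. The helper name parse_line is hypothetical, and the sample lines are the two from the question.

```python
import json

def parse_line(line):
    # Hypothetical helper: split "key<TAB>value" once, ignore the key,
    # and parse the value as JSON. Assumes a tab separator.
    _key, value = line.rstrip("\n").split("\t", 1)
    # The value looks like [index, [[...], ...]]; keep the second element.
    return json.loads(value)[1]

# Sample lines copied from the question
content = [
    'null\t[0, [[0, 21], [1, 4], [2, 5]]]\n',
    'null\t[1, [[0, 3], [1, 1], [2, 2]]]\n',
]

corpus = [parse_line(line) for line in content]
print(corpus)
# [[[0, 21], [1, 4], [2, 5]], [[0, 3], [1, 1], [2, 2]]]
```

Because this does not depend on the key being the literal text null or on every line ending with a newline, it should also cope with a last line that has no trailing newline.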

User contributions licensed under: CC BY-SA