I have a sample JSON file like this
data = { "type": "video", "videoID": "vid001", "links": [ {"type": "video", "videoID": "vid002", "links": []}, {"type": "video", "videoID": "vid003", "links": [ {"type": "video", "videoID": "vid004"}, {"type": "video", "videoID": "vid005"}, ] }, {"type": "video", "videoID": "vid006"}, {"type": "video", "videoID": "vid007", "links": [ {"type": "video", "videoID": "vid008", "links": [ {"type": "video", "videoID": "vid009", "links": [{"type": "video", "videoID": "vid010"}] } ]} ]}, ] }
i need to take only specific key and values from the json file and convert it into a CSV file
Code: Ref:Extracting Specific Keys/Values From A Messed-Up JSON File (Python)
def extract(data, keys): out = [] queue = [data] while len(queue) > 0: current = queue.pop(0) if type(current) == dict: for key in keys: if key in current: out.append({key: current[key]}) for val in current.values(): if type(val) in [list, dict]: queue.append(val) elif type(current) == list: queue.extend(current) return out x = extract(data, ["videoID","type"]) print(pd.DataFrame.from_dict(x))
When I’m passing 2 values through the extract(), getting NaN in between result
videoID type 0 vid001 NaN 1 NaN video 2 vid002 NaN 3 NaN video 4 vid003 NaN 5 NaN video 6 vid006 NaN 7 NaN video 8 vid007 NaN 9 NaN video 10 vid004 NaN 11 NaN video 12 vid005 NaN 13 NaN video 14 vid008 NaN 15 NaN video 16 vid009 NaN 17 NaN video 18 vid010 NaN 19 NaN video
I need to get an output like the below
videoID type 0 vid001 video 1 vid002 video 2 vid003 video 3 vid004 video etc...
and convert it into a csv file, can somebody help me to solve this issue
Advertisement
Answer
Your method seems ok to me. You are just making a mistake in the for key in keys
loop: what you are currently doing is you are creating a single dict ({key: current[key]}
) for every element. So at the end you will have a out
list that is a list of dicts that are not related, in which each videoID
is in a dict and type
is in a different dict.
Like this:
[{'videoID': 'vid001'}, {'type': 'video'}, {'videoID': 'vid002'}, {'type': 'video'}, {'videoID': 'vid003'}, {'type': 'video'}, {'videoID': 'vid006'}, {'type': 'video'}, {'videoID': 'vid007'}, {'type': 'video'}, {'videoID': 'vid004'}, {'type': 'video'}, {'videoID': 'vid005'}, {'type': 'video'}, {'videoID': 'vid008'}, {'type': 'video'}, {'videoID': 'vid009'}, {'type': 'video'}, {'videoID': 'vid010'}, {'type': 'video'}]
What you would want instead is:
[{'videoID': 'vid001', 'type': 'video'}, {'videoID': 'vid002', 'type': 'video'}, {'videoID': 'vid003', 'type': 'video'}, {'videoID': 'vid006', 'type': 'video'}, {'videoID': 'vid007', 'type': 'video'}, {'videoID': 'vid004', 'type': 'video'}, {'videoID': 'vid005', 'type': 'video'}, {'videoID': 'vid008', 'type': 'video'}, {'videoID': 'vid009', 'type': 'video'}, {'videoID': 'vid010', 'type': 'video'}]
where each videoID is related to its type.
To do so you just have to create a dict when looping on keys, add each element to the dict and then append the dict to out
list at the end of the loop on keys.
What I would do is this:
data_couple = {} for key in keys: if key in current: data_couple[key] = current[key] # out.append({key: current[key]}) out.append(data_couple)``
so the whole extract
function would become:
def extract(data, keys): out = [] queue = [data] while len(queue) > 0: current = queue.pop(0) if type(current) == dict: data_couple = {} for key in keys: if key in current: data_couple[key] = current[key] out.append(data_couple) for val in current.values(): if type(val) in [list, dict]: queue.append(val) elif type(current) == list: queue.extend(current) return out
Finally, to write a dict in a csv file I would simply use csv DictWriter:
import csv def writeToCsv(dictionary, col_name): with open("file.csv", "w") as f: wr = csv.DictWriter(f, fieldnames=col_name) wr.writeheader() for elem in dictionary: wr.writerow(elem) writeToCsv(your_dictionary, ["videoID", "type"])
This would create a CSV with videoID
and type
columns
Alternatively, as suggested by Ivan Calderon answer here you could also use pandas to_csv
method in one line:
you should change your dictionary before:
x = extract(data, ["videoID","type"]) d = {k: [v] for k, v in x.items()} pd.DataFrame.from_dict(data=d, orient='columns').to_csv('dict_file.csv')
but I am not sure about this…