So, I have a fun issue. I have some data that have a fun nested dictionary that I need to manipulate, but am having trouble. I can do it in pure python, but wanted to do the entire solution in Pandas so as to keep the code a little cleaner and not have to re-open the same files elsewhere.
Dataframe:
Id Timezone Data 957643 Pacific {"California":{"city":"San Francisco","pop":"874961"}} 973472 Eastern {"New York":{"city":"New York","pop":"8419000"}}
Desired output, a list of dictionaries, placing the Id and Timezone into the nested dictionary while wrapping each in another key so I can just output as JSON:
[{"State Date":{"California":{"City":"San Francisco","Population":"874961","Id":"957643","Timezone":"Pacific"}}}, {"State Date":"New York":{"City":"New York","Population":"8419000", "Id":"973472","Timezone":"Eastern"}}}]
The problem is that I need to, for the final data format to be ingested elsewhere, place Timezone and Id into the nested portion of each row to be exported into JSON and rename some fields. I’ve tried the iterrows method and apply with an axis of 1 but it ends up putting all Id’s and timezones in each dictionary, though it does nest them.
A variation of below works in pure python when reading in the whole CSV, but not in Pandas (for reasons that are likely obvious for most). For brevity, these are the rows I’m interested in, so I drop any others.
output = [] entry = {} for id_, time, data in the_states.iterrows(): for state, other in data.items(): entry['Id'] = id_ entry['City'] = data.get('city') entry['Timezone'] = time entry['Population'] = data.get('pop') output.append({'State Data': entry})
Any help would be appreciated.
Advertisement
Answer
One way:
- Create another
list of dict
viato_dict('records')
. zip
anditerate
over both thelist of dict
.Update
the 1st one with the other to get the desiredJSON
.
result = [] for i,j in zip(df.Data,df[['Id', 'Timezone']].to_dict('records')): for key in i.keys(): i[key] = {**i[key], **j} result.append({'state_data': i})
OUTPUT:
[{'state_data': {'California': {'city': 'San Francisco', 'pop': '874961', 'Id': 957643, 'Timezone': 'Pacific'}}}, {'state_data': {'New York': {'city': 'New York', 'pop': '8419000', 'Id': 973472, 'Timezone': 'Eastern'}}}]
Complete Example:
df = pd.DataFrame({'Id': {0: 957643, 1: 973472}, 'Timezone': {0: 'Pacific', 1: 'Eastern'}, 'Data': {0: {"California":{"city":"San Francisco","pop":"874961"}}, 1: {"New York":{"city":"New York","pop":"8419000"}}}}) result = [] for i,j in zip(df.Data,df[['Id', 'Timezone']].to_dict('records')): for key in i.keys(): i[key] = {**i[key], **j} result.append({'state_data': i})