Skip to content
Advertisement

How to extract key value from deep dictionary in pandas || Python || dataframe

I am making a request call and storing data into JSON, and from there I am loading JSON to pandas DataFrame, good thing is it works like magic. However, unfortunately, I have deep dictionaries available in a few columns in the data frame. I am unable to extract key values from it. I am attaching the CSV file with a few columns and the important one is the “guest” column.

I have been looking on the internet and have tried so many things that by now I am so confused about what all is correct and incorrect. below is the snapshot of my code and trials.

Adata = response.json()

## Loading the Json Data to DataFrame
df = pd.DataFrame(Adata)
df = df.astype(str)

## Exporting the Dataframe to csv file.
df.to_csv('Appointments.csv')

## Trying to create a new column with key values that I want out of guest column.
AB = df[['guest']]
print(AB)

BA = df['guest'].str.strip().to_frame()
print(BA)

BA.to_csv('BA_sheet.csv')

##Loaded single row and tried to check if I can do something about it.
test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}}
df3 = pd.DataFrame(test)
#print (df3)
df3.to_csv('df3_testsheet.csv')


## Trying to lambda function to extract the data that I want.
AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name'])
print(AB)

## Trying regex to get the desired data.
pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB))
print(pp)

All I want is to extract id, first_name and the last_namefrom the dictionary from that guest column. Use this link to access the csv file which has the DataFrame result.

Advertisement

Answer

The way you’re doing it, you’re trying to extract your first_name, last_name and id keys from a str representation of a dict. You can convert it back to a dict using the eval builtin (not recommended if you’re not sure of where the data is coming from), or the ast.literal_eval function from the ast module.

import ast

df['guest'] = df['guest'].apply(ast.literal_eval)

Once you have the guest dictionaries as dict objects, you can simply apply pd.Series to convert it to a separate DataFrame

guest_df = df['guest'].apply(pd.Series)

guest_df['id'] # => gives you id
guest_df['first_name'] # => gives you first name
guest_df['last_name'] # => gives you last name
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement