I am making a request call and storing data into JSON, and from there I am loading JSON to pandas DataFrame, good thing is it works like magic. However, unfortunately, I have deep dictionaries available in a few columns in the data frame. I am unable to extract key values from it. I am attaching the CSV file with a few columns and the important one is the “guest” column.
I have been looking on the internet and have tried so many things that by now I am so confused about what all is correct and incorrect. below is the snapshot of my code and trials.
Adata = response.json() ## Loading the Json Data to DataFrame df = pd.DataFrame(Adata) df = df.astype(str) ## Exporting the Dataframe to csv file. df.to_csv('Appointments.csv') ## Trying to create a new column with key values that I want out of guest column. AB = df[['guest']] print(AB) BA = df['guest'].str.strip().to_frame() print(BA) BA.to_csv('BA_sheet.csv') ##Loaded single row and tried to check if I can do something about it. test = {'id': '4b75bc9a-dc86-4fb5-a80a-46703e3d97b0', 'first_name': 'ASHISH ', 'last_name': 'PATEL', 'gender': 1, 'mobile': {'country_id': 0, 'number': None, 'display_number': None}, 'email': None, 'indicator': '0@0@0@0@0@0@0@x@0@0@0@0@2#0@0@0@0', 'lp_tier_info': '0@x', 'is_virtual_user': False, 'GuestIndicatorValue': {'HighSpender': None, 'Member': 0, 'LowFeedback': None, 'RegularGuest': None, 'FirstTimer': None, 'ReturningCustomer': None, 'NoShow': None, 'HasActivePackages': None, 'HasProfileAlerts': None, 'OtherCenterGuest': None, 'HasCTA': None, 'Dues': None, 'CardOnFile': None, 'AutoPayEnabled': None, 'RecurrenceAppointment': None, 'RebookedAppointment': None, 'hasAddOns': None, 'LpTier': None, 'IsSurpriseVisit': None, 'CustomDataIndicator': None, 'IsGuestBirthday': None}} df3 = pd.DataFrame(test) #print (df3) df3.to_csv('df3_testsheet.csv') ## Trying to lambda function to extract the data that I want. AB = AB.map(lambda x: (x.guest['id'], x.guest['first_name'], x.guest['last_name'])).toDF(['id', 'first_name', 'last_name']) print(AB) ## Trying regex to get the desired data. pp = re.findall(r"'first_name'.*?'(.*?)'", str(AB)) print(pp)
All I want is to extract id
, first_name
and the last_name
from the dictionary from that guest column. Use this link to access the csv file which has the DataFrame result.
Advertisement
Answer
The way you’re doing it, you’re trying to extract your first_name
, last_name
and id
keys from a str representation of a dict. You can convert it back to a dict using the eval
builtin (not recommended if you’re not sure of where the data is coming from), or the ast.literal_eval
function from the ast
module.
import ast df['guest'] = df['guest'].apply(ast.literal_eval)
Once you have the guest dictionaries as dict objects, you can simply apply pd.Series
to convert it to a separate DataFrame
guest_df = df['guest'].apply(pd.Series) guest_df['id'] # => gives you id guest_df['first_name'] # => gives you first name guest_df['last_name'] # => gives you last name