Given, df:
import pandas as pd import numpy as np data = {'Col1': [1, 2, 3], 'Person': [{'ID': 10001, 'Data': {'Address': {'Street': '1234 Street A', 'City': 'Houston', 'State': 'Texas', 'Zip': '77002'}}, 'Age': 30, 'Income': 50000}, {'ID': 10002, 'Data': {'Address': {'Street': '7892 Street A', 'City': 'Greenville', 'State': 'Maine', 'Zip': np.nan}}, 'Age': np.nan, 'Income': 63000}, {'ID': 10003, 'Data': {'Address': np.nan}, 'Age': 56, 'Income': 85000}]} df = pd.DataFrame(data)
Input Dataframe:
Col1 Person 0 1 {'ID': 10001, 'Data': {'Address': {'Street': '1234 Street A', 'City': 'Houston', 'State': 'Texas', 'Zip': '77002'}}, 'Age': 30, 'Income': 50000} 1 2 {'ID': 10002, 'Data': {'Address': {'Street': '7892 Street A', 'City': 'Greenville', 'State': 'Maine', 'Zip': nan}}, 'Age': nan, 'Income': 63000} 2 3 {'ID': 10003, 'Data': {'Address': nan}, 'Age': 56, 'Income': 85000}
My expected output dataframe is df[['Col1', 'Income', 'Age', 'Street', 'Zip']]
where Income, Age, Street, and Zip come from within Person:
Col1 Income Age Street Zip 0 1 50000 30.0 1234 Street A 77002 1 2 63000 NaN 7892 Street A nan 2 3 85000 56.0 NaN nan
Advertisement
Answer
Using list comprehension, we can create most of these columns.
df['Income'] = [x.get('Income') for x in df['Person']] df['Age'] = [x.get('Age') for x in df['Person']] df['Age']
Output:
0 30.0 1 NaN 2 56.0 Name: Age, dtype: float64
However, dealing with np.nan values inside a nested dictionary is a real pain. Let’s look at getting data from a nested dictionary data where one of the values is nan.
df['Street'] = [x.get('Data').get('Address').get('Street') for x in df['Person']]
We get an AttributeError:
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) <ipython-input-80-cc2f92bfe95d> in <module> 1 #However, let's look at getting data rom a nested dictionary where one of the values is nan. 2 ----> 3 df['Street'] = [x.get('Data').get('Address').get('Street') for x in df['Person']] 4 5 #We get and AttributeError because NoneType object has no get method <ipython-input-80-cc2f92bfe95d> in <listcomp>(.0) 1 #However, let's look at getting data rom a nested dictionary where one of the values is nan. 2 ----> 3 df['Street'] = [x.get('Data').get('Address').get('Street') for x in df['Person']] 4 5 #We get and AttributeError because NoneType object has no get method AttributeError: 'float' object has no attribute 'get'
Let’s use the .str
accessor with dictionary keys to fetch this data.
There is little documentation in pandas that shows how you can use .str.get
or .str[]
to fetch values from dictionary objects in a dataframe column/pandas series.
df['Street'] = df['Person'].str['Data'].str['Address'].str['Street']
Output:
0 1234 Street A 1 7892 Street A 2 NaN Name: Street, dtype: object
And, likewise with
df['Zip'] = df['Person'].str['Data'].str['Address'].str['Zip']
Leaving us with the columns to build the desired dataframe
df[['Col1', 'Income', 'Age', 'Street', 'Zip']]
from dictionaries.
Output:
Col1 Income Age Street Zip 0 1 50000 30.0 1234 Street A 77002 1 2 63000 NaN 7892 Street A NaN 2 3 85000 56.0 NaN NaN