Column dictionary values into separate Dataframe

I have a dataframe with a column whose values are lists of dictionaries. Here is an example of one column value:

[{'score': 0.09248554706573486, 'category': 'soccer', 'threshold': 0.13000713288784027}, {'score': 0.09267200529575348, 'category': 'soccer', 'threshold': 0.11795613169670105}, {'score': 0.1703065186738968, 'category': 'soccer', 'threshold': 0.2004493921995163}, {'score': 0.08060390502214432, 'category': 'basketball', 'threshold': 0.09613725543022156}, {'score': 0.16494056582450867, 'category': 'basketball', 'threshold': 0.2284235805273056}, {'score': 0.008428425528109074, 'category': 'basketball', 'threshold': 0.018201233819127083}, {'score': 0.0761604905128479, 'category': 'hockey', 'threshold': 0.0924532413482666}, {'score': 0.10853488743305206, 'category': 'basketball', 'threshold': 0.1252049058675766}, {'score': 0.0012563085183501244, 'category': 'soccer', 'threshold': 0.008611497469246387}, {'score': 0.058744996786117554, 'category': 'soccer', 'threshold': 0.08366610109806061}, {'score': 0.20794744789600372, 'category': 'rugby', 'threshold': 0.26308900117874146}, {'score': 0.1463163197040558, 'category': 'hockey', 'threshold': 0.18053030967712402}, {'score': 0.12938784062862396, 'category': 'hockey', 'threshold': 0.13267497718334198}, {'score': 0.09140244871377945, 'category': 'basketball', 'threshold': 0.13820350170135498}, {'score': 0.06976936012506485, 'category': 'hockey', 'threshold': 0.0989123210310936}, {'score': 0.05813559517264366, 'category': 'basketball', 'threshold': 0.06885409355163574}, {'score': 0.09365707635879517, 'category': 'hockey', 'threshold': 0.12393374741077423},]

I want to build a separate dataframe from these column values, across all rows, in which 'category', 'score', and 'threshold' are each a column.

For example:

category | score                  | threshold
soccer   | 0.09248554706573486    | 0.13000713288784027
soccer   | 0.09267200529575348    | 0.13000713288784027
soccer   | 0.1703065186738968     | 0.13000713288784027
basketball  | 0.16494056582450867   | 0.018201233819127083
basketball  | 0.08060390502214432   | 0.018201233819127083
basketball  | 0.10853488743305206   | 0.018201233819127083

Answer

Assuming lst is the input list, simply pass it to the DataFrame constructor:

import pandas as pd
df = pd.DataFrame(lst)

output:

       score    category  threshold
0   0.092486      soccer   0.130007
1   0.092672      soccer   0.117956
2   0.170307      soccer   0.200449
3   0.080604  basketball   0.096137
4   0.164941  basketball   0.228424
5   0.008428  basketball   0.018201
6   0.076160      hockey   0.092453
7   0.108535  basketball   0.125205
8   0.001256      soccer   0.008611
9   0.058745      soccer   0.083666
10  0.207947       rugby   0.263089
11  0.146316      hockey   0.180530
12  0.129388      hockey   0.132675
13  0.091402  basketball   0.138204
14  0.069769      hockey   0.098912
15  0.058136  basketball   0.068854
16  0.093657      hockey   0.123934
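
To get the column order shown in the question (category first), the columns can be reordered after construction. The following is a minimal sketch, assuming a shortened copy of the question's list with rounded values; the name lst follows the answer above.

```python
import pandas as pd

# Shortened, rounded copy of the list from the question (first four entries).
lst = [
    {'score': 0.0925, 'category': 'soccer', 'threshold': 0.1300},
    {'score': 0.0927, 'category': 'soccer', 'threshold': 0.1180},
    {'score': 0.1703, 'category': 'soccer', 'threshold': 0.2004},
    {'score': 0.0806, 'category': 'basketball', 'threshold': 0.0961},
]

# Each dictionary becomes one row; the keys become the column names.
df = pd.DataFrame(lst)

# Select the columns in the order the question asks for.
df = df[['category', 'score', 'threshold']]
print(df)
```

Selecting with a list of column names returns a new dataframe, so the original column order from the dictionaries does not matter.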

If each item in the series holds such a list, flatten them all with itertools.chain:

from itertools import chain
df2 = pd.DataFrame(chain.from_iterable(df['col']))
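
As a self-contained illustration of the flattening step, here is a small sketch. The two-row frame, the 'id' column, and the values are made up for the example; only the 'col' name comes from the answer. The second half shows one possible variant, not part of the original answer, that keeps track of which source row each dictionary came from by using DataFrame.explode.

```python
from itertools import chain

import pandas as pd

# Hypothetical input: the 'id' column and all values are illustrative assumptions.
df = pd.DataFrame({
    'id': ['a', 'b'],
    'col': [
        [{'score': 0.1, 'category': 'soccer', 'threshold': 0.2},
         {'score': 0.3, 'category': 'hockey', 'threshold': 0.4}],
        [{'score': 0.5, 'category': 'rugby', 'threshold': 0.6}],
    ],
})

# Flatten every row's list into one stream of dictionaries, then build the frame.
df2 = pd.DataFrame(chain.from_iterable(df['col']))
print(df2)

# Variant: explode puts one dictionary per row and repeats 'id' for each of them.
exploded = df.explode('col').reset_index(drop=True)
df3 = pd.concat(
    [exploded[['id']], pd.DataFrame(exploded['col'].tolist())],
    axis=1,
)
print(df3)
```

The chain version discards the link to the source row, which is fine when only the combined table is needed. The explode variant assumes no row holds an empty list, since explode would turn that into a missing value rather than a dictionary.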

Answer