I have a dataframe which has a column that contains a list of dictionaries. This is what an example column value it looks like:
JavaScript
x
2
1
[{'score': 0.09248554706573486, 'category': 'soccer', 'threshold': 0.13000713288784027}, {'score': 0.09267200529575348, 'category': 'soccer', 'threshold': 0.11795613169670105}, {'score': 0.1703065186738968, 'category': 'soccer', 'threshold': 0.2004493921995163}, {'score': 0.08060390502214432, 'category': 'basketball', 'threshold': 0.09613725543022156}, {'score': 0.16494056582450867, 'category': 'basketball', 'threshold': 0.2284235805273056}, {'score': 0.008428425528109074, 'category': 'basketball', 'threshold': 0.018201233819127083}, {'score': 0.0761604905128479, 'category': 'hockey', 'threshold': 0.0924532413482666}, {'score': 0.10853488743305206, 'category': 'basketball', 'threshold': 0.1252049058675766}, {'score': 0.0012563085183501244, 'category': 'soccer', 'threshold': 0.008611497469246387}, {'score': 0.058744996786117554, 'category': 'soccer', 'threshold': 0.08366610109806061}, {'score': 0.20794744789600372, 'category': 'rugby', 'threshold': 0.26308900117874146}, {'score': 0.1463163197040558, 'category': 'hockey', 'threshold': 0.18053030967712402}, {'score': 0.12938784062862396, 'category': 'hockey', 'threshold': 0.13267497718334198}, {'score': 0.09140244871377945, 'category': 'basketball', 'threshold': 0.13820350170135498}, {'score': 0.06976936012506485, 'category': 'hockey', 'threshold': 0.0989123210310936}, {'score': 0.05813559517264366, 'category': 'basketball', 'threshold': 0.06885409355163574}, {'score': 0.09365707635879517, 'category': 'hockey', 'threshold': 0.12393374741077423},]
2
I want to create a separate dataframe that takes the above column values for each row, and produces a dataframe where ‘category’ is a column, and the values for that columns are score and threshold.
For example:
JavaScript
1
8
1
category | score | threshold
2
soccer | 0.09248554706573486 | 0.13000713288784027
3
soccer | 0.09267200529575348 | 0.13000713288784027
4
soccer | 0.1703065186738968 | 0.13000713288784027
5
basketball | 0.16494056582450867 | 0.018201233819127083
6
basketball | 0.08060390502214432 | 0.018201233819127083
7
basketball | 0.10853488743305206 | 0.018201233819127083
8
Advertisement
Answer
Assuming lst
the input list, simply use the DataFrame
constructor:
JavaScript
1
2
1
df = pd.DataFrame(lst)
2
output:
JavaScript
1
19
19
1
score category threshold
2
0 0.092486 soccer 0.130007
3
1 0.092672 soccer 0.117956
4
2 0.170307 soccer 0.200449
5
3 0.080604 basketball 0.096137
6
4 0.164941 basketball 0.228424
7
5 0.008428 basketball 0.018201
8
6 0.076160 hockey 0.092453
9
7 0.108535 basketball 0.125205
10
8 0.001256 soccer 0.008611
11
9 0.058745 soccer 0.083666
12
10 0.207947 rugby 0.263089
13
11 0.146316 hockey 0.180530
14
12 0.129388 hockey 0.132675
15
13 0.091402 basketball 0.138204
16
14 0.069769 hockey 0.098912
17
15 0.058136 basketball 0.068854
18
16 0.093657 hockey 0.123934
19
If you have such list for each item in the series, use itertools.chain
:
JavaScript
1
3
1
from itertools import chain
2
df2 = pd.DataFrame(chain.from_iterable(df['col']))
3