I have a DataFrame that contains lots of JSON data.
For example:
```
{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261123.328814, 0.171875, -0.9609375, 0.0234375]}
{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261125.0605137, 0.0859375, -0.984375, 0.0]}
{"serial": "000000001fb105ea", "sensorType": "strain", "data": [1603261126.3532753, 0.9649793604217437]}
{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261127.6988888, 0.0390625, -1.0, 0.125]}
{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261128.8530502, 0.078125, -0.9921875, 0.0]}
```
There are two types of data: strain sensor and acceleration sensor.
I want to parse this JSON data and convert it to a normal (tabular) form. I only need the data part of each JSON object. In the result I should have 4 columns, one for each value in data, for example:
Date: 21.10.2020:09:18:46 x:0.171875 y:-0.9609375 z:0.0234375
I tried json_normalize, but I got this error:
AttributeError: 'str' object has no attribute 'itervalues'
How can I parse the data part into a 4-column DataFrame?
Thanks.
Answer
If the input data are in a JSON file with one object per line, use:
```python
import pandas as pd

cols = ['Date', 'x', 'y', 'z']
# lines=True reads one JSON object per line; keep only the 'data' lists
df = pd.DataFrame(pd.read_json('json.json', lines=True)['data'].tolist(), columns=cols)
# Convert Unix epoch seconds to datetimes
df['Date'] = pd.to_datetime(df['Date'], unit='s')
print(df)
```

```
                           Date         x         y         z
0 2020-10-21 06:18:43.328814030  0.171875 -0.960938  0.023438
1 2020-10-21 06:18:45.060513735  0.085938 -0.984375  0.000000
2 2020-10-21 06:18:46.353275299  0.964979       NaN       NaN
3 2020-10-21 06:18:47.698888779  0.039062 -1.000000  0.125000
4 2020-10-21 06:18:48.853050232  0.078125 -0.992188  0.000000
```
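The converted times are in UTC (06:18), while the question expects 09:18. A minimal sketch of shifting to a local zone and printing in the question's day.month.year:hour:minute:second style, assuming a fixed UTC+3 offset (an assumption inferred only from the 09:18 in the question; substitute your real zone):

```python
import datetime

import pandas as pd

# Epoch seconds taken from the first two sample records in the question
ts = pd.Series([1603261123.328814, 1603261125.0605137])

# ASSUMPTION: local time is UTC+3; replace with your actual zone
utc_plus_3 = datetime.timezone(datetime.timedelta(hours=3))

# to_datetime(unit='s') gives naive UTC times: mark them as UTC, then convert
local = (pd.to_datetime(ts, unit='s')
           .dt.tz_localize('UTC')
           .dt.tz_convert(utc_plus_3))

# Render in the question's format, e.g. 21.10.2020:09:18:43
formatted = local.dt.strftime('%d.%m.%Y:%H:%M:%S')
print(formatted.tolist())
```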
If the input is a DataFrame with the JSON objects in column col:
```python
import pandas as pd

cols = ['Date', 'x', 'y', 'z']
# Assumes each cell of df['col'] is already a dict; json_normalize does not parse JSON strings
df = pd.DataFrame(pd.json_normalize(df['col'])['data'].tolist(), columns=cols)
df['Date'] = pd.to_datetime(df['Date'], unit='s')
print(df)
```

```
                           Date         x         y         z
0 2020-10-21 06:18:43.328814030  0.171875 -0.960938  0.023438
1 2020-10-21 06:18:45.060513735  0.085938 -0.984375  0.000000
2 2020-10-21 06:18:46.353275299  0.964979       NaN       NaN
3 2020-10-21 06:18:47.698888779  0.039062 -1.000000  0.125000
4 2020-10-21 06:18:48.853050232  0.078125 -0.992188  0.000000
```
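The AttributeError in the question suggests the column holds JSON strings rather than dicts, and json_normalize expects dicts. A minimal sketch that decodes each string with json.loads first; the column name col and the two sample rows are illustrative assumptions:

```python
import json

import pandas as pd

# Hypothetical frame: 'col' holds raw JSON strings, not dicts
df = pd.DataFrame({'col': [
    '{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261123.328814, 0.171875, -0.9609375, 0.0234375]}',
    '{"serial": "000000001fb105ea", "sensorType": "strain", "data": [1603261126.3532753, 0.9649793604217437]}',
]})

# Decode each string into a dict so json_normalize can work with it
records = df['col'].apply(json.loads).tolist()

cols = ['Date', 'x', 'y', 'z']
# Shorter 'data' lists (strain rows) are padded with NaN
out = pd.DataFrame(pd.json_normalize(records)['data'].tolist(), columns=cols)
out['Date'] = pd.to_datetime(out['Date'], unit='s')
print(out)
```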
EDIT:
Personally, I think saving a CSV file with an .xls extension is not a good idea, because read_excel then raises a confusing error, but you can use:
```python
import ast

import pandas as pd

# The file is really a CSV despite its .xls extension, so read it with read_csv
df = pd.read_csv('15-10-2020-OO.xls')
cols = ['Date', 'x', 'y', 'z']
# Each cell of the 'Data' column is a string; literal_eval turns it into a dict
data = [x['data'] for x in df['Data'].apply(ast.literal_eval)]
df = pd.DataFrame(data, columns=cols)
df['Date'] = pd.to_datetime(df['Date'], unit='s')
print(df)
```

```
                              Date         x         y         z
0    2020-10-15 07:21:16.159236193  0.085938 -0.972656  0.003906
1    2020-10-15 07:21:17.597931385  0.089844 -0.968750  0.003906
2    2020-10-15 07:21:18.838171959  0.089844 -0.972656  0.003906
3    2020-10-15 07:21:20.338105917  0.085938 -0.972656  0.003906
4    2020-10-15 07:21:21.768864155  0.089844 -0.984375  0.003906
...                            ...       ...       ...       ...
8457 2020-10-15 08:59:57.907007933  0.085938 -0.972656  0.003906
8458 2020-10-15 08:59:58.371274233  0.089844 -0.976562  0.003906
8459 2020-10-15 08:59:58.833237648  0.085938 -0.976562  0.003906
8460 2020-10-15 08:59:59.313337088  1.517057       NaN       NaN
8461 2020-10-15 08:59:59.863240004  0.089844 -0.968750  0.007812

[8462 rows x 4 columns]
```
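In the outputs above, each strain reading ends up in column x with NaN for y and z. If you would rather keep the two sensor types apart, one option is to filter on sensorType before building the frames. A minimal sketch using a few sample records from the question; the helper to_frame and the column name strain are hypothetical:

```python
import json

import pandas as pd

# Sample JSON lines copied from the question
lines = [
    '{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261123.328814, 0.171875, -0.9609375, 0.0234375]}',
    '{"serial": "000000001fb105ea", "sensorType": "strain", "data": [1603261126.3532753, 0.9649793604217437]}',
    '{"serial": "000000001fb105ea", "sensorType": "acceleration", "data": [1603261127.6988888, 0.0390625, -1.0, 0.125]}',
]
records = [json.loads(s) for s in lines]


def to_frame(sensor, cols):
    """Hypothetical helper: build a DataFrame from the 'data' lists of one sensor type."""
    rows = [r['data'] for r in records if r['sensorType'] == sensor]
    frame = pd.DataFrame(rows, columns=cols)
    frame['Date'] = pd.to_datetime(frame['Date'], unit='s')
    return frame


accel = to_frame('acceleration', ['Date', 'x', 'y', 'z'])
# 'strain' is an illustrative column name for the single strain value
strain = to_frame('strain', ['Date', 'strain'])
print(accel)
print(strain)
```

Each frame then has no padding NaNs, at the cost of joining them back on Date if you need both in one table.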