Skip to content
Advertisement

Mapping complex JSON to Pandas Dataframe

Background
I have a complex nested JSON object, which I am trying to unpack into a pandas df in a very specific way.

JSON Object
this is an extract, containing randomized data of the JSON object, which shows examples of the hierarchy (inc. children) for 1x family (i.e. ‘Falconer Family’), however there is 100s of them in total and this extract just has 1x family, however the full JSON object has multiple –

JavaScript

Notes on JSON Object extract

  1. data – data in here can be ignored, these are aggregated values for underlying children.
  2. metacolumns – contains the column header values I want to use for each applicable children ‘column` key:pair values.
  3. groupings – can be ignored.
  4. children hierarchy – there are 4x levels of children which can be identified by their name as follows –
    • Family name (i.e., ‘Falconer Family’)
    • Wealth Bucket name (e.g., ‘Wealth Bucket A’)
    • Asset Class name (e.g., ‘Asset Class A’)
    • Fund name (e.g., ‘HUDJ Trust’)

Target Output
this is an extract of target df structure I am trying to achieve –

portfolio name entity_id Adjusted Value (No Div, USD) Current Quarter TWR (USD) YTD TWR (USD) TWR Audit Note
Falconer Family Falconer Family 23132492.90510712 -0.046732301295604683 -0.046732301295604683 None
Falconer Family Wealth Bucket A 13264448.506587146 -0.045960317420568164 -0.045960317420568164 None
Falconer Family Asset Class A 3337.99 0.000003434094574039648 0.000003434094574039648 None
Falconer Family HUDJ Trust 10604454 3337.99 0.000003434094574039648 0.000003434094574039648 None
Falconer Family Asset Class B 1017004.7192636987 -0.025871339096964152 -0.025871339096964152 None
Falconer Family HUDG Trust 10604454 1017004.7192636987 -0.025871339096964152 -0.025871339096964152 None
Falconer Family Asset Class C 231142.67772000004 -0.030370376329670656 -0.030370376329670656 None
Falconer Family HKDJ Trust 10604454 231142.67772000004 -0.030370376329670656 -0.030370376329670656 None
Falconer Family Asset Class D 9791282.570000006 -0.05382756475465478 -0.05382756475465478 None
Falconer Family HUDW Trust 10604454 9791282.570000006 -0.05382756475465478 -0.05382756475465478 None

Notes on Target Output

  1. Portfolio header – for every row, I would like to map the top-level children name value [family name]. E.g., ‘Falconer Family.
  2. Name header – this should simply be the name value from each respective children.
  3. Entity ID – all 4th level children entity_id value should be mapped to this column.
  4. Data columns – regardless of level, all children have identical time_weighted_return, time-weighted_return2 and value columns which should be mapped respectively.
  5. TWR Audit Note – these children _custom_twr_audit_note_911318 values are currently blank, but will be utilized in the future.

Current Output
My main issue is that you can see that I have only been able to tap into the 1st [Family] and 2nd [Wealth Bucket] children level. This leaves me missing the 3rd [Asset Class] and 4th [Fund] –

portfolio name Adjusted Value (No Div, USD) Current Quarter TWR (USD) YTD TWR (USD) TWR Audit Note)
0 Falconer Family Falconer Family 2.313249e+07 -0.046732 -0.046732 None
1 Falconer Family Wealth Bucket A 1.326445e+07 -0.045960 -0.045960 None
2 Falconer Family Wealth Bucket B 9.868044e+06 -0.047699 -0.047699 None

Current code
This is a function which gets me the correct df formatting, however my main issue is that I haven’t been able to find a solution to returning all children, but rather only the top-level –

JavaScript

Help
In short, I am looking for some advice on how I can tap into the remaining children by enhancing the existing code. Whilst I have taken much time to fully explain my problem, please ask if anything isn’t clear. Please note that the JSON may have multiple families, so the solution / advice offered must observe this

Advertisement

Answer

jsonpath-ng can parse even such a nested json object very easily. You can install this convenient library by the following command:

JavaScript

Code:

JavaScript

Explanation:

Firstly, you can confirm the expected output by the following command:

JavaScript
portfolio name entity_id Adjusted Value (No Div, USD) Current Quarter TWR (USD) YTD TWR (USD) TWR Audit Note
Falconer Family Falconer Family nan 2.31325e+07 -0.0467323 -0.0467323
Falconer Family Wealth Bucket A nan 1.32644e+07 -0.0459603 -0.0459603
Falconer Family Asset Class A nan 3337.99 3.43409e-06 3.43409e-06
Falconer Family HUDJ Trust 1.06045e+07 3337.99 3.43409e-06 3.43409e-06
Falconer Family Asset Class B nan 1.017e+06 -0.0258713 -0.0258713

Subsequently, you can see one of the wonderful features in jsonpath-ng by the following command:

JavaScript
full_path portfolio name
data.attributes.total.children.[0] Falconer Family Falconer Family
data.attributes.total.children.[0].children.[0] Falconer Family Wealth Bucket A
data.attributes.total.children.[0].children.[0].children.[0] Falconer Family Asset Class A
data.attributes.total.children.[0].children.[0].children.[0].children.[0] Falconer Family HUDJ Trust
data.attributes.total.children.[0].children.[0].children.[1] Falconer Family Asset Class B
data.attributes.total.children.[0].children.[0].children.[1].children.[0] Falconer Family HUDG Trust
data.attributes.total.children.[0].children.[0].children.[2] Falconer Family Asset Class C
data.attributes.total.children.[0].children.[0].children.[2].children.[0] Falconer Family HKDJ Trust
data.attributes.total.children.[0].children.[0].children.[3] Falconer Family Asset Class D
data.attributes.total.children.[0].children.[0].children.[3].children.[0] Falconer Family HUDW Trust

Thanks to the full_path column, you can grasp the nesting level of the extracted data in each row instantaneously. Actually, I appended the correct portfolio values by using these paths.

In terms of the code, the key point is the following line:

JavaScript

By the above expression, you can search the children attributes at any level of the json object. README.rst tells you what each syntax stands for.

Syntax Meaning
$ The root object
jsonpath1 .. jsonpath2 All nodes matched by jsonpath2 that descend from any node matching jsonpath1
[*] any array index

Speed:

I compared the speed between the above method with jsonpath-ng and a nested-for-loop method shown below.

# Comparison:

Method Duration Speed ratio
jsonpath-ng 9.72 ms ± 342 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.7 (faster)
Nested-for-loop 55.4 ms ± 7.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 1

# Code of the nested-for-loop method:

JavaScript
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement