I recovered a list of ISO 3166-2 countries and regions in this Github repository. I managed to have a first look of the regions using the following code:
import pandas as pd import json data = "/content/data.json" df = pd.read_json(data) df = df.T
Which gives the following output:
name | divisions |
---|---|
Afghanistan | {‘AF-BDS’: ‘Badakhshān’, ‘AF-BDG’: ‘Bādghīs’, ‘AF-BGL’: ‘Baghlān’, ‘AF-BAL’: ‘Balkh’, ‘AF-BAM’: ‘Bāmīān’, ‘AF-FRA’: ‘Farāh’, ‘AF-FYB’: ‘Fāryāb’, ‘AF-GHA’: ‘Ghaznī’, ‘AF-GHO’: ‘Ghowr’, ‘AF-HEL’: ‘Helmand’, ‘AF-HER’: ‘Herāt’, ‘AF-JOW’: ‘Jowzjān’, ‘AF-KAB’: ‘Kabul (Kābol)’, ‘AF-KAN’: ‘Kandahār’, ‘AF-KAP’: ‘Kāpīsā’, ‘AF-KNR’: ‘Konar (Kunar)’, ‘AF-KDZ’: ‘Kondoz (Kunduz)’, ‘AF-LAG’: ‘Laghmān’, ‘AF-LOW’: ‘Lowgar’, ‘AF-NAN’: ‘Nangrahār (Nangarhār)’, ‘AF-NIM’: ‘Nīmrūz’, ‘AF-ORU’: ‘Orūzgān (Urūzgā’, ‘AF-PIA’: ‘Paktīā’, ‘AF-PKA’: ‘Paktīkā’, ‘AF-PAR’: ‘Parwān’, ‘AF-SAM’: ‘Samangān’, ‘AF-SAR’: ‘Sar-e Pol’, ‘AF-TAK’: ‘Takhār’, ‘AF-WAR’: ‘Wardak (Wardag)’, ‘AF-ZAB’: ‘Zābol (Zābul)’} |
Albania | {‘AL-BR’: ‘Berat’, ‘AL-BU’: ‘Bulqizë’, ‘AL-DL’: ‘Delvinë’, ‘AL-DV’: ‘Devoll’, ‘AL-DI’: ‘Dibër’, ‘AL-DR’: ‘Durrës’, ‘AL-EL’: ‘Elbasan’, ‘AL-FR’: ‘Fier’, ‘AL-GR’: ‘Gramsh’, ‘AL-GJ’: ‘Gjirokastër’, ‘AL-HA’: ‘Has’, ‘AL-KA’: ‘Kavajë’, ‘AL-ER’: ‘Kolonjë’, ‘AL-KO’: ‘Korcë’, ‘AL-KR’: ‘Krujë’, ‘AL-KC’: ‘Kucovë’, ‘AL-KU’: ‘Kukës’, ‘AL-LA’: ‘Laç’, ‘AL-LE’: ‘Lezhë’, ‘AL-LB’: ‘Librazhd’, ‘AL-LU’: ‘Lushnjë’, ‘AL-MM’: ‘Malësia e Madhe’, ‘AL-MK’: ‘Mallakastër’, ‘AL-MT’: ‘Mat’, ‘AL-MR’: ‘Mirditë’, ‘AL-PQ’: ‘Peqin’, ‘AL-PR’: ‘Përmet’, ‘AL-PG’: ‘Pogradec’, ‘AL-PU’: ‘Pukë’, ‘AL-SR’: ‘Sarandë’, ‘AL-SK’: ‘Skrapar’, ‘AL-SH’: ‘Shkodër’, ‘AL-TE’: ‘Tepelenë’, ‘AL-TR’: ‘Tiranë’, ‘AL-TP’: ‘Tropojë’, ‘AL-VL’: ‘Vlorë’} |
But I can’t manage to achieve the following output because of the nested JSON.
country code | country name | region code | region name |
---|---|---|---|
AF | Afghanistan | AF-BDS | Badakhshān |
AF | Afghanistan | AF-BDG | Bādghīs |
I tried to loop inside the DataFrame with :
df = json_normalize(df['divisions']).unstack().apply(pd.Series)
But I’m not getting any satisfying result.
Advertisement
Answer
This should work:
df1 = ( pd.DataFrame(data) .transpose() .reset_index(names="country code") .rename(columns={"name": "country name"}) ) divisions = [(k1, v1) for k, v in df1["divisions"].to_dict().items() for k1, v1 in v.items()] df2 = pd.DataFrame(divisions, columns=["region code", "region name"]) final_df = ( pd .merge(df1.explode("divisions"), df2, left_on="divisions", right_on="region code") .drop(columns="divisions") ) print(final_df.head(10)) country code country name region code region name 0 AF Afghanistan AF-BDS Badakhshān 1 AF Afghanistan AF-BDG Bādghīs 2 AF Afghanistan AF-BGL Baghlān 3 AF Afghanistan AF-BAL Balkh 4 AF Afghanistan AF-BAM Bāmīān 5 AF Afghanistan AF-FRA Farāh 6 AF Afghanistan AF-FYB Fāryāb 7 AF Afghanistan AF-GHA Ghaznī 8 AF Afghanistan AF-GHO Ghowr 9 AF Afghanistan AF-HEL Helmand