
Pandas – expanding several values into new columns with some column name manipulation

I’m new to pandas.
Suppose you have a pandas DataFrame with a column structure like the one below:

user_id | timestamp | foo_name1 | foo_name2 | foo_name3

As we can see, the DataFrame has several metadata columns holding raw string values (user_id, timestamp)
and several dynamically named columns, each holding a JSON string: foo_name1..foo_name3

Example of the JSON structure inside the foo_name1 column (the hierarchy is fixed; the dict keys and values may vary):

{"foo_att1": "foo_value1","foo_att2": "foo_value2"}

So I want to end up with this kind of DF structure instead – a kind of expanding:
DF:

user_id | timestamp | foo_name1-foo_att1 | foo_name1-foo_att2 | foo_name2-foo_att1 | foo_name2-foo_att2

Whereas foo_name1-foo_att1 will have the value "foo_value1",
foo_name1-foo_att2 will have the value "foo_value2", etc.

How can I achieve this using pandas operations?


Answer

  1. I synthesised the structure you defined
  2. pd.concat(axis=1) and pd.json_normalize() get you to your answer
  3. an additional dict comprehension names the columns as per your requirement
import datetime as dt
import random

import pandas as pd

df = pd.DataFrame([{**{"userid": random.randint(1, 100),
                       "timestamp": dt.datetime(2020, 8, 9, random.randint(0, 23))},
                    **{f"foo_name{f+1}": {f"foo_att{ff+1}": random.randint(1, 10) for ff in range(2)}
                       for f in range(3)}
                    } for r in range(5)])

df2 = pd.concat([
    df.loc[:,["userid", "timestamp"]]] + # the fixed columns
    [
        pd.DataFrame({f"{c}.{k}":v # rename columns as per requirement
                      for k,v in 
                      # json_normalize is doing all the work,  but requirement to rename columns....
                      pd.json_normalize(df[c]).to_dict(orient="list").items()}) 
        # in dict comprehension loop through all foo columns to turn them from dicts to columns
        for c in df.columns if "foo" in c
], axis=1)

print(f"{df.to_string(index=False)}\n\n{df2.to_string(index=False)}")

output

 userid           timestamp                       foo_name1                        foo_name2                        foo_name3
     94 2020-08-09 15:00:00  {'foo_att1': 1, 'foo_att2': 7}   {'foo_att1': 5, 'foo_att2': 2}   {'foo_att1': 5, 'foo_att2': 9}
     61 2020-08-09 05:00:00  {'foo_att1': 2, 'foo_att2': 3}  {'foo_att1': 10, 'foo_att2': 9}  {'foo_att1': 10, 'foo_att2': 4}
     69 2020-08-09 12:00:00  {'foo_att1': 9, 'foo_att2': 6}   {'foo_att1': 7, 'foo_att2': 8}   {'foo_att1': 5, 'foo_att2': 9}
     16 2020-08-09 13:00:00  {'foo_att1': 2, 'foo_att2': 1}   {'foo_att1': 7, 'foo_att2': 8}   {'foo_att1': 9, 'foo_att2': 1}
     51 2020-08-09 11:00:00  {'foo_att1': 3, 'foo_att2': 1}   {'foo_att1': 7, 'foo_att2': 1}   {'foo_att1': 6, 'foo_att2': 3}

 userid           timestamp  foo_name1.foo_att1  foo_name1.foo_att2  foo_name2.foo_att1  foo_name2.foo_att2  foo_name3.foo_att1  foo_name3.foo_att2
     94 2020-08-09 15:00:00                   1                   7                   5                   2                   5                   9
     61 2020-08-09 05:00:00                   2                   3                  10                   9                  10                   4
     69 2020-08-09 12:00:00                   9                   6                   7                   8                   5                   9
     16 2020-08-09 13:00:00                   2                   1                   7                   8                   9                   1
     51 2020-08-09 11:00:00                   3                   1                   7                   1                   6                   3
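Note the synthetic frame above holds Python dicts, while the question describes columns holding JSON strings. In that case, parse the strings first with json.loads before normalising – a minimal sketch, with a made-up single-row frame matching the question's naming:

```python
import json

import pandas as pd

df = pd.DataFrame({
    "user_id": [7],
    "foo_name1": ['{"foo_att1": "foo_value1", "foo_att2": "foo_value2"}'],
})

# parse the JSON strings into dicts, then flatten and prefix with the source column name
parsed = df["foo_name1"].apply(json.loads)
flat = pd.json_normalize(parsed.tolist()).add_prefix("foo_name1-")
df2 = pd.concat([df[["user_id"]], flat], axis=1)
print(df2.columns.tolist())
# ['user_id', 'foo_name1-foo_att1', 'foo_name1-foo_att2']
```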

supplementary – picking candidate columns

A better approach to picking the columns to be normalised.

import numpy as np

df = pd.DataFrame([{**{"userid": random.randint(1, 100),
                       "timestamp": dt.datetime(2020, 8, 9, random.randint(0, 23))},
                    **{f"foo_name{f+1}": {f"foo_att{ff+1}": random.randint(1, 10) for ff in range(2)}
                       for f in range(3)},
                    **{"foo_namex": np.nan},
                    **{"foo_namey": "hello"}
                    } for r in range(5)])

# which columns do we want to convert?
convert = [c
           for c, dtype in df.dtypes.items()
           if "foo" in c
           and dtype.name == "object"
           and isinstance(df.loc[0, c], dict)]
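As a standalone check of this selection logic (column names invented for illustration): a dict-valued column is kept, while a NaN column (float dtype) and a plain-string object column are skipped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "userid": [1],
    "foo_name1": [{"foo_att1": 1}],   # dict -> selected
    "foo_namex": [np.nan],            # float64 dtype -> skipped
    "foo_namey": ["hello"],           # object dtype, but not a dict -> skipped
})

convert = [c
           for c, dtype in df.dtypes.items()
           if "foo" in c
           and dtype.name == "object"
           and isinstance(df.loc[0, c], dict)]
print(convert)
# ['foo_name1']
```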

df2 = pd.concat([
    df.loc[:,[c for c in df.columns if c not in convert]]] + # keep the columns not being converted
    [
        pd.DataFrame({f"{c}.{k}":v # rename columns as per requirement
                      for k,v in 
                      # json_normalize is doing all the work,  but requirement to rename columns....
                      pd.json_normalize(df[c]).to_dict(orient="list").items()}) 
        # already worked out list of columns to convert
        for c in convert
], axis=1)
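The per-column renaming done above with a dict comprehension can also be expressed with DataFrame.add_prefix – a shorter sketch of the same idea, using a small hand-built frame rather than the random one:

```python
import pandas as pd

df = pd.DataFrame({
    "userid": [1, 2],
    "foo_name1": [{"foo_att1": 1, "foo_att2": 2},
                  {"foo_att1": 3, "foo_att2": 4}],
})
convert = ["foo_name1"]

df2 = pd.concat(
    [df.drop(columns=convert)]  # keep the columns not being converted
    # flatten each dict column, then prefix its new columns with the source name
    + [pd.json_normalize(df[c].tolist()).add_prefix(f"{c}.") for c in convert],
    axis=1,
)
print(df2.columns.tolist())
# ['userid', 'foo_name1.foo_att1', 'foo_name1.foo_att2']
```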


User contributions licensed under: CC BY-SA