
How to remove brackets from multi-value keys when converting to dataframes or extend values of a key without extraneous characters


The above code converts a nested dictionary to a DataFrame without problems. But if the nested dictionary was built with the .append() or .extend() method, the resulting cells contain extraneous brackets [] and quotes '', which makes downstream analysis difficult.

For example, for a nested dictionary like this:


created with the setup:


Converting it to a DataFrame with pd.DataFrame.from_dict() creates a table that looks like this:

Column one   Column two
Key1         ['Value1','Value2','value3']
Key2         ['Value2','value4','value5']

Here each cell becomes a single lump of strings and loses a level of data.
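For illustration, here is a minimal reproduction under assumed names. The original setup is not shown, so `records`, `nested`, and the column label are hypothetical. If the cells are built this way, each one holds a Python list, and the brackets and quotes are likely just how pandas displays that list rather than characters stored in the data:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical input: (row key, column name, value) triples.
records = [
    ("Key1", "Column two", "Value1"),
    ("Key1", "Column two", "Value2"),
    ("Key1", "Column two", "value3"),
    ("Key2", "Column two", "Value2"),
    ("Key2", "Column two", "value4"),
    ("Key2", "Column two", "value5"),
]

# Build the nested dictionary with .append(), as described in the question.
nested = defaultdict(lambda: defaultdict(list))
for key, column, value in records:
    nested[key][column].append(value)

# Convert to plain dicts before handing the data to pandas.
plain = {key: dict(inner) for key, inner in nested.items()}
df = pd.DataFrame.from_dict(plain, orient="index")

# Each cell is a list object, which pandas prints with brackets and quotes.
cell = df.loc["Key1", "Column two"]
print(type(cell))
print(df)
```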

Something like the following would be better, since it preserves that level of data:

Column one   Column two
Key1         Value1,Value2,value3
Key2         Value2,value4,value5

It seems the extraneous characters are essential delimiters that can't be avoided when updating keys, so as best I can tell, that rules out extending the values without brackets or quotes.

What would be more appropriate:

  1. Convert the dictionary to a DataFrame and remove the extraneous characters during conversion? If so, how?
  2. Remove the brackets and quotes with regex once the DataFrame is created?


Answer

One option is to stack the columns, join the strings, then unstack:
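A minimal sketch of the stack/join/unstack approach, assuming each cell holds a Python list of strings. The names `nested`, `df`, and `out` and the sample data are illustrative assumptions, not the original code:

```python
import pandas as pd

# Assumed layout: outer keys become rows, inner keys become columns,
# and every cell is a list of strings.
nested = {
    "Key1": {"Column two": ["Value1", "Value2", "value3"]},
    "Key2": {"Column two": ["Value2", "value4", "value5"]},
}
df = pd.DataFrame.from_dict(nested, orient="index")

# stack() turns the frame into one Series indexed by (row key, column name).
stacked = df.stack()

# .str.join() concatenates the strings inside each list element.
joined = stacked.str.join(",")

# unstack() restores the original row/column layout.
out = joined.unstack()
print(out)
```

Stacking first means a single `.str.join()` call covers every column, which is convenient when more than one column holds lists.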


But it’s probably more efficient to modify the input dictionary in vanilla Python first and then construct the DataFrame:
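As a sketch of the dictionary-first approach, assuming the same hypothetical layout (outer keys as rows, inner keys as columns, lists of strings as values), a nested comprehension can join each list before pandas is involved:

```python
# Assumed sample data; the original dictionary is not shown.
nested = {
    "Key1": {"Column two": ["Value1", "Value2", "value3"]},
    "Key2": {"Column two": ["Value2", "value4", "value5"]},
}

# Build a new dictionary in which every list is replaced by a
# comma-separated string. The input dictionary is left unchanged.
joined = {
    key: {column: ",".join(values) for column, values in inner.items()}
    for key, inner in nested.items()
}

print(joined["Key1"])  # {'Column two': 'Value1,Value2,value3'}
```

With this layout, `joined` can then be passed to `pd.DataFrame.from_dict(joined, orient="index")` to build the table with plain strings in each cell.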


Output:

User contributions licensed under: CC BY-SA