Parsing a JSON string which was loaded from a CSV using Pandas

I am working with CSV files where several of the columns contain a simple JSON object (several key-value pairs) while the other columns are normal. Here is an example:

After using df = pandas.read_csv('file.csv'), what’s the most efficient way to parse and split the stats column into additional columns?

After about an hour, the only thing I could come up with was:

This seems like I’m doing it wrong, and it’s quite a bit of work considering I’ll need to do this on three columns regularly.

The desired output is the dataframe object below. I added the following lines of code to get there in my (crappy) way:

Answer

There is a slightly easier way, but ultimately you’ll have to call json.loads. pandas.read_csv accepts a converters argument, which maps a column name to a function that is applied to each value in that column.

So first define your custom parser. In this case the below should work:

In your case you’ll have something like:

We are telling read_csv to read the data in the standard way, but to use our custom parser for the stats column. This makes each value in the stats column a dict.
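
As a minimal sketch of the converter idea (the original snippets are not reproduced here, so the CSV text, the column names name and stats, and the helper parse_json_cell are all hypothetical stand-ins), something along these lines should work:

```python
import io
import json

import pandas as pd

# Hypothetical CSV: the `stats` column holds a JSON object as a quoted string.
# Embedded double quotes are escaped by doubling them, per CSV convention.
CSV_TEXT = '''name,stats
alice,"{""a"": 1, ""b"": 2, ""c"": 3}"
bob,"{""a"": 4, ""b"": 5, ""c"": 6}"
'''


def parse_json_cell(cell):
    """Turn one raw CSV cell (a string) into a Python dict."""
    return json.loads(cell)


# `converters` maps a column name to a function applied to each raw cell.
# io.StringIO stands in for a real file path such as 'file.csv'.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"stats": parse_json_cell})

print(type(df.loc[0, "stats"]))  # each cell is now a dict, not a str
print(df.loc[1, "stats"]["b"])
```

The same converter function can be reused for several JSON columns by listing each column name as a key in the converters dict.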

From here, we can use a little hack to append these columns directly, in one step, with the appropriate column names. This only works for regular data: every JSON object needs to have the same 3 keys, or else missing values need to be handled in our CustomParser.

On the left-hand side, we get the new column names from the keys of one element of the stats column; each element in that column is a dictionary, so we are doing a bulk assignment. On the right-hand side, we break up the stats column using apply, which builds a data frame with one column per key.
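
The splitting step can be sketched as follows. This is a variant and not the exact code from the answer: it builds the new columns with the DataFrame constructor and attaches them with join instead of a bulk assignment, and the frame contents and the name expanded are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose `stats` column already holds dicts,
# as it would after reading the CSV with a JSON converter.
df = pd.DataFrame(
    {
        "name": ["alice", "bob"],
        "stats": [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}],
    }
)

# Build one row per dict; the dict keys become the column names.
# Reusing df.index keeps the new rows aligned with the original ones.
expanded = pd.DataFrame(df["stats"].tolist(), index=df.index)

# Drop the original dict column and attach the new columns by index.
df = df.drop(columns="stats").join(expanded)

print(df)
```

Because the column names come from the dicts themselves, names and values cannot get out of order, and a key that is missing from some rows shows up as NaN in those rows instead of raising an error.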

User contributions licensed under: CC BY-SA