Parsing a JSON string which was loaded from a CSV using Pandas

Question

I am working with CSV files where several of the columns contain a simple JSON object (several key-value pairs) while the other columns are normal. Here is an example: After using df = pandas.read_csv('file.csv'), what's the most efficient way to parse and split the stats column into additional columns? After about an hour, the only thing I could come up

Accepted Answer

There is a slightly easier way, but ultimately you'll have to call json.loads. pandas.read_csv has the notion of a converter:

    converters : dict, optional
        Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

So first define your custom parser. In this case the below should work:

    import json

    def CustomParser(data):
        # Parse the JSON text of one cell into a Python dict
        return json.loads(data)

In your case you'll have something like:

    df = pandas.read_csv(f1, converters={'stats': CustomParser}, header=0)

We are telling read_csv to read the data in the standard way, but to use our custom parser for the stats column. This makes each value in the stats column a dict.

From here, we can use a little hack to append these columns in one step with the appropriate column names. This will only work for regular data (the JSON object needs to have 3 values in every row, or at least missing values need to be handled in our CustomParser):

    df[sorted(df['stats'][0].keys())] = df['stats'].apply(pandas.Series)

On the left-hand side, we get the new column names from the keys of the first element of the stats column; each element in that column is a dictionary. So we are doing a bulk assign. On the right-hand side, we break up the stats column using apply, which turns each dictionary into a row with one column per key.
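For reference, here is a minimal end-to-end sketch of the converter approach. The CSV content, the column names other than stats, and the helper name parse_json are all made up for illustration, since the question's sample data is not shown. Instead of the bulk assign, it builds the new columns from the list of dicts and joins them back on the index, so the column labels come directly from the keys and rows with a missing key simply get NaN. That is a swapped-in variation, not the method from the answer.

```python
import io
import json

import pandas as pd

# Hypothetical sample data (the question's real CSV is not available).
# JSON stored in a CSV field must be quoted, with inner quotes doubled.
csv_text = (
    'name,stats\n'
    'alice,"{""a"": 1, ""b"": 2, ""c"": 3}"\n'
    'bob,"{""a"": 4, ""b"": 5}"\n'
)

def parse_json(cell):
    # Treat an empty cell as an empty object so json.loads never sees "".
    return json.loads(cell) if cell else {}

# The converter runs on every raw string in the stats column.
df = pd.read_csv(io.StringIO(csv_text), converters={"stats": parse_json})

# One column per key; keys missing from a row become NaN.
expanded = pd.DataFrame(df["stats"].tolist(), index=df.index)

# Replace the dict column with the expanded columns, aligned on the index.
result = df.drop(columns="stats").join(expanded)
print(result)
```

Because the join aligns on the index and takes labels from the keys, it does not depend on the key order inside each JSON object.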

Answer