Pandas: read a CSV with a multi-line (MultiIndex) header containing blanks

I'm struggling to properly load a CSV that has a multi-line header containing blank cells. The CSV looks like this:

CSV Header

What I would like to get is:

Desired Pandas Header

When I try to load with pd.read_csv(file, header=[0,1], sep=','), I end up with the following:

Incorrect result
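A minimal reproduction sketch, assuming a recent pandas version and a hypothetical CSV (the original file is not shown here, so the `data` string below is made up for illustration). Blank cells in the first header row come back as labels starting with `Unnamed:` instead of inheriting the group name to their left:

```python
import io

import pandas as pd

# Hypothetical CSV: two header rows, with blank cells in the first one.
data = """\
,,C,,,D,,
A,B,X,Y,Z,X,Y,Z
1,2,3,4,5,6,7,8
9,10,11,12,13,14,15,16
"""

df = pd.read_csv(io.StringIO(data), header=[0, 1], sep=',')

# Inspect the top level of the column MultiIndex.
level0 = df.columns.get_level_values(0).tolist()
print(level0)
```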

Is there a way to get the desired result?


Note: alternatively, I would accept this as a result:

Alternative result


Versions used:

  • Python: 2.7.8
  • Pandas: 0.16.0

Answer

Here is an automated way to fix the column index. First, pull the column level values into a DataFrame:

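A sketch of this step. The `tuples` list is a hypothetical stand-in for the column labels that `pd.read_csv(..., header=[0, 1])` might produce for a header with blanks; the real labels depend on your file:

```python
import pandas as pd

# Hypothetical column labels, shaped like what read_csv returns
# when the first header row has blank cells.
tuples = [
    ('Unnamed: 0_level_0', 'A'), ('Unnamed: 1_level_0', 'B'),
    ('C', 'X'), ('Unnamed: 3_level_0', 'Y'), ('Unnamed: 4_level_0', 'Z'),
    ('D', 'X'), ('Unnamed: 6_level_0', 'Y'), ('Unnamed: 7_level_0', 'Z'),
]
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8]],
                  columns=pd.MultiIndex.from_tuples(tuples))

# One row per column of df, one column per header level (0 = top, 1 = bottom).
columns = pd.DataFrame(df.columns.tolist())
print(columns)
```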

then rename the Unnamed: columns to NaN:

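One way to do the renaming, sketched on the same hypothetical labels. It assumes the placeholder labels all start with `Unnamed:`, which is what recent pandas versions generate for blank header cells:

```python
import numpy as np
import pandas as pd

# Hypothetical level values: column 0 is the top header row, column 1 the bottom.
columns = pd.DataFrame([
    ('Unnamed: 0_level_0', 'A'), ('Unnamed: 1_level_0', 'B'),
    ('C', 'X'), ('Unnamed: 3_level_0', 'Y'), ('Unnamed: 4_level_0', 'Z'),
    ('D', 'X'), ('Unnamed: 6_level_0', 'Y'), ('Unnamed: 7_level_0', 'Z'),
])

# Replace every placeholder label in the top level with NaN.
is_unnamed = columns[0].str.startswith('Unnamed:')
columns.loc[is_unnamed, 0] = np.nan
print(columns)
```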

and then forward-fill the NaNs:

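A sketch of the forward-fill on hypothetical values. It uses `Series.ffill()`, assuming a reasonably recent pandas; each NaN takes the last non-NaN label above it, so leading NaNs (with nothing above them) stay NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical state after the Unnamed: labels were replaced with NaN.
columns = pd.DataFrame({
    0: [np.nan, np.nan, 'C', np.nan, np.nan, 'D', np.nan, np.nan],
    1: ['A', 'B', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
})

# Propagate each group label ('C', 'D') down over the blanks that follow it.
columns[0] = columns[0].ffill()
print(columns)
```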

so that columns now looks like


Now we can find the remaining NaNs and fill them with empty strings:

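A sketch of this step on hypothetical values. The name `mask` is an illustrative choice; the point is to remember which rows were still NaN before filling them, since that information is useful afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical state after forward-filling: only the leading NaNs remain.
columns = pd.DataFrame({
    0: [np.nan, np.nan, 'C', 'C', 'C', 'D', 'D', 'D'],
    1: ['A', 'B', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
})

# Record where the top level is still missing, then blank those cells out.
mask = columns[0].isna()
columns[0] = columns[0].fillna('')
print(columns)
```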

To make the first two columns, A and B, indexable as df['A'] and df['B'] (as though they were single-leveled), you could swap the values in the first and second columns of the columns DataFrame for the rows that had no top-level label:

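A sketch of the swap on hypothetical values. To keep the snippet self-contained, `mask` is rebuilt here by testing for empty strings; in the full workflow it would be the NaN mask taken before filling. Using `.values` on the right-hand side drops the labels so the assignment is positional rather than aligned by column name:

```python
import pandas as pd

# Hypothetical state after the remaining NaNs were replaced with ''.
columns = pd.DataFrame({
    0: ['', '', 'C', 'C', 'C', 'D', 'D', 'D'],
    1: ['A', 'B', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
})

# Rows with no top-level label: move the bottom label up and leave '' below.
mask = columns[0] == ''
columns.loc[mask, [0, 1]] = columns.loc[mask, [1, 0]].values
print(columns)
```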

Now you can build a new MultiIndex and assign it to df.columns:

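A sketch of the final assignment, using `pd.MultiIndex.from_tuples` on the rows of the cleaned-up frame. The data values in `df` are hypothetical:

```python
import pandas as pd

# Hypothetical cleaned-up level values (after the swap).
columns = pd.DataFrame({
    0: ['A', 'B', 'C', 'C', 'C', 'D', 'D', 'D'],
    1: ['', '', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
})

# Hypothetical data with the same number of columns.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8],
                   [9, 10, 11, 12, 13, 14, 15, 16]])

# Each row of `columns` becomes one (top, bottom) tuple in the new MultiIndex.
df.columns = pd.MultiIndex.from_tuples(
    list(columns.itertuples(index=False, name=None)))
print(df)
```

Selecting `df['C']` then returns the three sub-columns X, Y and Z as a DataFrame.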

Putting it all together, if data is


then


yields

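The steps above can be combined into one helper. This is a sketch rather than the original listing: the function name `fix_header` and the sample `data` are hypothetical, it assumes a two-level header, and it uses `Series.ffill()`, which assumes a reasonably recent pandas:

```python
import io

import numpy as np
import pandas as pd


def fix_header(df):
    """Return a copy of df with blank top-level header cells repaired."""
    columns = pd.DataFrame(df.columns.tolist())
    # Blank header cells are read as 'Unnamed: ...'; mark them as missing.
    columns.loc[columns[0].str.startswith('Unnamed:'), 0] = np.nan
    # Spread each group label over the blanks to its right.
    columns[0] = columns[0].ffill()
    # Leading blanks have nothing to inherit: blank them and swap the levels.
    mask = columns[0].isna()
    columns[0] = columns[0].fillna('')
    columns.loc[mask, [0, 1]] = columns.loc[mask, [1, 0]].values
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        list(columns.itertuples(index=False, name=None)))
    return out


# Hypothetical CSV with blanks in the first header row.
data = """\
,,C,,,D,,
A,B,X,Y,Z,X,Y,Z
1,2,3,4,5,6,7,8
9,10,11,12,13,14,15,16
"""

df = fix_header(pd.read_csv(io.StringIO(data), header=[0, 1]))
print(df)
```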
User contributions licensed under: CC BY-SA