Skip to content
Advertisement

Pandas Multiindex to CSV without duplicate Index

For the life of me I can’t figure out how to remove the duplicate index items created when writing a multiindex dataframe to CSV. While there is this answer out there, it doesn’t apply to me per se because my second level has all different values.

This is a chunk of the dataframe I have, it just goes on for each month

Month Measure
JAN   KWH          
      Max KW       
      Day/Hour     
      Peak End Use 
      Peak Pct     
FEB   KWH          
      Max KW       
      Day/Hour     
      Peak End Use 
      Peak Pct     

But in CSV it looks something like

JAN KWH
JAN Max KW
JAN Day/Hour
JAN Peak End Use
JAN Peak Pct
FEB KWH
FEB Max KW
FEB Day/Hour
FEB Peak End Use
FEB Peak Pct

I hope for it to look exactly just like the dataframe so that means in CSV it would look like

Jan,KWH
   ,Max KW
   ,Day/Hour

and so on

I have tried df.index.to_series().duplicates() but it doesn’t work because each of my index pairs are not duplicates. I’ve also tried using index.tolist() then looping through to change with no luck.

What’s the right approach here?

Thanks in advance!

Advertisement

Answer

It seems pandas does not provide directly such functionality, perhaps to ensure that the generated csv files can be read back, as mentioned in the comments above. I faced with the same problem when generating a table for report in rst format. Finally, I decided to process the generated csv by an external script. The script replaces the repeating values in columns with spaces, thus achieving desired result. Note: this script assumes commas as csv separators.

# Clean csv with multiindices
from sys import argv

# Positional command line arguments:
#   1: original csv file
#   2: number of columns to check (optional)

with open(argv[1], 'r') as f:
  if len(argv) > 2:
    n = int(argv[2])
    i0 = ('', ) * n
  else:
    n = None
    i0 = None
  for l in f:
    if n is None:
        # define n from the 1-st line
        i1 = l.split(',')
        n = len(i1)
        i0 = ('', ) * n
    i1 = l.split(',', n)
    nl = ()
    for s0, s1 in zip(i0, i1):
        if s0 == s1:
            e = ' ' * len(s0)
        else:
            e = s1
        nl += (e, )
    nl += tuple(i1[n:])
    print ','.join(nl),
    i0 = i1[:n]
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement