Skip to content
Advertisement

Preserve formatting during excel file split with python

Hi I’m starting with python and have a question that sounds pretty simple, but at this point I don’t know how to solve it. I have a code for splitting some reports by columns. Unfortunately, I noticed that the format in output files changes during the split compared to the master file.

__ Dates format changes from short date: 9/29/2005 to 2005-09-29 00:00:00

__ Also number format changes from currencies to ‘standard’ number: 172,057.60 to: 172057.60

I’m struggling to find a solution to this. So far, I have found a lot of formatting information, but I have failed to apply them to the entire columns of the output files. Also it is important that dates don’t turn into strings. Would you help me rewrite the code to get exactly the same formatting as in the master file?

I think that the problem might be related with how .to_excel works, but I’m not sure.

    import pandas as pd
    import os
    from openpyxl import load_workbook
    import xlsxwriter
    from shutil import copyfile
    
    
    file=input('File Path: ')
    extension = os.path.splitext(file)[1]
    filename = os.path.splitext(file)[0]
    pth=os.path.dirname(file)
    newfile=os.path.join(pth,filename+'_2'+extension)
    df=pd.read_excel(file,  converters={'Employee ID':str,'Job Code':str,'Cost Center - ID':str,'Position ID':str,'Supervisory Organization - ID':str,'Manager ID':str,'Legacy ID Number':str,'CR Payroll ID':str,'Zone':str,'Salary Roll':str,'Work Address - Postal Code':str})

    
    for c in df.columns:
        print(c)
    
    colpick=input('Select Column: ')
    cols=list(set(df[colpick].values))
    
    def sendtofile(cols):
        for i in cols:
            df[df[colpick] == i].to_excel("{}/{}.xlsx".format(pth, i), sheet_name=i, index=False)
        print('nCompleted')
        print('Thanks for using this program.')
        return
    
    
    
    print('You data will split based on these values {} and create {} files or sheets based on next selection. If you are ready to proceed please type "Y" and hit enter. Hit "N" to exit.'.format(', '.join(cols),len(cols)))
    while True:
        x=input('Ready to Proceed (Y/N): ').lower()
        if x == 'y':
            while True:
                    sendtofile(cols)
                    break
            break
        elif x=='n':
            print('nThanks for using this program.')
            break
    
        else: continue

Original file ; File after split

Advertisement

Answer

To set the date and/or datetime format in the pandas to_excel() output you will need to use an ExcelWriter object. Something like this:

    def sendtofile(cols):
        for i in cols:
            writer = pd.ExcelWriter("{}/{}.xlsx".format(pth, i),
                                    datetime_format='m/d/yyyy hh:mm:ss',
                                    date_format='m/d/yyyy')
            df[df[colpick] == i].to_excel(writer, sheet_name=i, index=False)


To set the currency is a bit trickier. You will need to get access to the underlying Excel engine in Pandas and use that. Something like this:

    def sendtofile(cols):
        for i in cols:
            writer = pd.ExcelWriter("{}/{}.xlsx".format(pth, i),
                                    engine='xlsxwriter',
                                    datetime_format='m/d/yyyy hh:mm:ss',
                                    date_format='m/d/yyyy')
            df[df[colpick] == i].to_excel(writer, sheet_name=i, index=False)

            # Get the xlsxwriter workbook and worksheet objects.
            workbook  = writer.book
            worksheet = writer.sheets[i]

            # Create a format for currency.
            currency_format = workbook.add_format({'num_format': '$#,##0'})

            # You will need to set the column number (1 in this case)
            worksheet.set_column(1, 1, None, currency_format)

See Working with Python Pandas and XlsxWriter.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement