Hi I’m starting with python and have a question that sounds pretty simple, but at this point I don’t know how to solve it. I have a code for splitting some reports by columns. Unfortunately, I noticed that the format in output files changes during the split compared to the master file.
__ Dates format changes from short date: 9/29/2005 to 2005-09-29 00:00:00
__ Also number format changes from currencies to ‘standard’ number: 172,057.60 to: 172057.60
I’m struggling to find a solution to this. So far, I have found a lot of formatting information, but I have failed to apply them to the entire columns of the output files. Also it is important that dates don’t turn into strings. Would you help me rewrite the code to get exactly the same formatting as in the master file?
I think that the problem might be related with how .to_excel works, but I’m not sure.
import pandas as pd import os from openpyxl import load_workbook import xlsxwriter from shutil import copyfile file=input('File Path: ') extension = os.path.splitext(file)[1] filename = os.path.splitext(file)[0] pth=os.path.dirname(file) newfile=os.path.join(pth,filename+'_2'+extension) df=pd.read_excel(file, converters={'Employee ID':str,'Job Code':str,'Cost Center - ID':str,'Position ID':str,'Supervisory Organization - ID':str,'Manager ID':str,'Legacy ID Number':str,'CR Payroll ID':str,'Zone':str,'Salary Roll':str,'Work Address - Postal Code':str}) for c in df.columns: print(c) colpick=input('Select Column: ') cols=list(set(df[colpick].values)) def sendtofile(cols): for i in cols: df[df[colpick] == i].to_excel("{}/{}.xlsx".format(pth, i), sheet_name=i, index=False) print('nCompleted') print('Thanks for using this program.') return print('You data will split based on these values {} and create {} files or sheets based on next selection. If you are ready to proceed please type "Y" and hit enter. Hit "N" to exit.'.format(', '.join(cols),len(cols))) while True: x=input('Ready to Proceed (Y/N): ').lower() if x == 'y': while True: sendtofile(cols) break break elif x=='n': print('nThanks for using this program.') break else: continue
Original file ; File after split
Advertisement
Answer
To set the date and/or datetime format in the pandas to_excel()
output you will need to use an ExcelWriter object. Something like this:
def sendtofile(cols): for i in cols: writer = pd.ExcelWriter("{}/{}.xlsx".format(pth, i), datetime_format='m/d/yyyy hh:mm:ss', date_format='m/d/yyyy') df[df[colpick] == i].to_excel(writer, sheet_name=i, index=False)
To set the currency is a bit trickier. You will need to get access to the underlying Excel engine in Pandas and use that. Something like this:
def sendtofile(cols): for i in cols: writer = pd.ExcelWriter("{}/{}.xlsx".format(pth, i), engine='xlsxwriter', datetime_format='m/d/yyyy hh:mm:ss', date_format='m/d/yyyy') df[df[colpick] == i].to_excel(writer, sheet_name=i, index=False) # Get the xlsxwriter workbook and worksheet objects. workbook = writer.book worksheet = writer.sheets[i] # Create a format for currency. currency_format = workbook.add_format({'num_format': '$#,##0'}) # You will need to set the column number (1 in this case) worksheet.set_column(1, 1, None, currency_format)