I have to work on a flat file (size > 500 Mo) and I need to create to split file on one criterion. My original file as this structure (simplified): JournalCode|JournalLib|EcritureNum|EcritureDate|CompteNum|
I need to create to file depending on the first digit from ‘CompteNum’.
I have started my code as well
import sys import pandas as pd import numpy as np import datetime C_FILE_SEP = "|" def main(fic): pd.options.display.float_format = '{:,.2f}'.format FileFec = pd.read_csv(fic, C_FILE_SEP, encoding= 'unicode_escape')
It seems ok, my concern is to create my 2 files based on criteria. I have tried with unsuccess.
TargetFec = 'Target_'+fic+datetime.datetime.now().strftime("%Y%m%d-%H%M%S")+'.txt' target = open(TargetFec, 'w') FileFec = FileFec.astype(convert_dict) for row in FileFec.iterrows(): Fec_Cpt = str(FileFec['CompteNum']) nb = len(Fec_Cpt) if (nb > 7): target.write(str(row)) target.close()
the result of my target file is not like I expected:
(0, JournalCode OUVERT JournalLib JOURNAL D'OUVERTURE EcritureNum XXXXXXXXXX EcritureDate 20190101 CompteNum 101300 CompteLib CAPITAL SOUSCRIT CompAuxNum CompAuxLib PieceRef XXXXXXXXXX PieceDate 20190101 EcritureLib A NOUVEAU Debit 000000000000,00 Credit 000038188458,00 EcritureLet NaN DateLet NaN ValidDate 20190101 Montantdevise Idevise CodeEtbt 100 Unnamed: 19 NaN
And I expected to obtain line into my target file when CompteNum(0:1) > 7
I have read many posts for 2 days, please some help will be perfect.
There is a sample of my data available here
Philippe
Advertisement
Answer
Suiting the rules and the desired format, you can use logic like:
# criteria: verify = df['CompteNum'].apply(lambda number: str(number)[0] == '8' or str(number)[0] == '9') # saving the dataframes: df[verify].to_csv('c:/users/jack/desktop/meets-criterios.csv', sep = '|', index = False)
Original comment:
As I understand it, you want to filter the imported dataframe according to some criteria. You can work directly on the pandas you imported. Look:
# criteria: verify = df['CompteNum'].apply(lambda number: len(str(number)) > 7) # filtering the dataframe based on the given criteria: df[verify] # meets the criteria df[~verify] # does not meet the criteria # saving the dataframes: df[verify].to_csv('<your path>/meets-criterios.csv') df[~verify].to_csv('<your path>/not-meets-criterios.csv')
Once you have the filtered dataframes, you can save them or convert them to other objects, such as dictionaries.