Skip to content
Advertisement

Comparing data between two CSV files to move data to a third CSV file

I have this csv file, file1.csv:

OU    
CORP:Jenny Smith:
"CORP:John Smith:,John Smith:"
STORE:Mary Poppins:
STORE:Tony Stark: 
STORE:Carmen Sandiego:
NEWS:Peter Parker:
NEWS:Clark Kent:
STORES:store123:
NEWS:CORPmanager:
NEWS:STOREmanager:
ICECREAM:Will Ferrall:
SECURITY:James Bond:
SECURITY:Sherlock Holmes:
DELI:Brian Johnson:
""

Then file2.csv:

OU                      
CORP
STORE     
NEWS
ICECREAM
SECURITY
DELI
""

For every line in file2.csv that has ‘CORP’, ‘STORE’, or ‘NEWS’, I already searched through file1.csv and created a file, such as STOREall.csv, CORPall.csv, and NEWSall.csv.

I want OUs, such as ICECREAM, SECURITY, and DELI to be in the CORPall.csv file too.

So NEWSall.csv has:

OU  

NEWS:Peter Parker:   
NEWS:Clark Kent:  
NEWS:CORPmanager 
NEWS:STOREmanager  

In CORPall.csv

OU 
CORP:Jenny Smith: 
CORP:John Smith:,John Smith: 
ICECREAM:Will Ferrall:
SECURITY:James Bond:
SECURITY:Sherlock Holmes:
DELI:Brian Johnson:

In STOREall.csv

OU    
STORE:Mary Poppins:
STORE:Tony Stark: 
STORE:Carmen Sandiego:
STORES:store123:

I am using Pandas and CSV in my program. I used this snippet of code to move STORE: or CORP:, etc., to their corresponding files. allOUs is all the OUs that go to file2.csv.

for dept in allOUs['OU']:
   df_dept = df[df['OU'].str.startswith(f'{dept}:')]

   df_dept['OU'].to_csv(f'{dept}all.csv', index=False, header=False)

This is the logic in my head of what I want to happen:

df_dept = df[df['OU'].str.startswith(f'{dept}:')]

if df_dept == "ICECREAM:" or df_dept == "SECURITY:" or df_dept == "DELI:":
   df_dept['OU'].to_csv('CORPall.csv', index=False, header=False)

else: #if everything else is not icecream, security or deli
   #move them normally to their self named files
   df_dept['OU'].to_csv(f'{dept}all.csv', index=False, header=False)

Advertisement

Answer

Since you’re not using the header (header=False), you can check if dept is in the list of words that needs to be written to CORP file. Then, for the CORP file, you can use to_csv with argument mode='a', which makes the data being written to be inserted at the end, after any preexisting data (of the CORP category).

import pandas as pd

OUList = pd.DataFrame({'OU':['CORP', 'STORE', 'NEWS', 'ICECREAM', 'SECURITY', 'DELI']})

df = pd.read_csv('sample.csv', sep=';')

for dept in OUList['OU']:

    df_dept = df[df['OU'].str.startswith(f'{dept}:')]
    if dept in ["ICECREAM", "SECURITY", "DELI", "CORP"]:
        df_dept.to_csv(f'CORPall.csv', index=False, mode='a', header=False)
    else:
        df_dept.to_csv(f'{dept}all.csv', index=False, header=False)
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement