I’ve got a script updating 5-10 columns' worth of data, but sometimes the starting CSV will be identical to the ending CSV, so instead of writing an identical CSV file I want it to do nothing…
How can I compare two dataframes to check if they’re the same or not?
csvdata = pandas.read_csv('csvfile.csv')
csvdata_old = csvdata

# ... do stuff with csvdata dataframe

if csvdata_old != csvdata:
    csvdata.to_csv('csvfile.csv', index=False)
Any ideas?
Answer
You need to be careful to create a copy of the DataFrame; otherwise csvdata_old will change along with csvdata, since both names point to the same object:
csvdata_old = csvdata.copy()
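To illustrate the difference, here is a minimal sketch. The small frame and the names alias and snapshot are made up for the example and are not part of the original script:

```python
import pandas as pd

# A small made-up frame standing in for the CSV data.
df = pd.DataFrame({"a": [1, 2, 3]})

alias = df            # plain assignment: a second name for the same object
snapshot = df.copy()  # independent copy of the data

# Modify the original in place.
df.loc[0, "a"] = 99

print(alias is df)            # the alias is the very same object
print(alias.loc[0, "a"])      # so it sees the modification
print(snapshot.loc[0, "a"])   # the copy keeps the old value
```

With plain assignment there is nothing left to compare against, because the "old" frame is the new frame.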
To check whether they are equal, you can use assert_frame_equal as in this answer:
# pandas.util.testing is the older location; it was deprecated in favour of pandas.testing
from pandas.testing import assert_frame_equal

assert_frame_equal(csvdata, csvdata_old)
You can wrap this in a function with something like:
# frames_equal is just an illustrative name
def frames_equal(csvdata, csvdata_old):
    try:
        assert_frame_equal(csvdata, csvdata_old)
        return True
    except Exception:  # apparently AssertionError doesn't catch all
        return False
There was discussion of a better way…
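One option that exists in current pandas is DataFrame.equals, which returns a plain True or False instead of raising. Whether this is the "better way" from that discussion is not stated here, so treat the following as a sketch under that assumption. The helper name write_if_changed and the sample data are hypothetical:

```python
import pandas as pd


def write_if_changed(new_df, old_df, path):
    """Write new_df to path only when it differs from old_df.

    Returns True if the file was written, False if it was skipped.
    write_if_changed is a hypothetical helper name, not a pandas function.
    """
    # DataFrame.equals compares shape, values and column dtypes;
    # NaNs in the same position are treated as equal.
    if new_df.equals(old_df):
        return False
    new_df.to_csv(path, index=False)
    return True


# Made-up data standing in for pandas.read_csv('csvfile.csv').
csvdata = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
csvdata_old = csvdata.copy()

# No changes yet, so the two frames compare equal.
print(csvdata.equals(csvdata_old))

# After an update they no longer do.
csvdata.loc[0, "a"] = 10
print(csvdata.equals(csvdata_old))
```

Note that equals is strict about dtypes: a column of integers and a column of floats holding the same numbers count as different, so the file would be rewritten in that case.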