I have two dataframes:
df1:
ID  Open  High  Low
1   64    66    52
df2:
ID  Open  High  Volume
1   33    45    30043
I want to write a function that checks whether the column headers of df2 match the columns of df1.
If they don't, it should print a message saying which columns are missing.
Example of the message for these dataframes:
"The column 'Low' is not selected in df2. The column 'Volume' is not selected in df1."
I want generalized code that works for any pair of dataframes.
Is this possible in Python?
Answer
You can access the column names via .columns
and then use set operations to compare them:
import pandas as pd

df1 = pd.DataFrame({"ID": [1], "Open": [64], "High": [66], "Low": [52]})
df2 = pd.DataFrame({"ID": [1], "Open": [33], "High": [45], "Volume": [30043]})

df1_columns = set(df1.columns)
df2_columns = set(df2.columns)

# Columns present in both DataFrames
common_columns = df1_columns & df2_columns
# Columns present in only one of the two DataFrames
df1_columns_only = df1_columns - common_columns
df2_columns_only = df2_columns - common_columns

print("Columns only available in df1", df1_columns_only)
print("Columns only available in df2", df2_columns_only)
This gives the expected output:
Columns only available in df1 {'Low'}
Columns only available in df2 {'Volume'}
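To get the exact message format from the question for any pair of dataframes, the comparison can be wrapped in a reusable function. This is only a sketch: the name `missing_column_messages` and the `name_a`/`name_b` label parameters are hypothetical, chosen for illustration. It uses list comprehensions instead of sets so that the reported columns keep their original order (Python sets are unordered).

```python
import pandas as pd


def missing_column_messages(df_a, df_b, name_a="df1", name_b="df2"):
    """Return one message per column that exists in only one of the two DataFrames.

    name_a and name_b (hypothetical parameters) are only used as labels in the messages.
    """
    # Columns in df_a that df_b lacks, kept in df_a's original column order
    only_a = [col for col in df_a.columns if col not in df_b.columns]
    # Columns in df_b that df_a lacks, kept in df_b's original column order
    only_b = [col for col in df_b.columns if col not in df_a.columns]

    messages = [f"The column '{col}' is not selected in {name_b}." for col in only_a]
    messages += [f"The column '{col}' is not selected in {name_a}." for col in only_b]
    return messages


df1 = pd.DataFrame({"ID": [1], "Open": [64], "High": [66], "Low": [52]})
df2 = pd.DataFrame({"ID": [1], "Open": [33], "High": [45], "Volume": [30043]})

messages = missing_column_messages(df1, df2)
if messages:
    # prints: The column 'Low' is not selected in df2. The column 'Volume' is not selected in df1.
    print(" ".join(messages))
else:
    print("Both DataFrames have the same columns.")
```

Because the function only reads `.columns`, it works for any two DataFrames; an empty list means the column sets match, and different labels can be passed through `name_a` and `name_b`.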