I’ve written a function below that takes, as arguments, a dataframe (df) and two of its column names (var1, var2). Then it creates interaction variables for the two variables and adds those columns to the original dataframe. The code works when I hard code it, but when I try to call the function like:
create_interactions(my_dataframe, 'variable1', 'variable2')
my_dataframe
I receive no errors but the new columns are not added to the dataframe – it returns the original dataframe. What am I doing wrong? Thank you.
def create_interactions(df, var1, var2):
    variables = df[[var1, var2]]
    for i in range(0, variables.columns.size):
        for j in range(0, variables.columns.size):
            col1 = str(variables.columns[i])
            col2 = str(variables.columns[j])
            if i <= j:
                name = col1 + "*" + col2
                df = pd.concat([df, pd.Series(variables[col1] * variables[col2], name=name)], axis=1)
Answer
Doing df = ... doesn’t modify the original df. It just rebinds the local name df inside the function to a new DataFrame; the DataFrame the caller passed in is left untouched.
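To make the rebinding point concrete, here is a minimal sketch. The function name rebind, the column name "c", and the sample data are all made up for illustration; they are not from the question.

```python
import pandas as pd

def rebind(df):
    # pd.concat builds a brand-new DataFrame. Assigning it to df only
    # changes what the local name df points to inside this function.
    df = pd.concat([df, pd.Series([10, 20], name="c")], axis=1)

frame = pd.DataFrame({"a": [1, 2]})  # hypothetical sample data
rebind(frame)

# The caller's DataFrame still has only its original column.
print(frame.columns.tolist())  # ['a']
```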
You could return df from your function, and then use it like df = create_interactions(df, 'var1', 'var2').
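As a sketch of the return-based fix, this is the function from the question with only a return statement added at the end. The sample data is hypothetical, and the column names are the ones from the question's call.

```python
import pandas as pd

def create_interactions(df, var1, var2):
    variables = df[[var1, var2]]
    for i in range(variables.columns.size):
        for j in range(variables.columns.size):
            col1 = str(variables.columns[i])
            col2 = str(variables.columns[j])
            if i <= j:
                name = col1 + "*" + col2
                # Each concat produces a new DataFrame bound to the local name df.
                df = pd.concat([df, pd.Series(variables[col1] * variables[col2], name=name)], axis=1)
    # Hand the final DataFrame back so the caller can keep it.
    return df

# Hypothetical sample data.
my_dataframe = pd.DataFrame({"variable1": [1, 2, 3], "variable2": [4, 5, 6]})
my_dataframe = create_interactions(my_dataframe, "variable1", "variable2")
print(my_dataframe.columns.tolist())
```

The important part is the assignment at the call site: without my_dataframe = ..., the returned DataFrame would be discarded.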
But if you do want your function to modify the original df, it might be better to change your last line to this:
df[name] = pd.Series(variables[col1] * variables[col2], name=name)
This will insert the new column into the existing DataFrame.
There are a couple of other odd things about your code. You create a new variable called variables that just contains two columns of the original df. Then you loop over range(0, variables.columns.size). But since you defined variables to have only two columns, variables.columns.size will always be two. Later, you grab columns from variables, but these same columns are already present in df, so you could just grab them from df instead.
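To see what the nested loop actually produces, here is a small pandas-free sketch that collects only the column names the loop would generate. The list columns stands in for variables.columns and uses the names from the question's call.

```python
columns = ["variable1", "variable2"]  # stand-in for variables.columns

names = []
for i in range(len(columns)):
    for j in range(len(columns)):
        # Same condition as in the question: keep pairs with i <= j.
        if i <= j:
            names.append(columns[i] + "*" + columns[j])

print(names)
# ['variable1*variable1', 'variable1*variable2', 'variable2*variable2']
```

With two columns the loop always yields exactly three products, two of which are a variable multiplied by itself.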
Also, your code creates “interactions” of each variable with itself, which seems a bit odd. I think your code could be simplified to this:
def create_interaction(df, var1, var2):
    name = var1 + "*" + var2
    df[name] = pd.Series(df[var1] * df[var2], name=name)
Since you only accept exactly two variables, there will be exactly one interaction, so you don’t need any loops at all. (And I renamed it create_interaction to indicate this! :-) Just grab the two specified variables and multiply them.
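For completeness, a runnable sketch of the simplified function with the import it needs and some hypothetical sample data. This version drops the pd.Series(...) wrapper, on the assumption that it is redundant because multiplying two columns already gives a Series.

```python
import pandas as pd

def create_interaction(df, var1, var2):
    name = var1 + "*" + var2
    # Item assignment adds the column to the DataFrame object itself,
    # so the caller sees the change without needing a return value.
    df[name] = df[var1] * df[var2]

# Hypothetical sample data.
my_dataframe = pd.DataFrame({"variable1": [1, 2, 3], "variable2": [4, 5, 6]})
create_interaction(my_dataframe, "variable1", "variable2")
print(my_dataframe.columns.tolist())
# ['variable1', 'variable2', 'variable1*variable2']
```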