I’ve written a function below that takes, as arguments, a dataframe (df) and two of its column names (var1 and var2). It then creates interaction variables for the two variables and adds those columns to the original dataframe. The code works when I hard-code it, but when I try to call the function like:
create_interactions(my_dataframe, 'variable1', 'variable2')
my_dataframe
I receive no errors but the new columns are not added to the dataframe – it returns the original dataframe. What am I doing wrong? Thank you.
def create_interactions(df, var1, var2):
    variables = df[[var1, var2]]
    for i in range(0, variables.columns.size):
        for j in range(0, variables.columns.size):
            col1 = str(variables.columns[i])
            col2 = str(variables.columns[j])
            if i <= j:
                name = col1 + "*" + col2
                df = pd.concat([df, pd.Series(variables[col1] * variables[col2], name=name)], axis=1)
Assigning df = ... inside the function doesn’t modify the original df. It just rebinds the local name df to the new DataFrame, and the caller’s DataFrame is left as it was.
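To see this in isolation, here is a minimal sketch. The names rebind and frame and the sample data are made up for illustration; they are not from your code.

```python
import pandas as pd

def rebind(df):
    # pd.concat returns a brand-new DataFrame. Assigning it to df only
    # rebinds the local name; the caller's object is left untouched.
    df = pd.concat([df, pd.Series([10, 20], name="c")], axis=1)

frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
result = rebind(frame)

print(list(frame.columns))  # the caller's frame still has only a and b
print(result)               # None, because the function never returns df
```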
You could return df from your function, and then use it like df = create_interactions(df, 'var1', 'var2').
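As a rough sketch of that first option, this is your function with only a return added at the end, assuming pandas is imported as pd. The sample data is invented for the example.

```python
import pandas as pd

def create_interactions(df, var1, var2):
    variables = df[[var1, var2]]
    for i in range(variables.columns.size):
        for j in range(variables.columns.size):
            col1 = str(variables.columns[i])
            col2 = str(variables.columns[j])
            if i <= j:
                name = col1 + "*" + col2
                df = pd.concat(
                    [df, pd.Series(variables[col1] * variables[col2], name=name)],
                    axis=1,
                )
    return df  # hand the new DataFrame back to the caller

# Made-up sample data for illustration.
my_dataframe = pd.DataFrame({"variable1": [1, 2, 3], "variable2": [4, 5, 6]})

# The caller has to reassign, because the function builds a new object.
my_dataframe = create_interactions(my_dataframe, "variable1", "variable2")
print(my_dataframe.columns.tolist())
```

With this approach the DataFrame you passed in is never changed; only the returned one carries the extra columns.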
But if you do want your function to modify the original df, it might be better to change your last line to this:
df[name] = pd.Series(variables[col1] * variables[col2], name=name)
This will insert the new column into the existing DataFrame.
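Put together, the in-place variant might look something like the sketch below. Everything except the last line of the function is your original loop, and the sample data is invented for the example.

```python
import pandas as pd

def create_interactions(df, var1, var2):
    variables = df[[var1, var2]]
    for i in range(variables.columns.size):
        for j in range(variables.columns.size):
            col1 = str(variables.columns[i])
            col2 = str(variables.columns[j])
            if i <= j:
                name = col1 + "*" + col2
                # Item assignment adds the column to the object that was
                # passed in, so the caller sees it without reassigning.
                df[name] = pd.Series(variables[col1] * variables[col2], name=name)

# Made-up sample data for illustration.
my_dataframe = pd.DataFrame({"variable1": [1, 2, 3], "variable2": [4, 5, 6]})
returned = create_interactions(my_dataframe, "variable1", "variable2")
print(my_dataframe.columns.tolist())
```

Note that this version returns nothing, so you call it for its side effect rather than assigning its result.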
There are a couple of other odd things about your code. You create a new variable called variables that just contains two columns of the original df. Then you loop over range(0, variables.columns.size). But since you defined variables to have only two columns, variables.columns.size will always be two. Later, you grab columns from variables, but these same columns are already present in df, so you could just grab them from df.
Also, your code creates “interactions” of each variable with itself, which seems a bit odd. I think your code could be simplified to this:
def create_interaction(df, var1, var2):
    name = var1 + "*" + var2
    df[name] = pd.Series(df[var1] * df[var2], name=name)
Since you only accept exactly two variables, there will be exactly one interaction, so you don’t need any loops at all. (And I renamed it create_interaction to indicate this! :-) Just grab the two specified variables and multiply them.
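For completeness, here is how the simplified function could be used. The sample values are made up, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def create_interaction(df, var1, var2):
    name = var1 + "*" + var2
    df[name] = pd.Series(df[var1] * df[var2], name=name)

# Made-up sample data for illustration.
my_dataframe = pd.DataFrame({"variable1": [1, 2, 3], "variable2": [4, 5, 6]})

# No reassignment needed: the column is added to my_dataframe itself.
create_interaction(my_dataframe, "variable1", "variable2")
print(my_dataframe)
```

After the call, my_dataframe has a third column named variable1*variable2 holding the row-wise products.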