
How can I define a function that returns a new DataFrame with cleaned data?

I have several DataFrames, and I need to reduce each of them to the same time span. So that I don’t have to repeat the code block over and over again, I would like to write a function.

Currently I do this without a function, using the following code:

timerange = (df_a['Date'].max() - pd.DateOffset(months=11))
df_a_12m = df_a.loc[df_a['Date'] >= timerange]

My approach:

def Time_range(Data_1, x,name, column, name):
   t = Data_1[column].max() - pd.DateOffset(months=x)
   'df'_ + name = Data_1.loc[Data_1[column] >= t]

Unfortunately, this does not work.


Answer

There are a few mistakes in your approach. First, when you create a new variable you need to specify exactly what it will be called. The left-hand side of an assignment has to be a fixed name, so you cannot build a variable name “dynamically” the way you are trying to with 'df_' + name = something; Python rejects that line as a syntax error.

Second, variable scope dictates that any variable created in a function is only accessible inside that function, and ceases to exist once it finishes executing (unless you play special tricks with global variables). So, even if you did df_name = Data_1.loc[Data_1[column] >= t], once Time_range() finishes running, that variable will be deleted.
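Both points can be checked in a few lines of plain Python. This is only a small sketch: build_local and df_local are made-up names for illustration, and a list stands in for the DataFrame.

```python
# Hypothetical names throughout; this only illustrates the two rules above.

# 1) The left-hand side of "=" must be a name, not an expression.
try:
    compile("'df_' + name = 1", "<example>", "exec")
    assignment_compiles = True
except SyntaxError:
    assignment_compiles = False

# 2) A name bound inside a function is local to that call.
def build_local():
    df_local = [1, 2, 3]  # exists only while build_local() runs

build_local()

try:
    df_local  # looking up the function's local name from the outside
    local_visible = True
except NameError:
    local_visible = False

print(assignment_compiles, local_visible)  # prints: False False
```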

What you can do is have the function return the finished DataFrame and assign the result as a new variable from the outside:

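import pandas as pd

def Time_range(Data_1, x, column):
    # Cutoff: x months before the latest date in the column
    t = Data_1[column].max() - pd.DateOffset(months=x)
    # Return an independent copy of the rows on or after the cutoff
    return Data_1.loc[Data_1[column] >= t].copy()

df_any_name_you_want = Time_range(df_a, 11, 'Date')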

Generally, this is what you want functions to do: perform some operations and return a finished value that the caller can then use from the outside.
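Since the question mentions several DataFrames, one possible way to get "names" without dynamic variables is to keep the results in a dictionary keyed by name. The sketch below repeats the function so that it runs on its own; the sample data, the keys "a" and "b", and the variable names frames and last_12m are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

def Time_range(Data_1, x, column):
    # Keep the rows from the last x months, counted back from the latest date.
    t = Data_1[column].max() - pd.DateOffset(months=x)
    return Data_1.loc[Data_1[column] >= t].copy()

# Made-up sample data: one row per month start.
df_a = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=24, freq="MS"),
    "Value": range(24),
})
df_b = pd.DataFrame({
    "Date": pd.date_range("2020-06-01", periods=18, freq="MS"),
    "Value": range(18),
})

# Instead of variables called df_a_12m, df_b_12m, ..., store each result
# under a string key of your choice.
frames = {"a": df_a, "b": df_b}
last_12m = {name: Time_range(df, 11, "Date") for name, df in frames.items()}

print(len(last_12m["a"]), len(last_12m["b"]))  # prints: 12 12
```

Each reduced frame is then available as last_12m["a"], last_12m["b"], and so on, and the original DataFrames stay unchanged because the function returns a copy.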
