I have a dataframe with missing data in several columns. In some of these columns, say ‘Col_A’ to ‘Col_D’, I’d like to replace them with 0. I tried it this way:
reduce(lambda x : df.fillna({x : 0}, inplace=True), ['Col_A', 'Col_B', 'Col_C', 'Col_D'])
but I got the error message <lambda>() takes 1 positional argument but 2 were given. Eventually, I changed my solution to simply
df[['Col_A', 'Col_B', 'Col_C', 'Col_D']] = df[['Col_A', 'Col_B', 'Col_C', 'Col_D']].fillna(0)
but I still wonder what was wrong with my previous attempt.
Answer
As mentioned in the comments, this is a rather odd way of achieving your goal, with several issues that go against good programming practice.
So, to start with a disclaimer: I WOULD NOT RECOMMEND DOING THIS; I AM JUST ANSWERING THE QUESTION ABOUT THE BEHAVIOR OF THE FUNCTION.
With the disclaimer out of the way, this can be made to work with two small changes. The following should work:
reduce(lambda _, x : df.fillna({x : 0}, inplace=True), ['Col_A', 'Col_B', 'Col_C', 'Col_D'], 'fake')
Note first that we make the lambda take two arguments, as required by reduce. The first argument is meant to be the result of applying the function at the previous step. Here we do not care about that result; instead we rely on the side effects of the lambda on df, a variable that lives outside the function passed to reduce (this is the main problem with this approach), so we give the first argument the throwaway name _. Second, we need a starting point, the so-called initializer, for reduce to work: reduce calls the lambda with this value and the first element of the list as its first step. If we omit the initializer, as you did, reduce starts with the first two elements of the list, so Col_A is passed as the previous result rather than as x and is never handed to fillna. Hence we pass 'fake' as the initializer to reduce (the value is arbitrary; you can use anything you like).
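To make the call order described above concrete, here is a small sketch that records the argument pairs reduce passes to its function, with and without an initializer. The helper names trace_calls and record are made up for this illustration, and record returns None only to mimic what fillna(..., inplace=True) returns; no dataframe is involved.

```python
from functools import reduce

cols = ['Col_A', 'Col_B', 'Col_C', 'Col_D']


def trace_calls(items, *initializer):
    """Return the (previous_result, item) pairs that reduce passes along.

    Hypothetical helper for illustration only.
    """
    calls = []

    def record(previous_result, item):
        calls.append((previous_result, item))
        # Mimic df.fillna(..., inplace=True), which returns None.
        return None

    reduce(record, items, *initializer)
    return calls


# No initializer: 'Col_A' is consumed as the first "previous result".
without_init = trace_calls(cols)

# With an initializer: every column name reaches the second argument.
with_init = trace_calls(cols, 'fake')

print([item for _, item in without_init])
print([item for _, item in with_init])
```

Without the initializer, the second argument only ever sees Col_B, Col_C and Col_D; with it, all four column names show up there.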