How to properly reference the previous Pandas DataFrame in the next method in a method chain?

Question

I have been trying to use method chaining in Pandas; however, a few things about how you reference a DataFrame or its columns keep tripping me up. For example, in the code below I have filtered the dataset and then want to create a new column that sums the columns remaining after the filter. However I

Accepted Answer

To reference the DataFrame produced by the previous step in the chain, an anonymous function (lambda) helps:

    df.filter(like='x').assign(n = lambda df: df.sum(1))

       xx  xy   n
    0   1   1   2
    1   2   2   4
    2   3   3   6
    3   4   4   8
    4   5   5  10
    5   6   6  12

The df inside the lambda is the DataFrame returned by the previous step (here, the filtered one), not the original. This works with assign.

The pipe method is another option: it lets you chain methods while referencing the computed DataFrame.

The example below is superfluous; hopefully it explains how pipe works:

    df3.pipe(lambda df: df.assign(r = 2))

    Out[37]:
       xx  xy  z  r
    0   1   1  1  2
    1   2   2  2  2
    2   3   3  3  2
    3   4   4  4  2
    4   5   5  5  2
    5   6   6  6  2

Not all Pandas functions support chaining; this is where pipe can come in handy. You can even write custom functions and pass them to pipe.

All of this information is in the docs: assign; pipe; function application; assignment in method chaining
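As a rough, self-contained sketch of both approaches side by side: the data below is made up to match the shape of the output shown in the answer, and `add_row_total` is a hypothetical helper written for illustration, not a Pandas function.

```python
import pandas as pd

# Hypothetical data, shaped like the frame in the answer's output.
df3 = pd.DataFrame({
    "xx": [1, 2, 3, 4, 5, 6],
    "xy": [1, 2, 3, 4, 5, 6],
    "z": [1, 2, 3, 4, 5, 6],
})


def add_row_total(frame, name="n"):
    """Hypothetical helper: return a copy of `frame` with a row-sum column."""
    return frame.assign(**{name: frame.sum(axis=1)})


# assign with a lambda: `d` is the filtered frame (xx, xy), not df3.
via_assign = df3.filter(like="x").assign(n=lambda d: d.sum(axis=1))

# pipe with a custom function: the filtered frame is passed as the
# first argument, and extra keyword arguments are forwarded.
via_pipe = df3.filter(like="x").pipe(add_row_total, name="n")

print(via_assign)
```

Both chains should give the same result, and `df3` itself is left unchanged because `filter` and `assign` return new DataFrames. The lambda form is handy for one-off columns; a named function passed to `pipe` tends to read better once the step is reused or grows beyond a single expression.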

Answer