
How to properly reference the previous Pandas DataFrame in the next method in a method chain?

I have been trying to use method chaining in Pandas. However, a few things about how you reference a DataFrame or its columns inside a chain keep tripping me up.

For example, in the code below I have filtered the dataset and then want to create a new column that sums the columns remaining after the filter. However, I don't know how to reference the DataFrame that has just been created by the filter: df in the example below refers to the original DataFrame.
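
The snippet that belonged here is not preserved. The following is only a minimal sketch of the situation described, with made-up column names (a_x, a_y, b_z) and the assumption that the filter is a column filter via DataFrame.filter:

```python
import pandas as pd

# Hypothetical data; the column names are invented for illustration.
df = pd.DataFrame({"a_x": [1, 2], "a_y": [3, 4], "b_z": [10, 20]})

result = (
    df.filter(like="a_")               # keeps only a_x and a_y
      .assign(total=df.sum(axis=1))    # `df` is still the ORIGINAL frame
)

# `total` includes b_z, although b_z was dropped by the filter.
print(result["total"].tolist())
```

Here the sum is computed from all three original columns, which is the behaviour the question is complaining about.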


Or what about this case, where the DataFrame is created inside the method chain? Normally this would be a pd.read_csv step rather than a step that generates the DataFrame. This piece of code naturally does not work, because df2 has not been created yet.
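
Again, the original snippet is missing. A rough sketch of the failure being described, assuming a small hand-built frame in place of pd.read_csv and hypothetical columns a and b, might look like this:

```python
import pandas as pd

try:
    df2 = (
        pd.DataFrame({"a": [1, 2], "b": [3, 4]})   # stand-in for pd.read_csv(...)
          .assign(total=df2["a"] + df2["b"])       # df2 does not exist yet
    )
except NameError as exc:
    # The right-hand side is evaluated before df2 is bound.
    error_message = str(exc)
    print(error_message)
```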


Interestingly, the issue above is not a problem here: df3['xx'] refers to the df3 that has been queried. That makes some sense in the context of the second example, but it is inconsistent with the first example.


I have worked with other languages and libraries, such as R and PySpark, where method chaining is quite flexible and does not appear to have these barriers. Perhaps I am missing something about how it is meant to be done in Pandas, or about how you are meant to reference df['xx'] in some other manner.

Lastly, I understand that the example problems are easily worked around, but I am trying to understand whether there is a set method-chaining syntax for referencing these columns that I am not aware of.


Answer

To reference the DataFrame produced by the previous step in a chain, an anonymous function (a lambda) helps. The lambda receives the intermediate DataFrame as its argument, so it refers to the result of the preceding step rather than to a named variable. This works with assign.
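
A minimal sketch of the lambda approach, using made-up column names (the answer's original snippet is not preserved). It relies on assign and .loc both accepting a callable that is called with the current DataFrame:

```python
import pandas as pd

# Hypothetical data; the column names are invented for illustration.
df = pd.DataFrame({"a_x": [1, 2], "a_y": [3, 4], "b_z": [10, 20]})

result = (
    df.filter(like="a_")                        # keep only a_x and a_y
      .assign(total=lambda d: d.sum(axis=1))    # d is the filtered frame
      .loc[lambda d: d["total"] > 4]            # d now also has `total`
)

print(result)
```

Because each lambda is evaluated when its step runs, the sum only covers a_x and a_y, and the row filter can already see the new total column.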

The pipe method is another option where you can chain methods while referencing the computed DataFrame.

The example below is superfluous; hopefully it explains how pipe works:
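
The original example is not preserved, so this is only a sketch with hypothetical column names. pipe calls the given function with the DataFrame computed so far and returns whatever that function returns:

```python
import pandas as pd

# Hypothetical data; the column names are invented for illustration.
df = pd.DataFrame({"a_x": [1, 2], "a_y": [3, 4], "b_z": [10, 20]})

result = (
    df.filter(like="a_")
      # d is the filtered frame, not the original df
      .pipe(lambda d: d.assign(total=d["a_x"] + d["a_y"]))
)

print(result)
```

In this particular case assign with a lambda would do the same job directly, which is why pipe is not strictly needed here.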


Not all Pandas functions support chaining, and this is where pipe comes in handy. You can even write custom functions and pass them to pipe.
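
As a sketch of the custom-function case: add_row_total below is a hypothetical helper, not part of Pandas. It takes the DataFrame as its first argument, which is what pipe expects, and any extra keyword arguments given to pipe are forwarded to it:

```python
import pandas as pd


def add_row_total(frame: pd.DataFrame, name: str = "total") -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with a row-sum column."""
    return frame.assign(**{name: frame.sum(axis=1)})


result = (
    pd.DataFrame({"a": [1, 2], "b": [3, 4]})   # frame created inside the chain
      .pipe(add_row_total, name="row_sum")     # name= is passed through by pipe
)

print(result)
```

Note that the frame never needs a variable name here, which also covers the case where the chain starts from pd.read_csv.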

All of this information is in the docs: assign; pipe; function application; assignment in method chaining

User contributions licensed under: CC BY-SA