How to vectorize groupby and apply in pandas?

I’m trying to calculate (x - x.mean()) / (x.std() + 0.01) on several columns of a dataframe, per group. My original dataframe is very large. I’ve split the original file into several chunks and I’m using multiprocessing to run the script on each chunk, but every chunk is still very large and the process never finishes.

I used the following code:

In my experience, groupby, apply and join are not efficient for large dataframes, so I would like to find a way to replace the groupby and apply calls.
Does anyone know a better way to vectorize this process without groupby and apply? I’m not looking for multiprocessing libraries such as pandarallel, swifter or dask; I’ve tried those and they didn’t help.

Sample df:

Answer

Not sure about performance, but here you can use GroupBy.transform:
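
A minimal sketch of that approach, not the original answer's exact code. The sample frame and the column names `group`, `a` and `b` are assumptions made up for illustration. It passes the string names `"mean"` and `"std"` to `transform`, so pandas uses its built-in aggregations rather than calling a Python function once per group, and then does the arithmetic on whole columns.

```python
import pandas as pd

# Hypothetical sample data; "group", "a" and "b" are assumed names.
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y", "y"],
    "a": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "b": [2.0, 4.0, 6.0, 5.0, 5.0, 5.0],
})
cols = ["a", "b"]

grouped = df.groupby("group")[cols]

# transform returns a frame aligned with df's index, with each row
# holding the statistic of the group that row belongs to.
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")  # sample std (ddof=1), like x.std()

# Plain column arithmetic: (x - mean) / (std + 0.01) for every row at once.
normalized = (df[cols] - group_mean) / (group_std + 0.01)

print(normalized)
```

Note that a group with a single row has a sample standard deviation of NaN, so its normalized values come out as NaN as well. Whether this is fast enough for a given dataframe would need to be measured.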

User contributions licensed under: CC BY-SA