Pandas average of previous rows fulfilling condition

I have a huge DataFrame (>20M rows) with each row containing a timestamp and a numeric variable X. I want to assign a new column where, for each row, the value in this new column is the average of X …

Most efficient way to combine large Pandas DataFrames based on multiple column values

I am processing information in several Pandas DataFrames with 10,000+ rows. I have…

df1, student information

   Class Number  Student ID
0      13530159   201733468
1      13530159   201736271
2 …

For loop is several times faster in R than in Python using the rpy2 library

The following simple for block takes about 3 sec to complete in R: The same code run in Python through the rpy2 library takes 4 to 5 times longer: Is this just because I’m using the rpy2 library to communicate with R, or is there something else at play? Can this be improved in any way (while still running the code in Python)?

Answer: 4 to 5 times slower seems a little much, but this might be the case if you are using custom conversion (rpy2 can convert R objects to arbitrary Python objects on the fly; see the doc).

Efficient regex with lists

I have a list of strings coming from os.listdir() that looks like the following: [‘foo’, ‘bar’, ‘backup_20180406’ …]. Out of those entries, I want to get the ones that match the “backup_YYYYMMDD” …
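
One way to filter such a list is sketched below. The excerpt is cut off, so the exact requirement is an assumption: here "backup_YYYYMMDD" is read as the literal prefix backup_ followed by exactly eight digits, with no date validation. The names BACKUP_RE and find_backups, and the sample entries, are made up for illustration.

```python
import re

# Assumption: "backup_YYYYMMDD" means "backup_" plus exactly eight digits.
# Compiling once avoids re-parsing the pattern for every entry.
BACKUP_RE = re.compile(r"backup_\d{8}")


def find_backups(names):
    """Return the entries whose whole name matches backup_YYYYMMDD."""
    # fullmatch rejects names with extra text before or after the pattern.
    return [name for name in names if BACKUP_RE.fullmatch(name)]


# Hypothetical sample data standing in for os.listdir() output.
entries = ["foo", "bar", "backup_20180406", "backup_2018", "old_backup_20180406"]
print(find_backups(entries))  # ['backup_20180406']
```

If the names also need to be real calendar dates, the eight digits could be passed to datetime.strptime with the format "%Y%m%d" as an extra check.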

Why is statistics.mean() so slow?

I compared the performance of the mean function of the statistics module with the simple sum(l)/len(l) method and found the mean function to be much slower. I used timeit with the two …
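
A comparison along these lines can be reproduced with the sketch below. The data size and repeat count are arbitrary choices, not taken from the question, and simple_mean is a name introduced here. statistics.fmean (available from Python 3.8) is included as a third option because it is the standard library's float-only variant; statistics.mean favours exactness over speed, which is the usual explanation for the gap.

```python
import statistics
import timeit

# Arbitrary sample data; the original question's data is not shown.
data = [float(i) for i in range(10_000)]


def simple_mean(values):
    """Plain arithmetic mean using built-in sum and len."""
    return sum(values) / len(values)


# Time each approach on the same input; absolute numbers depend on the machine.
t_stats = timeit.timeit(lambda: statistics.mean(data), number=20)
t_fmean = timeit.timeit(lambda: statistics.fmean(data), number=20)
t_simple = timeit.timeit(lambda: simple_mean(data), number=20)

print(f"statistics.mean : {t_stats:.4f} s")
print(f"statistics.fmean: {t_fmean:.4f} s")
print(f"sum / len       : {t_simple:.4f} s")
```

The three functions agree on this input, so the choice is mostly about how much accuracy matters for the data at hand.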

Implement MATLAB’s im2col ‘sliding’ in Python

Q: How to speed this up? Below is my implementation of MATLAB’s im2col ‘sliding’ with the additional feature of returning every n’th column. The function takes an image (or any 2-D array) and …

Get difference between two lists

I have two lists in Python, like these: temp1 = [‘One’, ‘Two’, ‘Three’, ‘Four’] temp2 = [‘One’, ‘Two’] I need to create a third list with items from the first list which aren’t present in the second …
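
A minimal sketch of one common approach follows, assuming the order of the first list should be preserved and that the items are hashable. The helper name list_difference is hypothetical.

```python
def list_difference(first, second):
    """Return items of `first` that are not in `second`, keeping their order."""
    # A set gives constant-time membership tests on average.
    exclude = set(second)
    return [item for item in first if item not in exclude]


temp1 = ["One", "Two", "Three", "Four"]
temp2 = ["One", "Two"]
temp3 = list_difference(temp1, temp2)
print(temp3)  # ['Three', 'Four']
```

When order and duplicates do not matter, set(temp1) - set(temp2) is shorter, but it returns a set rather than a list.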

python string join performance

There are a lot of articles around the web concerning Python performance. The first thing you read: concatenating strings should not be done using ‘+’; avoid s1+s2+s3 and use str.join instead. I tried …
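
The claim can be checked with a small timing sketch like the one below. The piece count and repeat count are arbitrary, and concat_plus and concat_join are names introduced here, not taken from the question. Note that CPython can often extend a string in place for repeated += on a local variable, so the measured gap may be smaller than the articles suggest.

```python
import timeit

# Arbitrary test input: 1000 short strings.
parts = ["chunk%d" % i for i in range(1000)]


def concat_plus(pieces):
    """Build the result by repeated concatenation."""
    result = ""
    for piece in pieces:
        result += piece
    return result


def concat_join(pieces):
    """Build the result with a single str.join call."""
    return "".join(pieces)


# Both approaches must produce the same string before timing them.
assert concat_plus(parts) == concat_join(parts)

print("+=  :", timeit.timeit(lambda: concat_plus(parts), number=200))
print("join:", timeit.timeit(lambda: concat_join(parts), number=200))
```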