I am processing information in several Pandas DataFrames with 10,000+ rows. I have… df1, student information; df2, student responses. I want… a DataFrame with columns for the class number, student ID, and unique assignment titles. Each assignment column should contain the student’s highest score for that assignment. There can be 20+ assignments/columns. A student can have many different
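The actual data and code are elided from the excerpt, but a common way to get that layout is a pivot with a max aggregation; the column names below are assumptions for illustration only, not the asker's schema:

```python
# Hedged sketch: class_number, student_id, assignment and score are assumed names.
import pandas as pd

df2 = pd.DataFrame({
    "class_number": [101, 101, 101, 102],
    "student_id":   [1, 1, 2, 3],
    "assignment":   ["Quiz 1", "Quiz 1", "Quiz 1", "Essay"],
    "score":        [70, 85, 90, 75],
})

# One column per unique assignment title, keeping each student's best score
best = df2.pivot_table(
    index=["class_number", "student_id"],
    columns="assignment",
    values="score",
    aggfunc="max",
).reset_index()
print(best)
```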
Tag: performance
For loop is several times faster in R than in Python using the rpy2 library
The following simple for block takes about 3 seconds to complete in R: The same code run in Python through the rpy2 library takes 4-5 times longer: Is this just because I’m using the rpy2 library to communicate with R, or is there something else at play? Can this be improved in any way (while still running the code
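The snippet itself is elided above, so the following is only a hypothetical reconstruction of the two ways such a loop is typically driven through rpy2; the gap usually comes from crossing the Python/R boundary on every iteration rather than from R itself being slow:

```python
# Hypothetical reconstruction (not the asker's code): the same work timed as
# a single R evaluation vs. per-iteration calls from a Python loop.
import time
import rpy2.robjects as robjects

# 1) Whole loop evaluated inside R in one call: rpy2 overhead is paid once.
start = time.perf_counter()
robjects.r("total <- 0; for (i in 1:1000000) total <- total + i")
print("single R call:", time.perf_counter() - start, "s")

# 2) Python loop calling an R function each iteration: conversion and call
#    overhead are paid on every pass, which is where the slowdown usually lives.
r_add = robjects.r("function(a, b) a + b")
start = time.perf_counter()
total = 0.0
for i in range(100_000):
    total = r_add(total, i)[0]
print("per-iteration calls:", time.perf_counter() - start, "s")
```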
IPython (Jupyter) vs Python (PyCharm) performance
Is there any performance difference between code run in IPython (Jupyter, for example) and the same code run in “standard” Python (PyCharm, for example)? I’m working on a neural network for a project where I need some kind of presentation, and Jupyter + IPython does the job, but I was wondering whether there are any kinds of differences
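One way to answer this empirically (the workload below is an arbitrary stand-in, not from the question) is to time an identical function with the stdlib timeit module in a plain script and with %timeit in a notebook cell, then compare:

```python
# Minimal timing sketch; run the same thing in both environments.
import timeit

def work():
    return sum(i * i for i in range(10_000))

print(timeit.timeit(work, number=1_000), "seconds for 1000 runs")
# In an IPython/Jupyter cell, the equivalent check would be:  %timeit work()
```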
Binary Insertion Sort vs. Quicksort
I was looking at different sorting algorithms and their performance (link) and then I tried to implement some sorting algorithms myself. I wanted to improve them as well, so while coding the insertion sort I thought: why not use binary search, since the first part of the array is already sorted, in order to get rid
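As a sketch of the idea being described (not the asker's implementation), an insertion sort can locate each element's slot with bisect; the element shift is still linear, so the overall complexity stays O(n^2):

```python
# Binary insertion sort sketch: binary search for the position,
# then shift the sorted prefix to make room.
from bisect import bisect_right

def binary_insertion_sort(a):
    for i in range(1, len(a)):
        value = a[i]
        pos = bisect_right(a, value, 0, i)   # a[:i] is already sorted
        a[pos + 1:i + 1] = a[pos:i]          # shifting is still O(n)
        a[pos] = value
    return a

print(binary_insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```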
Python decompression relative performance?
TL;DR: Of the various compression algorithms available in Python (gzip, bz2, lzma, etc.), which has the best decompression performance? Full discussion: Python 3 has various modules for compressing/decompressing data, including gzip, bz2 and lzma. gzip and bz2 additionally have different compression levels you can set. If my goal is to balance file size (/compression ratio) and decompression speed (compression speed
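A quick way to get a feel for the relative numbers is a small benchmark over the same payload; the data and repeat count below are arbitrary assumptions, so real results will vary with the input:

```python
# Rough decompression benchmark sketch across the stdlib codecs.
import bz2, gzip, lzma, time

payload = b"some moderately repetitive text " * 50_000

for name, mod in (("gzip", gzip), ("bz2", bz2), ("lzma", lzma)):
    blob = mod.compress(payload)
    start = time.perf_counter()
    for _ in range(20):
        mod.decompress(blob)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(blob)} bytes compressed, 20 decompressions in {elapsed:.3f}s")
```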
How to increase processing speed when using read_excel in pandas?
I need to use pd.read_excel to process every sheet in one Excel file, but in most cases I do not know the sheet names, so I use this to find out how many sheets the file has: During processing I found that it is quite slow. So, can read_excel read only a limited number of rows to improve the speed? I tried nrows but it did not work.. still
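One commonly suggested pattern (the file name below is hypothetical) is to open the workbook once with pd.ExcelFile, so the sheet names are discovered without re-parsing the file for every sheet, and to cap rows with nrows where the installed pandas version supports it:

```python
import pandas as pd

xls = pd.ExcelFile("workbook.xlsx")   # hypothetical file; parsed a single time
print(xls.sheet_names)                # sheet names without guessing

frames = {
    name: pd.read_excel(xls, sheet_name=name, nrows=100)  # limit rows per sheet
    for name in xls.sheet_names
}
```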
Efficient regex with lists
I have a list of strings coming from os.listdir() that looks like the following: Out of those entries, I want to get the ones that match the “backup_YYYYMMDD” pattern. The regex for that, with named groups, would be: I am trying to create a list that contains only the date from the above (i.e. the .group(‘date’)), but I cannot find a
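Since the actual listing and regex are elided, here is a minimal sketch with assumed file names showing the usual shape of the answer: match with a named group and keep only the captured date:

```python
import re

entries = ["backup_20240101", "backup_20240215", "notes.txt", "backup_old"]
pattern = re.compile(r"^backup_(?P<date>\d{8})$")

# Keep only the 'date' group of entries that match (Python 3.8+ walrus syntax)
dates = [m.group("date") for e in entries if (m := pattern.match(e))]
print(dates)  # ['20240101', '20240215']
```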
Is there a faster way to get multiple keys from a dictionary?
I have a dictionary: Then I have a list of keys: My desired result is: What I’m doing so far is: Is there a faster way? Perhaps without a for loop? Answer: You could use: It has two advantages: it performs the d.get lookup only once, not on each iteration. Only on CPython: because dict.get is implemented in C and map is implemented
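The elided suggestion is almost certainly along these lines, given the mention of d.get and map; the sample data below is assumed:

```python
d = {"a": 1, "b": 2, "c": 3, "d": 4}
keys = ["a", "c", "d"]

# map() binds d.get once and iterates in C on CPython
values = list(map(d.get, keys))
print(values)  # [1, 3, 4]
```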
How to repeat each of a Python list’s elements n times with itertools only?
I have a list of numbers: numbers = [1, 2, 3, 4]. I would like a list where each element is repeated n times, like so (for n = 3): [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]. The problem is that I would like to use only itertools for this, since I am very constrained
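One combination that stays within itertools (apart from the generator expression feeding it) is chain.from_iterable over repeat, sketched here:

```python
from itertools import chain, repeat

numbers = [1, 2, 3, 4]
n = 3

# repeat(x, n) yields x exactly n times; chain.from_iterable flattens the runs
result = list(chain.from_iterable(repeat(x, n) for x in numbers))
print(result)  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```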
Artificially creating memory usage in Python
I’m trying to create a purely memory-intensive script in Python for testing purposes, but every script that I try also increases my CPU usage. I’ve read this post and I also tried, among others: in order to copy an array to another array, but once again I saw CPU variation as well. UPDATED So, how can I cause a standard
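A minimal sketch of one way to hold memory while keeping the CPU mostly idle (the block size, page step, and sleep duration are arbitrary assumptions): allocate a block once, touch every page so it is actually resident, then sleep:

```python
import time

block = bytearray(500 * 1024 * 1024)      # ~500 MB kept alive by this reference
for i in range(0, len(block), 4096):      # touch each page so it is committed
    block[i] = 1
time.sleep(60)                            # idle: memory stays allocated, no work done
```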