I am processing information in several Pandas DataFrames with 10,000+ rows. I have… df1, student information; df2, student responses. I want… a DataFrame with columns for the class number, student ID, and unique assignment titles. Each assignment column should contain the student’s highest score for that assignment. There can be 20+ assignments/columns. A student can have many different
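The actual data and code are elided from the excerpt, but a common way to get that layout is a pivot with a max aggregation; the column names below are assumptions for illustration only, not the asker's schema:

```python
# Hedged sketch: class_number, student_id, assignment and score are assumed names.
import pandas as pd

df2 = pd.DataFrame({
    "class_number": [101, 101, 101, 102],
    "student_id":   [1, 1, 2, 3],
    "assignment":   ["Quiz 1", "Quiz 1", "Quiz 1", "Essay"],
    "score":        [70, 85, 90, 75],
})

# One column per unique assignment title, keeping each student's best score
best = df2.pivot_table(
    index=["class_number", "student_id"],
    columns="assignment",
    values="score",
    aggfunc="max",
).reset_index()
print(best)
```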
Tag: performance
For loop is several times faster in R than in Python using the rpy2 library
The following simple for block takes about 3 seconds to complete in R: The same code run in Python through the rpy2 library takes 4-5 times longer: Is this just because I’m using the rpy2 library to communicate with R, or is there something else at play? Can this be improved in any way (while still running the code
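The snippet itself is elided above, so the following is only a hypothetical reconstruction of the two ways such a loop is typically driven through rpy2; the gap usually comes from crossing the Python/R boundary on every iteration rather than from R itself being slow:

```python
# Hypothetical reconstruction (not the asker's code): the same work timed as
# a single R evaluation vs. per-iteration calls from a Python loop.
import time
import rpy2.robjects as robjects

# 1) Whole loop evaluated inside R in one call: rpy2 overhead is paid once.
start = time.perf_counter()
robjects.r("total <- 0; for (i in 1:1000000) total <- total + i")
print("single R call:", time.perf_counter() - start, "s")

# 2) Python loop calling an R function each iteration: conversion and call
#    overhead are paid on every pass, which is where the slowdown usually lives.
r_add = robjects.r("function(a, b) a + b")
start = time.perf_counter()
total = 0.0
for i in range(100_000):
    total = r_add(total, i)[0]
print("per-iteration calls:", time.perf_counter() - start, "s")
```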
IPython (Jupyter) vs Python (PyCharm) performance
Is there any performance difference between code run in IPython (Jupyter, for example) and the same code run in “standard” Python (PyCharm, for example)? I’m working on a neural network for a project where I need some kind of presentation, and Jupyter + IPython does the job, but I was wondering whether there are any kinds of differences
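One way to answer this empirically (the workload below is an arbitrary stand-in, not from the question) is to time an identical function with the stdlib timeit module in a plain script and with %timeit in a notebook cell, then compare:

```python
# Minimal timing sketch; run the same thing in both environments.
import timeit

def work():
    return sum(i * i for i in range(10_000))

print(timeit.timeit(work, number=1_000), "seconds for 1000 runs")
# In an IPython/Jupyter cell, the equivalent check would be:  %timeit work()
```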
Binary Insertion Sort vs. Quicksort
I was looking at different sorting algorithms and their performance (link) and then I tried to implement some sorting algorithms myself. I wanted to improve them as well, so while coding the insertion sort I thought: why not use binary search, since the first part of the array is already sorted, in order to get rid
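As a sketch of the idea being described (not the asker's implementation), an insertion sort can locate each element's slot with bisect; the element shift is still linear, so the overall complexity stays O(n^2):

```python
# Binary insertion sort sketch: binary search for the position,
# then shift the sorted prefix to make room.
from bisect import bisect_right

def binary_insertion_sort(a):
    for i in range(1, len(a)):
        value = a[i]
        pos = bisect_right(a, value, 0, i)   # a[:i] is already sorted
        a[pos + 1:i + 1] = a[pos:i]          # shifting is still O(n)
        a[pos] = value
    return a

print(binary_insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```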
Python decompression relative performance?
TL;DR: Of the various compression algorithms available in Python (gzip, bz2, lzma, etc.), which has the best decompression performance? Full discussion: Python 3 has various modules for compressing/decompressing data, including gzip, bz2 and lzma. gzip and bz2 additionally have different compression levels you can set. If my goal is to balance file size (/compression ratio) and decompression speed (compression speed
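A quick way to get a feel for the relative numbers is a small benchmark over the same payload; the data and repeat count below are arbitrary assumptions, so real results will vary with the input:

```python
# Rough decompression benchmark sketch across the stdlib codecs.
import bz2, gzip, lzma, time

payload = b"some moderately repetitive text " * 50_000

for name, mod in (("gzip", gzip), ("bz2", bz2), ("lzma", lzma)):
    blob = mod.compress(payload)
    start = time.perf_counter()
    for _ in range(20):
        mod.decompress(blob)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(blob)} bytes compressed, 20 decompressions in {elapsed:.3f}s")
```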
How to increase processing speed when using read_excel in pandas?
I need to use pd.read_excel to process every sheet in one Excel file, but in most cases I do not know the sheet names, so I use this to find out how many sheets the file has: During processing I found that it is quite slow. So, can read_excel read only a limited number of rows to improve the speed? I tried nrows but it did not work.. still
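One commonly suggested pattern (the file name below is hypothetical) is to open the workbook once with pd.ExcelFile, so the sheet names are discovered without re-parsing the file for every sheet, and to cap rows with nrows where the installed pandas version supports it:

```python
import pandas as pd

xls = pd.ExcelFile("workbook.xlsx")   # hypothetical file; parsed a single time
print(xls.sheet_names)                # sheet names without guessing

frames = {
    name: pd.read_excel(xls, sheet_name=name, nrows=100)  # limit rows per sheet
    for name in xls.sheet_names
}
```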
Efficient regex with lists
I have a list of strings coming from os.listdir() that looks like the following: Out of those entries, I want to get the ones that match the “backup_YYYYMMDD” pattern. The regex for that, with named groups, would be: I am trying to create a list that contains only the date from the above (i.e. the .group(‘date’)), but I cannot find a
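Since the actual listing and regex are elided, here is a minimal sketch with assumed file names showing the usual shape of the answer: match with a named group and keep only the captured date:

```python
import re

entries = ["backup_20240101", "backup_20240215", "notes.txt", "backup_old"]
pattern = re.compile(r"^backup_(?P<date>\d{8})$")

# Keep only the 'date' group of entries that match (Python 3.8+ walrus syntax)
dates = [m.group("date") for e in entries if (m := pattern.match(e))]
print(dates)  # ['20240101', '20240215']
```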
Is there a faster way to get multiple keys from a dictionary?
I have a dictionary: Then I have a list of keys: My desired result is: What I’m doing so far is: Is there a faster way? Perhaps without a for loop? Answer: You could use: It has two advantages: it performs the d.get lookup only once, not on each iteration. Only on CPython: because dict.get is implemented in C and map is implemented
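The elided suggestion is almost certainly along these lines, given the mention of d.get and map; the sample data below is assumed:

```python
d = {"a": 1, "b": 2, "c": 3, "d": 4}
keys = ["a", "c", "d"]

# map() binds d.get once and iterates in C on CPython
values = list(map(d.get, keys))
print(values)  # [1, 3, 4]
```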
How to repeat each of a Python list’s elements n times with itertools only?
I have a list of numbers: numbers = [1, 2, 3, 4]. I would like a list where each element is repeated n times, like so (for n = 3): [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]. The problem is that I would like to use only itertools for this, since I am very constrained
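One combination that stays within itertools (apart from the generator expression feeding it) is chain.from_iterable over repeat, sketched here:

```python
from itertools import chain, repeat

numbers = [1, 2, 3, 4]
n = 3

# repeat(x, n) yields x exactly n times; chain.from_iterable flattens the runs
result = list(chain.from_iterable(repeat(x, n) for x in numbers))
print(result)  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```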
Artificially creating memory usage in Python
I’m trying to create a purely memory-intensive script in Python for testing purposes, but every script that I try also increases my CPU usage. I’ve read this post and I also tried, among others: in order to copy an array to another array, but once again I saw CPU variation as well. UPDATED So, how can I cause a standard
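A minimal sketch of one way to hold memory while keeping the CPU mostly idle (the block size, page step, and sleep duration are arbitrary assumptions): allocate a block once, touch every page so it is actually resident, then sleep:

```python
import time

block = bytearray(500 * 1024 * 1024)      # ~500 MB kept alive by this reference
for i in range(0, len(block), 4096):      # touch each page so it is committed
    block[i] = 1
time.sleep(60)                            # idle: memory stays allocated, no work done
```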