I’m trying to print, in ascending order, the contents of a 1 GB file containing a randomly generated big number. This is the code I’m using to generate the random number for my test (found it here). The following Python code works OK and takes a bit less than 4 minutes, but I was told this can be accomplished in about 15 seconds
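The usual way to hit that kind of speed is a counting sort: with only ten possible digits, a single tallying pass replaces any comparison sort. A minimal sketch, assuming the file holds the decimal digits of one huge number (the filenames are hypothetical):

    from collections import Counter

    # One pass over the file, tallying how often each digit occurs.
    counts = Counter()
    with open("bignumber.txt") as fh:                      # hypothetical input file
        for chunk in iter(lambda: fh.read(1 << 20), ""):   # read in 1 MB chunks
            counts.update(chunk)

    # Emit the digits in ascending order; no comparison sort needed.
    with open("sorted.txt", "w") as out:                   # hypothetical output file
        for digit in "0123456789":
            out.write(digit * counts[digit])

This is O(n) in the file size, which is why it can be an order of magnitude faster than sorting the digits.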
Tag: performance
Speeding up a pandas column operation based on several rules
I have a data frame consisting of 5.1 million rows. Now, consider only a query of my data frame which has the following form:

    date    ID1  ID2
    201908  a    X
    201905  b    Y
    201811  a    Y
    201807  a    Z

You can assume that the date is sorted and that there are no duplicates in the subset [‘ID1’, ‘ID2’]. Now, the
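The rules themselves are cut off in this excerpt, but the usual vectorized pattern for rule-based column creation in pandas is numpy.select, which evaluates all conditions over whole columns instead of row by row. A minimal sketch on the sample frame (the rules shown are purely illustrative, not the question's actual rules):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "date": [201908, 201905, 201811, 201807],
        "ID1": ["a", "b", "a", "a"],
        "ID2": ["X", "Y", "Y", "Z"],
    })

    # Illustrative rules only; the real ones are truncated above.
    conditions = [
        df["ID1"].eq("a") & df["ID2"].eq("Y"),
        df["ID1"].eq("b"),
    ]
    choices = ["rule_1", "rule_2"]
    df["label"] = np.select(conditions, choices, default="other")

Because np.select runs in C over full columns, this is typically what turns a minutes-long per-row loop into seconds.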
NumPy array row differences
I have a NumPy array vectors = np.random.randn(rows, cols). I want to find differences between its rows according to some other array diffs, which is sparse and “2-hot”: each of its rows contains a 1 in the column corresponding to the first row of vectors and a -1 in the column corresponding to the second. Perhaps an example will make it clearer: then I can compute
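The “2-hot” structure means each requested difference is a linear combination of rows, so the whole batch reduces to a single matrix product. A minimal sketch (shapes and pairings chosen for illustration):

    import numpy as np

    rows, cols = 4, 3
    vectors = np.random.randn(rows, cols)

    # Each row of diffs selects a pair (i, j): +1 at column i, -1 at column j.
    diffs = np.zeros((2, rows))
    diffs[0, 0], diffs[0, 2] = 1, -1   # vectors[0] - vectors[2]
    diffs[1, 1], diffs[1, 3] = 1, -1   # vectors[1] - vectors[3]

    # One matrix product computes every difference at once.
    result = diffs @ vectors
    assert np.allclose(result[0], vectors[0] - vectors[2])

If diffs is stored as a scipy.sparse matrix, the same product works and skips the zeros.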
Lower execution time for an Apache log parser in Python
I have a school assignment where I was tasked with writing an Apache log parser in Python. This parser will extract all the IP addresses and all the HTTP methods using regex and store them in a nested dictionary. The code can be seen below: This code works (it gives me the expected data for the log files we were
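The code itself is not shown in this excerpt, but the classic speed-ups for this kind of parser are compiling the regex once outside the loop and reading the file line by line. A minimal sketch, assuming the common Apache log format (the pattern and dictionary shape are assumptions, not the assignment's spec):

    import re
    from collections import defaultdict

    # Matches e.g.: 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    LINE_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}).*?"([A-Z]+) ')

    def parse(path):
        counts = defaultdict(lambda: defaultdict(int))  # ip -> method -> count
        with open(path) as fh:
            for line in fh:
                m = LINE_RE.match(line)
                if m:
                    ip, method = m.groups()
                    counts[ip][method] += 1
        return counts

Compiling the pattern once and anchoring it with ^ avoids re-scanning each line from every position.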
Fastest way to find a 2d array inside another array that holds multiple 2d arrays
Hi, I’m trying to perform a search operation in an array that contains multiple 2d arrays, comparing its items to a specific array. I managed to do it using a for loop iterating through the items inside the big array, but I have to perform this search 10^6 times and the length of this for loop can grow up to
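When the candidates live in one 3-D array, NumPy can compare the target against every 2-D slice at once, replacing the Python-level loop. A minimal sketch (shapes are illustrative):

    import numpy as np

    # stack holds n candidate 2-D arrays; target is the array to find.
    stack = np.random.randint(0, 2, size=(1000, 3, 3))
    target = stack[417].copy()

    # Broadcast the comparison, then collapse each slice with all():
    matches = np.flatnonzero((stack == target).all(axis=(1, 2)))
    print(matches)   # indices of every slice equal to target

One vectorized pass per query is usually what makes 10^6 searches feasible.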
Getting City from IP Address range
I have an IP address. For example, 192.168.2.10. Also I have a dictionary: Question: How should I find the city for my IP address using this dictionary, spending as little time (lowest time complexity) as possible? Answer: The “proper answer”, if you want the best complexity for arbitrarily large data sets, is the one given by Ji Bin. To really
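Assuming the dictionary maps IP ranges to cities, the standard low-complexity approach is to sort the range starts once and binary-search each query with bisect, giving O(log n) per lookup. A minimal sketch (the range data and city names are made up):

    import socket
    import struct
    from bisect import bisect_right

    def ip_to_int(ip):
        return struct.unpack("!I", socket.inet_aton(ip))[0]

    # Hypothetical table of non-overlapping (start, end, city) ranges.
    ranges = sorted(
        [("192.168.0.0", "192.168.1.255", "Springfield"),
         ("192.168.2.0", "192.168.3.255", "Shelbyville")],
        key=lambda r: ip_to_int(r[0]),
    )
    starts = [ip_to_int(start) for start, _, _ in ranges]

    def city_of(ip):
        n = ip_to_int(ip)
        i = bisect_right(starts, n) - 1            # last range starting <= ip
        if i >= 0 and n <= ip_to_int(ranges[i][1]):
            return ranges[i][2]
        return None

    print(city_of("192.168.2.10"))   # -> Shelbyville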
Efficiently search a long list of lists
I have a long list of hexahedral point coordinates, for example: Each row defines a hexahedron cell, and by iterating over each cell, I extract the defining faces of the cell (6 faces) and add each face to a list processed_faces. All of this is fine, but because some cells share the same face, I needed a way to
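Membership tests against a list are O(n) each, so the total cost grows quadratically; hashing a canonical form of each face into a set makes every lookup O(1). A minimal sketch, assuming a face is a tuple of point indices and that two faces are the same regardless of vertex order:

    # seen holds a canonical key per face; sorting the indices makes the
    # two orientations of a shared face collide on the same key.
    seen = set()
    unique_faces = []

    def add_face(face):
        key = tuple(sorted(face))
        if key not in seen:
            seen.add(key)
            unique_faces.append(face)

If vertex order within a face matters (e.g. for normals), keep the original tuple in unique_faces, as above, and only canonicalize the lookup key.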
Is there a faster method to do a Pandas groupby cumulative mean?
I am trying to create a lookup reference table in Python that calculates the cumulative mean of a Player’s previous (by datetime) games scores, grouped by venue. However, for my specific need, a player should have previously played a minimum of 2 times at the relevant Venue for a ‘Venue Preference’ cumulative mean calculation. df format looks like the following:
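A vectorized way to express “mean of this player's previous scores at this venue, only once at least two prior games exist” is a shifted expanding mean combined with cumcount. A minimal sketch on made-up column names (datetime, player, venue, score):

    import pandas as pd

    df = pd.DataFrame({
        "datetime": pd.date_range("2020-01-01", periods=6, freq="D"),
        "player": ["A"] * 6,
        "venue": ["X", "X", "X", "Y", "X", "Y"],
        "score": [10, 20, 30, 40, 50, 60],
    }).sort_values("datetime")

    grp = df.groupby(["player", "venue"])["score"]

    # shift(1) excludes the current game; expanding().mean() averages the rest.
    prior_mean = grp.transform(lambda s: s.shift(1).expanding().mean())

    # cumcount() counts games already played at the venue; require >= 2.
    df["venue_pref"] = prior_mean.where(grp.cumcount() >= 2)

The groupby-transform still calls Python once per group, but it avoids the far slower row-by-row apply.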
Overlapping regular expression substitution in Python, but contingent on values of capture groups
I’m currently writing a program in Python that is supposed to transliterate all the characters in a language from one orthography into another. There are two things at hand here: one is already solved, and the second is the problem. In the first step, characters from the source orthography are converted into the target orthography, e.g. (ffr: the
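The excerpt cuts off before the failing case, but a common workaround when substitutions need to see context consumed by a previous match is to re-run the substitution until the text stops changing, with a callable replacement that inspects the capture groups. A minimal sketch (the pattern and rule are placeholders, not the actual orthography):

    import re

    def transliterate(text, pattern, repl, max_passes=10):
        # One re.sub pass consumes matched characters, so overlapping
        # contexts are missed; iterating to a fixpoint catches them.
        for _ in range(max_passes):
            new = re.sub(pattern, repl, text)
            if new == text:
                return new
            text = new
        return text

    # Placeholder rule: the replacement depends on what group 2 captured.
    def repl(m):
        if m.group(2) in "aeiou":
            return m.group(1).upper() + m.group(2)
        return m.group(0)

    print(transliterate("banana", r"(b|n)(a)", repl))   # -> BaNaNa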
Generating Scatter Plot from a Matrix
I have code that generates random matrices of 0’s and 1’s, and I’d like to convert these matrices into scatter plots, where the coordinates correspond to the matrix row/column, and the color of the scatter point corresponds to the value (red if 0, blue if 1, for example). I’ve been able to do this with matplotlib, but my use-case
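For the matplotlib version, the whole matrix can be turned into a single scatter call by generating the row/column grid with np.indices and mapping values to colors with np.where; a minimal sketch:

    import numpy as np
    import matplotlib.pyplot as plt

    m = np.random.randint(0, 2, size=(20, 20))

    # One (x, y) pair per matrix entry: columns on x, rows on y.
    r, c = np.indices(m.shape)
    colors = np.where(m == 1, "blue", "red")

    plt.scatter(c.ravel(), r.ravel(), c=colors.ravel(), s=20)
    plt.gca().invert_yaxis()   # row 0 at the top, matching matrix layout
    plt.show()

One scatter call over flattened arrays scales much better than plotting points one at a time.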