I’m trying to print, in ascending order, the contents of a 1 GB file containing a randomly generated big number. This is the code I’m using to generate the random number for my test (found it here). The following Python code works OK and takes a bit less than 4 minutes, but I was told this can be accomplished in about 15 seconds
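The usual way to hit that kind of speed is a counting sort: with only ten possible digits, a single tallying pass replaces any comparison sort. A minimal sketch, assuming the file holds the decimal digits of one huge number (the filenames are hypothetical):

    from collections import Counter

    # One pass over the file, tallying how often each digit occurs.
    counts = Counter()
    with open("bignumber.txt") as fh:                      # hypothetical input file
        for chunk in iter(lambda: fh.read(1 << 20), ""):   # read in 1 MB chunks
            counts.update(chunk)

    # Emit the digits in ascending order; no comparison sort needed.
    with open("sorted.txt", "w") as out:                   # hypothetical output file
        for digit in "0123456789":
            out.write(digit * counts[digit])

This is O(n) in the file size, which is why it can be an order of magnitude faster than sorting the digits.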
Tag: performance
Speeding up a pandas column operation based on several rules
I have a data frame consisting of 5.1 million rows. Now, consider only a query of my data frame which has the following form:

    date    ID1  ID2
    201908  a    X
    201905  b    Y
    201811  a    Y
    201807  a    Z

You can assume that the date is sorted and that there are no duplicates in the subset [‘ID1’, ‘ID2’]. Now, the
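The rules themselves are cut off in this excerpt, but the usual vectorized pattern for rule-based column creation in pandas is numpy.select, which evaluates all conditions over whole columns instead of row by row. A minimal sketch on the sample frame (the rules shown are purely illustrative, not the question's actual rules):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "date": [201908, 201905, 201811, 201807],
        "ID1": ["a", "b", "a", "a"],
        "ID2": ["X", "Y", "Y", "Z"],
    })

    # Illustrative rules only; the real ones are truncated above.
    conditions = [
        df["ID1"].eq("a") & df["ID2"].eq("Y"),
        df["ID1"].eq("b"),
    ]
    choices = ["rule_1", "rule_2"]
    df["label"] = np.select(conditions, choices, default="other")

Because np.select runs in C over full columns, this is typically what turns a minutes-long per-row loop into seconds.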
NumPy array row differences
I have a NumPy array vectors = np.random.randn(rows, cols). I want to find differences between its rows according to some other array diffs, which is sparse and “2-hot”: each of its rows contains a 1 in the column corresponding to the first row of vectors and a -1 in the column corresponding to the second. Perhaps an example will make it clearer: then I can compute
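The “2-hot” structure means each requested difference is a linear combination of rows, so the whole batch reduces to a single matrix product. A minimal sketch (shapes and pairings chosen for illustration):

    import numpy as np

    rows, cols = 4, 3
    vectors = np.random.randn(rows, cols)

    # Each row of diffs selects a pair (i, j): +1 at column i, -1 at column j.
    diffs = np.zeros((2, rows))
    diffs[0, 0], diffs[0, 2] = 1, -1   # vectors[0] - vectors[2]
    diffs[1, 1], diffs[1, 3] = 1, -1   # vectors[1] - vectors[3]

    # One matrix product computes every difference at once.
    result = diffs @ vectors
    assert np.allclose(result[0], vectors[0] - vectors[2])

If diffs is stored as a scipy.sparse matrix, the same product works and skips the zeros.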
Lower execution time for an Apache log parser in Python
I have a school assignment where I was tasked with writing an Apache log parser in Python. This parser will extract all the IP addresses and all the HTTP methods using regex and store them in a nested dictionary. The code can be seen below: This code works (it gives me the expected data for the log files we were
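The code itself is not shown in this excerpt, but the classic speed-ups for this kind of parser are compiling the regex once outside the loop and reading the file line by line. A minimal sketch, assuming the common Apache log format (the pattern and dictionary shape are assumptions, not the assignment's spec):

    import re
    from collections import defaultdict

    # Matches e.g.: 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    LINE_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}).*?"([A-Z]+) ')

    def parse(path):
        counts = defaultdict(lambda: defaultdict(int))  # ip -> method -> count
        with open(path) as fh:
            for line in fh:
                m = LINE_RE.match(line)
                if m:
                    ip, method = m.groups()
                    counts[ip][method] += 1
        return counts

Compiling the pattern once and anchoring it with ^ avoids re-scanning each line from every position.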
Fastest way to find a 2d array inside another array that holds multiple 2d arrays
Hi, I’m trying to perform a search operation in an array that contains multiple 2d arrays, comparing its items to a specific array. I managed to do it using a for loop iterating through the items inside the big array, but I have to perform this search 10^6 times and the length of this for loop can grow up to
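When the candidates live in one 3-D array, NumPy can compare the target against every 2-D slice at once, replacing the Python-level loop. A minimal sketch (shapes are illustrative):

    import numpy as np

    # stack holds n candidate 2-D arrays; target is the array to find.
    stack = np.random.randint(0, 2, size=(1000, 3, 3))
    target = stack[417].copy()

    # Broadcast the comparison, then collapse each slice with all():
    matches = np.flatnonzero((stack == target).all(axis=(1, 2)))
    print(matches)   # indices of every slice equal to target

One vectorized pass per query is usually what makes 10^6 searches feasible.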
Getting City from IP Address range
I have an IP address. For example, 192.168.2.10. Also I have a dictionary: Question: How should I find the city for my IP address using this dictionary, spending as little time (lowest time complexity) as possible? Answer: The “proper answer”, if you want the best complexity for arbitrarily large data sets, is the one given by Ji Bin. To really
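Assuming the dictionary maps IP ranges to cities, the standard low-complexity approach is to sort the range starts once and binary-search each query with bisect, giving O(log n) per lookup. A minimal sketch (the range data and city names are made up):

    import socket
    import struct
    from bisect import bisect_right

    def ip_to_int(ip):
        return struct.unpack("!I", socket.inet_aton(ip))[0]

    # Hypothetical table of non-overlapping (start, end, city) ranges.
    ranges = sorted(
        [("192.168.0.0", "192.168.1.255", "Springfield"),
         ("192.168.2.0", "192.168.3.255", "Shelbyville")],
        key=lambda r: ip_to_int(r[0]),
    )
    starts = [ip_to_int(start) for start, _, _ in ranges]

    def city_of(ip):
        n = ip_to_int(ip)
        i = bisect_right(starts, n) - 1            # last range starting <= ip
        if i >= 0 and n <= ip_to_int(ranges[i][1]):
            return ranges[i][2]
        return None

    print(city_of("192.168.2.10"))   # -> Shelbyville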
Efficiently search a long list of lists
I have a long list of hexahedral point coordinates, for example: Each row defines a hexahedron cell, and by iterating over each cell, I extract the defining faces of the cell (6 faces) and add each face to a list processed_faces. All of this is fine, but because some cells share the same face, I needed a way to
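Membership tests against a list are O(n) each, so the total cost grows quadratically; hashing a canonical form of each face into a set makes every lookup O(1). A minimal sketch, assuming a face is a tuple of point indices and that two faces are the same regardless of vertex order:

    # seen holds a canonical key per face; sorting the indices makes the
    # two orientations of a shared face collide on the same key.
    seen = set()
    unique_faces = []

    def add_face(face):
        key = tuple(sorted(face))
        if key not in seen:
            seen.add(key)
            unique_faces.append(face)

If vertex order within a face matters (e.g. for normals), keep the original tuple in unique_faces, as above, and only canonicalize the lookup key.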
Is there a faster method to do a Pandas groupby cumulative mean?
I am trying to create a lookup reference table in Python that calculates the cumulative mean of a Player’s previous (by datetime) games scores, grouped by venue. However, for my specific need, a player should have previously played a minimum of 2 times at the relevant Venue for a ‘Venue Preference’ cumulative mean calculation. df format looks like the following:
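A vectorized way to express “mean of this player's previous scores at this venue, only once at least two prior games exist” is a shifted expanding mean combined with cumcount. A minimal sketch on made-up column names (datetime, player, venue, score):

    import pandas as pd

    df = pd.DataFrame({
        "datetime": pd.date_range("2020-01-01", periods=6, freq="D"),
        "player": ["A"] * 6,
        "venue": ["X", "X", "X", "Y", "X", "Y"],
        "score": [10, 20, 30, 40, 50, 60],
    }).sort_values("datetime")

    grp = df.groupby(["player", "venue"])["score"]

    # shift(1) excludes the current game; expanding().mean() averages the rest.
    prior_mean = grp.transform(lambda s: s.shift(1).expanding().mean())

    # cumcount() counts games already played at the venue; require >= 2.
    df["venue_pref"] = prior_mean.where(grp.cumcount() >= 2)

The groupby-transform still calls Python once per group, but it avoids the far slower row-by-row apply.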
Overlapping regular expression substitution in Python, but contingent on values of capture groups
I’m currently writing a program in Python that is supposed to transliterate all the characters in a language from one orthography into another. There are two things at hand here: one is already solved, and the second is the problem. In the first step, characters from the source orthography are converted into the target orthography, e.g. (ffr: the
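The excerpt cuts off before the failing case, but a common workaround when substitutions need to see context consumed by a previous match is to re-run the substitution until the text stops changing, with a callable replacement that inspects the capture groups. A minimal sketch (the pattern and rule are placeholders, not the actual orthography):

    import re

    def transliterate(text, pattern, repl, max_passes=10):
        # One re.sub pass consumes matched characters, so overlapping
        # contexts are missed; iterating to a fixpoint catches them.
        for _ in range(max_passes):
            new = re.sub(pattern, repl, text)
            if new == text:
                return new
            text = new
        return text

    # Placeholder rule: the replacement depends on what group 2 captured.
    def repl(m):
        if m.group(2) in "aeiou":
            return m.group(1).upper() + m.group(2)
        return m.group(0)

    print(transliterate("banana", r"(b|n)(a)", repl))   # -> BaNaNa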
Generating Scatter Plot from a Matrix
I have code that generates random matrices of 0’s and 1’s, and I’d like to convert these matrices into scatter plots, where the coordinates correspond to the matrix row/column, and the color of the scatter point corresponds to the value (red if 0, blue if 1, for example). I’ve been able to do this with matplotlib, but my use-case
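For the matplotlib version, the whole matrix can be turned into a single scatter call by generating the row/column grid with np.indices and mapping values to colors with np.where; a minimal sketch:

    import numpy as np
    import matplotlib.pyplot as plt

    m = np.random.randint(0, 2, size=(20, 20))

    # One (x, y) pair per matrix entry: columns on x, rows on y.
    r, c = np.indices(m.shape)
    colors = np.where(m == 1, "blue", "red")

    plt.scatter(c.ravel(), r.ravel(), c=colors.ravel(), s=20)
    plt.gca().invert_yaxis()   # row 0 at the top, matching matrix layout
    plt.show()

One scatter call over flattened arrays scales much better than plotting points one at a time.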