Skip to content
Advertisement

How to improve the pattern matching on a list in Python

I have a big list, which may carry thousands to millions of entries. I set a window of finite size to slide over the list. I need to count the matched elements in the windows and repeat the procedure by sliding the window 1 position forward at a time. Here is a simple example of a list

JavaScript

Assuming the window is of the length of 3, it will capture

  1. [1 2 1] which contains one pair of matching elements (1 & 1)
  2. move the windows forward by 1 position => [2 1 3], no matching elements
  3. move the windows forward by 1 position => [1 3 4], no matching elements
  4. move the windows forward by 1 position => [3 4 5], no matching elements
  5. move the windows forward by 1 position => [4 5 1], no matching elements
  6. move the windows forward by 1 position => [5 1 2], no matching elements
  7. move the windows forward by 1 position => [1 2 1], 1 matching elements (1 & 1)
  8. move the windows forward by 1 position => [2 1 2], 1 matching elements (2 & 2)
  9. move the windows forward by 1 position => [1 2 2], 1 matching elements (2 & 2)
  10. move the windows forward by 1 position => [2 2 2], 3 matching elements ([2 2 -], [2 – 2], [- 2 2])
  11. move the windows forward by 1 position => [2 2 3], 1 matching elements (2 & 2)

So total 1 + 1 + 1 + 1 + 3 + 1 = 8 matching pairs. I found the idea to use itertools to find the combination of all elements in a window and develop a code to find all the matching pairs

JavaScript

it works for a short list but for the list and the windows getting large, this code is so slow. I have been working with C++ for years and decide to switch to python, I will appreciate it if there is any hint to improve the efficiency of the code.

Advertisement

Answer

Keep track of the data in your window more efficiently? This is O(|L|) instead of your O(|L|*winlen^2). It keeps the window’s element counts in ctr and the window’s matches in match. For example, when a new value enters the window and there are already two instances of that value in the window, you get two new matches. Similarly for a value falling out of the window, it takes as many matches with it as there are other instances of it in the window.

JavaScript

Benchmark results with L and winlen multiplied by 20:

JavaScript

Benchmark code (also includes testing code for all lists of numbers 1 to 3 of lengths 0 to 9):

JavaScript
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement