
capture pattern_X repeatedly, then capture pattern_Y once, then repeat until EOS

[Update:] The accepted answer suggests that this cannot be done in one step with Python's re library. If you know otherwise, please comment.
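For context, here is a small illustration of the limitation using only the standard library. When a capturing group sits inside a repeated construct, re matches every repetition but the group remembers only the last one, so the earlier tokens cannot be read back from a single match. The input "a,b,c," and the helper name last_capture_only are hypothetical. (The third-party regex package on PyPI does record every repetition through its Match.captures() method, but that is outside the standard library.)

```python
import re


def last_capture_only(text: str) -> tuple[str, str]:
    """Return (whole match, group 1) for a capturing group inside a repetition."""
    # (\w+) is repeated by the outer (?:...)+, but re stores only one value per group.
    m = re.fullmatch(r"(?:(\w+),)+", text)
    if m is None:
        raise ValueError(f"no match: {text!r}")
    return m.group(0), m.group(1)


whole, last = last_capture_only("a,b,c,")
print(whole)  # a,b,c,  (every repetition was matched)
print(last)   # c       (only the final repetition's capture is kept)
```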

I’m reverse-engineering a massive ETL pipeline and would like to extract the full data lineage from its stored procedures and views.

I’m struggling with the following regexp.


TLDR: I’d like to capture


from a string like


where a, b, e, f, h match one pattern and X, Y, Z match another. The first pattern may occur up to ~20 times in a row before the second one appears, and the second pattern always appears alone.
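One common workaround, sketched under assumptions since the real patterns and input are not shown: capture the whole run of first-pattern tokens as one group (which re can keep), then split that run in a second pass. Here lowercase names stand in for a, b, e, f, h, capitalised names stand in for X, Y, Z, and tokens are assumed to be separated by whitespace; FIRST, SECOND, SEGMENT and extract are hypothetical names.

```python
import re

# Hypothetical stand-ins for the two patterns (assumptions, not the real SQL):
FIRST = r"[a-z]\w*"   # repeated tokens such as a, b, e, f, h
SECOND = r"[A-Z]\w*"  # lone tokens such as X, Y, Z

# One match per segment. Group 1 holds the *whole run* of FIRST tokens as a
# single string, because re cannot return each repetition separately.
SEGMENT = re.compile(rf"((?:{FIRST}\s+)+)({SECOND})")


def extract(text: str) -> list[tuple[list[str], str]]:
    """Return [(first_pattern_tokens, second_pattern_token), ...]."""
    result = []
    for m in SEGMENT.finditer(text):
        # Second pass: break the captured run into its individual tokens.
        run_tokens = re.findall(FIRST, m.group(1))
        result.append((run_tokens, m.group(2)))
    return result


print(extract("a b X e f Y h Z"))
# [(['a', 'b'], 'X'), (['e', 'f'], 'Y'), (['h'], 'Z')]
```

The whitespace separator (\s+) and both token patterns would need to be adapted to the actual stored-procedure text.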

I’m open to solutions with the sqlglot, sql-metadata, or sqlparse libraries as well; it’s just that regex is better documented.

(I’m probably code golfing, and I should do this in several steps, starting by splitting the string into individual expressions.)
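The split-first idea can be sketched with re.split: when the split pattern contains a capturing group, re.split keeps each separator in the result, so the lone tokens and the chunks between them come back interleaved and can then be parsed one at a time. As before, the token patterns, the sample string and the name split_then_parse are hypothetical assumptions, not the question's real patterns.

```python
import re

# Hypothetical token patterns (assumptions):
FIRST = r"[a-z]\w*"     # repeated tokens such as a, b, e, f, h
SECOND = r"\b[A-Z]\w*"  # lone tokens such as X, Y, Z


def split_then_parse(text: str) -> list[tuple[list[str], str]]:
    # Step 1: split on the lone token. The capturing group makes re.split keep
    # each separator, giving [chunk, sep, chunk, sep, ..., tail].
    parts = re.split(f"({SECOND})", text)
    chunks, seps = parts[0::2], parts[1::2]
    # Step 2: parse each chunk separately. zip stops at the shorter list, so
    # any tokens after the last lone token (the tail chunk) are dropped.
    return [(re.findall(FIRST, chunk), sep) for chunk, sep in zip(chunks, seps)]


print(split_then_parse("a b X e f Y h Z"))
# [(['a', 'b'], 'X'), (['e', 'f'], 'Y'), (['h'], 'Z')]
```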


Answer

You may use this regex with 3 capturing groups and 1 non-capturing group:



Code:


Output:

User contributions licensed under: CC BY-SA