Tag: parsing

capture pattern_X repeatedly, then capture pattern_Y once, then repeat until EOS

[Update:] The accepted answer suggests that this cannot be done with the Python re library in one step. If you know otherwise, please comment. I’m reverse-engineering a massive ETL pipeline, and I’d like to extract the full data lineage from stored procedures and views. I’m struggling with the following regexp. TL;DR: I’d like to capture from a string like where a,b,e,f,h match
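Since the excerpt says a single-step solution was not found, here is a minimal two-step sketch with the standard `re` module. It relies on the fact that a repeated capture group in `re` keeps only its last repetition, so the outer pattern grabs a whole run of X items as one string and a second pattern splits that run. The token shapes (`x<digits>` for pattern_X, `y<digits>` for pattern_Y), the sample string, and the function name `capture_blocks` are all assumptions for illustration, not the asker's actual patterns.

```python
import re

# Hypothetical stand-ins: "x<digits>" plays pattern_X, "y<digits>" plays pattern_Y.
# Outer pattern: one or more X items (captured together as one run), then one Y.
BLOCK = re.compile(r"((?:x\d+\s+)+)(y\d+)")
# Inner pattern: pulls the individual X items back out of a captured run.
ITEM = re.compile(r"x\d+")


def capture_blocks(text):
    """Return a list of (xs, y) pairs, one pair per 'X ... X Y' block."""
    blocks = []
    for match in BLOCK.finditer(text):
        xs = ITEM.findall(match.group(1))  # step 2: split the run of X items
        blocks.append((xs, match.group(2)))
    return blocks


sample = "x1 x2 y1 x3 y2"  # made-up input
print(capture_blocks(sample))
# [(['x1', 'x2'], 'y1'), (['x3'], 'y2')]
```

`finditer` takes care of the "repeat until EOS" part, because it keeps scanning for the next block after each match.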

How to extract multiple specific string lines from another string?

I’m using FEM software (Code_Aster) which is developed in Python, but its file extension is .comm. However, Python snippets can be inserted in .comm files. I have a function which returns a very long string containing nodes, elements, element groups, and so on in the form below: My goal is to add each row to a list/dictionary with its

I cannot parse this XML file in Python

I am trying to create an API connection, and the response looks like the one below. I need to parse this data and turn it into a pandas DataFrame and/or create a loop to find specific information belonging to tags. Below is the code I tried to run, but it returns an empty list, and it does not appear to be iterable. Also it is not

How can I parse an object in a Python C extension?

I have in Python an object such as: And I want to read it in C in the How can I access the members inside a PyObject for custom Python data structures? How can I do the opposite thing, assign values to an object that will later be called inside Python? Edit: As suggested by kpie, using the function PyObject_GetAttrString

Extract HTML into JSON with Python BeautifulSoup

The problem: I’m trying to parse some blocks of HTML to store the relevant data in a JSON object, but BeautifulSoup’s treatment of child tags clashes with my specific requirements. Example input: Desired output: My attempt: Here’s my best attempt so far: Which produces the following output: You can see I have three issues: The

competing regular expressions (race condition)

I’m trying to use Python PLY (lex/yacc) to parse a language called ‘GRBL’. GRBL looks something like this: The ‘G’ codes tell a machine to ‘go’ (or move) and the coordinates say where. LEX requires us to specify a unique regular expression for every possible ‘token’. So in this case I need a regex that will clearly define ‘G00’ and
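To illustrate how overlapping token patterns compete, here is a minimal sketch that uses only the standard `re` module rather than PLY itself. Alternatives in an alternation are tried left to right, so a specific pattern such as `G00` has to be listed before a generic one that would also match it. The token names, the patterns, and the sample line are hypothetical, not taken from the question.

```python
import re

# Hypothetical token set. Order matters: alternation tries alternatives
# left to right, so specific tokens come before the generic fallback.
TOKEN_SPEC = [
    ("G00", r"G00"),                     # specific command
    ("G01", r"G01"),                     # specific command
    ("COORD", r"[XYZ]-?\d+(?:\.\d+)?"),  # axis letter plus a number
    ("WORD", r"[A-Z]\d+"),               # generic fallback, would also match "G00"
    ("SKIP", r"\s+"),                    # whitespace, dropped from the output
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(line):
    """Split a line into (token_name, text) pairs, raising on unknown input."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = MASTER.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("G00 X1.5 Y-2 M3"))  # made-up sample line
# [('G00', 'G00'), ('COORD', 'X1.5'), ('COORD', 'Y-2'), ('WORD', 'M3')]
```

If `WORD` were moved to the top of `TOKEN_SPEC`, `G00` would be reported as a `WORD` instead. PLY applies its own ordering rules to token definitions, so its documentation is the place to check how the same idea carries over.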