
Python and LXML: extremely slow, more efficient code?

I’m processing XML documents like the following.

JavaScript

I’m using the following code to change the attributes of certain elements whenever certain conditions are met. The code works as expected and I’m getting the output I want. However, processing all the files takes far too long. As you can see, I have some print statements that let me monitor the whole process, and sometimes 5 minutes or more pass between two prints.

In fact, I’ve had to kill the process because it was taking too long. I know it is working fine because the output files that I get are correctly modified and when I did a test with a much smaller number of files the whole process managed to run to its end without a glitch (although taking forever to finish).

I have one of the new Macs with the M1 Max Apple silicon chip, so I thought this would go a lot faster, but it is just as slow as on an Intel chip. Is this normal when using lxml, or am I, being a novice, just producing very inefficient code? Is there any way to make this kind of thing faster?

Thanks in advance for any help you can provide.

JavaScript


Answer

The code evaluates every condition for each element, even though only one of them can be met at a time. One possible optimization is to chain them as if/elif, so that evaluation stops at the first condition that matches.

JavaScript
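To illustrate the if/elif idea on its own, here is a minimal sketch. It is not the asker's code: the `w` element, the `pos` attribute, and the tag values are all made up for the example. The import falls back to the standard library's ElementTree only so the snippet runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample: element and attribute names are invented for illustration.
SAMPLE = "<doc><w pos='N'>cat</w><w pos='V'>runs</w><w pos='ADJ'>big</w></doc>"


def retag(root):
    """Rewrite the hypothetical `pos` attribute, stopping at the first matching branch."""
    changed = 0
    for w in root.iter("w"):
        pos = w.get("pos")
        if pos == "N":
            w.set("pos", "NOUN")
        elif pos == "V":  # only evaluated when the first test failed
            w.set("pos", "VERB")
        else:
            continue  # no condition matched; leave the element untouched
        changed += 1
    return changed


root = etree.fromstring(SAMPLE)
count = retag(root)
```

With independent `if` statements every test runs for every element; with `elif` the later tests are skipped as soon as one branch is taken. How much this helps depends on how expensive each condition is.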

Also, plain XPath could be used instead of regular expressions, so the matching is done by the path expression itself.

JavaScript
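As a rough sketch of what selecting by path expression can look like (again with invented element and attribute names, not the ones from the question), an attribute predicate lets the path engine pick the elements, so no regular expression is run per element in Python. The example uses `findall` with the ElementPath subset so that it works with both lxml and the standard library.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample: element and attribute names are invented for illustration.
SAMPLE = (
    "<doc><s><w pos='N'>cat</w><w pos='V'>runs</w></s>"
    "<s><w pos='N'>dog</w></s></doc>"
)


def mark_nouns(root):
    """Select only the matching elements with a path predicate, then update them."""
    # [@pos='N'] filters inside the path engine instead of in a Python loop.
    nouns = root.findall(".//w[@pos='N']")
    for w in nouns:
        w.set("pos", "NOUN")
    return len(nouns)


root = etree.fromstring(SAMPLE)
n = mark_nouns(root)
```

With lxml specifically, `root.xpath(...)` accepts full XPath 1.0, so string functions such as `starts-with()` or `contains()` can stand in for simple patterns, for example `root.xpath("//w[starts-with(@pos, 'N')]")`.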
User contributions licensed under: CC BY-SA