I am currently developing some Python code to extract data from 14,000 PDFs (about 7 MB each). They are dynamic XFA forms created with Adobe LiveCycle Designer 11.0, so they contain streams that need to be decoded later (which means there are some non-ASCII characters, if that makes any difference). My problem is that calling open() on those files takes around