I am forced to ask this question.
My mentor has given me a task to extract data from files with pure Python. There were some .txt files, which were easy, but there is also a file with an .xlsx extension, and I can’t find anywhere whether it is possible to extract the data from it with pure Python (I have been searching for more than 3 weeks now).
If it is not possible, please tell me so that I can show this to her with confidence, because my mentor keeps insisting that it is possible and that I should do it with pure Python, but she refuses to give me any clues or tips.
And if it is possible, tell me how to do it or where to read more about it.
Answer
All MS Office files with extensions ending in x (such as .docx, .xlsx and .pptx) are in fact zip archives (so you can change the extension to .zip and unpack them), and they typically contain a handful of XML files along with media (images, videos, etc.).
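Because an .xlsx file is a zip archive, the standard-library `zipfile` module can open it directly without renaming anything. The sketch below just lists the archive members; the function name `list_xlsx_parts` is made up for illustration.

```python
import zipfile


def list_xlsx_parts(source):
    """Return the names of all members stored in an .xlsx (zip) archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()
```

Calling it on a workbook (for example `list_xlsx_parts("report.xlsx")`, where the file name is only a placeholder) typically shows members such as `[Content_Types].xml`, `xl/workbook.xml`, `xl/worksheets/sheet1.xml` and, when the workbook contains text, `xl/sharedStrings.xml`.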
You can process all of these XML files as plain text, or you can use the xml module from the standard Python library to work with them in a slightly more advanced way.
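As a small example of the "xml module" route, the sketch below reads `xl/workbook.xml` with `xml.etree.ElementTree` and returns the worksheet names. It assumes the SpreadsheetML main namespace used by Excel-style files; the function name `sheet_names` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Default namespace used by SpreadsheetML parts such as xl/workbook.xml
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(source):
    """Return the worksheet names listed in xl/workbook.xml, in workbook order."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Structure: <workbook><sheets><sheet name="..." .../></sheets></workbook>
    return [sheet.get("name") for sheet in root.iter(MAIN_NS + "sheet")]
```

The namespace prefix has to be included in every tag lookup, which is the main thing that trips people up when they first parse these files with ElementTree.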
The formats are complex, but often you can do basic things without going through thousands of pages of documentation.
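To illustrate that a basic read is possible with only the standard library, here is a minimal sketch that pulls the cell values of one worksheet into a list of rows. It assumes the sheet is stored at `xl/worksheets/sheet1.xml` (common, but the reliable mapping goes through `xl/workbook.xml` and its relationships file), and it only handles shared-string, inline-string and plain `<v>` cells. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_shared_strings(archive):
    """Load xl/sharedStrings.xml (if present) into a list indexed by position."""
    try:
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    except KeyError:  # workbooks without text cells may omit this part
        return []
    strings = []
    for si in root.iter(MAIN_NS + "si"):
        # Plain entries hold one <t>; rich-text entries split text across <r><t> runs
        strings.append("".join(t.text or "" for t in si.iter(MAIN_NS + "t")))
    return strings


def read_sheet_rows(source, sheet_path="xl/worksheets/sheet1.xml"):
    """Return one worksheet's cell values as a list of rows (lists of strings).

    Cells missing from the XML are skipped rather than padded, so column
    positions are not preserved in this minimal version.
    """
    with zipfile.ZipFile(source) as archive:
        shared = read_shared_strings(archive)
        root = ET.fromstring(archive.read(sheet_path))
    rows = []
    for row in root.iter(MAIN_NS + "row"):
        values = []
        for cell in row.iter(MAIN_NS + "c"):
            cell_type = cell.get("t")
            v = cell.find(MAIN_NS + "v")
            if cell_type == "s" and v is not None:  # index into shared strings
                values.append(shared[int(v.text)])
            elif cell_type == "inlineStr":  # text stored directly in <is><t>
                values.append("".join(t.text or "" for t in cell.iter(MAIN_NS + "t")))
            else:  # numbers, booleans, formula results: raw text of <v>
                values.append(v.text if v is not None else "")
        rows.append(values)
    return rows
```

Numbers come back as strings and dates come back as Excel serial numbers, so some post-processing is usually needed; that is where the format's complexity starts to show.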