Reading a docx file from s3 bucket with flask results in an AttributeError

Question

I got so many different errors that I don't even know which one is worth mentioning. It's not a credentials problem, though: I can already upload files and read a .txt file. Now I want to read a .docx file. I created a form in my index.html with just a text area to write the exact name of the

Accepted Answer

I checked the python-docx documentation, specifically the Document constructor, docx.Document(docx=None):

"Return a Document object loaded from docx, where docx can be either a path to a .docx file (a string) or a file-like object. If docx is missing or None, the built-in default document “template” is loaded."

So it expects either a path to a file or a file-like object. boto3 gives us neither directly: get()["Body"] is a streaming body, and reading it gives raw bytes. Wrapping those bytes in io.BytesIO turns them into a file-like object that Document accepts. Here's some sample code:

    import io

    import boto3
    import docx

    BUCKET_NAME = "my-bucket"


    def main():
        s3 = boto3.resource("s3")
        bucket = s3.Bucket(BUCKET_NAME)
        object_in_s3 = bucket.Object("test.docx")

        # get() returns a dict; its "Body" entry is a botocore StreamingBody
        object_as_streaming_body = object_in_s3.get()["Body"]
        print(f"Type of object_as_streaming_body: {type(object_as_streaming_body)}")

        # read() drains the stream into raw bytes
        object_as_bytes = object_as_streaming_body.read()
        print(f"Type of object_as_bytes: {type(object_as_bytes)}")

        # Now we use BytesIO to create a file-like object from our byte-stream
        object_as_file_like = io.BytesIO(object_as_bytes)

        # Et voila!
        document = docx.Document(docx=object_as_file_like)
        print(document.paragraphs)


    if __name__ == "__main__":
        main()

This is what it looks like:

    $ python test.py
    Type of object_as_streaming_body: <class 'botocore.response.StreamingBody'>
    Type of object_as_bytes: <class 'bytes'>
    []


Answer