As a hobby, I started a project with Amazon Textract, which extracts text from a photo or a PDF. Now I have run into a problem. According to what I read in its docs, every word in the photo is a small “block”. Printing the text works fine, but if I want to send that text somewhere, such as in an email, I need the whole text as a single string or file. So I need the text of all the blocks stored in a single variable for further use. I have been stuck on this for a few days. Help appreciated. Thank you.

    import boto3

    def processor(name):
        textract = boto3.client('textract')
        # bucketName is assumed to be defined elsewhere in the script
        response = textract.detect_document_text(
            Document={
                'S3Object': {
                    'Bucket': bucketName,
                    'Name': name
                }
            }
        )
        # Print the text of every LINE block
        for item in response["Blocks"]:
            if item["BlockType"] == "LINE":
                print(item["Text"])

Answer
The one-liner below should do the job. It joins the text of every LINE block into a single string, separated by spaces:

    single_response = ' '.join(item["Text"] for item in response["Blocks"] if item["BlockType"] == "LINE")

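
If you want to reuse this in several places, the same idea can be wrapped in a small helper. This is only a sketch: `extract_text` and `sample_response` are names made up for illustration, and the sample dictionary is a hand-written stand-in that mimics the `Blocks` list of a `detect_document_text` response, so the snippet runs without calling AWS. Only LINE blocks are joined, since the WORD blocks would repeat the same text.

```python
def extract_text(response, separator="\n"):
    """Join the Text of every LINE block in a Textract-style response into one string."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return separator.join(lines)


# Hand-written stand-in for a real response, only to show the shape of the data.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Hello world"},
        {"BlockType": "WORD", "Text": "Hello"},
        {"BlockType": "WORD", "Text": "world"},
        {"BlockType": "LINE", "Text": "Second line"},
    ]
}

full_text = extract_text(sample_response)
print(full_text)
```

In your `processor` function you could return `extract_text(response)` instead of printing each line, then write the returned string to a file or use it as the body of an email. A newline separator keeps the line structure of the document; pass `separator=" "` to get the same result as the one-liner above.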