
Search for a word in Word documents and print the names of the files that contain it?

Hey, so I'm new to Python and I want to write a script that goes through a large directory of .docx documents and prints the name of every file whose text contains a certain word.

Here is my code so far:

import os
import docx2txt
os.chdir('C:/Users/epicr/Desktop/Python Stuff/LAB FILES')
text= ''
files = []
for file in os.listdir('C:/Users/epicr/Desktop/Python Stuff/LAB FILES'):
    if file.endswith('.docx'):
        files.append(file)
for i in range(len(files)):
        text += docx2txt.process(files[i])
if text == str('VENTILATION RATIO'):
    print (i)

My thought process is to convert all these .docx documents to text, then search that text for the phrase ‘VENTILATION RATIO’. If the phrase appears in a file, the name of that file should be printed.

However, the script doesn't print anything. I know for a fact that at least one of the Word documents contains ‘VENTILATION RATIO’ (and yes, the case matches exactly).


Answer

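There are a few logic issues in your code. The loop uses text += ..., which appends every document's text to one long string, so you can no longer tell which file a match came from. The check text == str('VENTILATION RATIO') tests whether that entire string is exactly 'VENTILATION RATIO', not whether it contains the phrase, so it is False for any real document; the in operator does a substring test instead. And because the if is not indented under the for loop, it runs only once, after every file has been read.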
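A quick way to see the difference between the two checks is to run them on a short string. The sample text below is made up for illustration and stands in for the combined contents of several documents:

```python
# Hypothetical text standing in for the contents of a few documents
text = 'Report A\nVENTILATION RATIO: 0.8\nReport B\n'

# Equality: is the whole string exactly this phrase? Almost never true for a document.
print(text == 'VENTILATION RATIO')  # False

# Membership: does the phrase appear anywhere inside the string?
print('VENTILATION RATIO' in text)  # True
```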

Try this update:

import os
import docx2txt

folder = 'C:/Users/epicr/Desktop/Python Stuff/LAB FILES'
files = [f for f in os.listdir(folder) if f.endswith('.docx')]
for i, name in enumerate(files):
    text = docx2txt.process(os.path.join(folder, name))  # text for a single file
    if 'VENTILATION RATIO' in text:  # substring test, not equality
        print(i, name)  # file index and name
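If you'd rather not depend on docx2txt, or the documents are spread across subfolders, here is a minimal sketch using only the standard library. It relies on the fact that a .docx file is a ZIP archive whose main body text lives in word/document.xml, with the visible text stored in w:t elements inside w:p paragraphs. The helper names docx_text and find_files_containing are hypothetical. Because it reads only word/document.xml, text in headers, footers and footnotes (stored in separate parts of the archive) is not searched. It also skips the ~$ lock files Word creates next to documents that are currently open, which are not valid .docx archives.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for paragraph (w:p) and text (w:t) elements
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read('word/document.xml'))
    paragraphs = []
    for para in root.iter(W_NS + 'p'):
        # Word may split one phrase across several runs, so join all w:t pieces per paragraph
        paragraphs.append(''.join(node.text or '' for node in para.iter(W_NS + 't')))
    return '\n'.join(paragraphs)


def find_files_containing(folder, phrase):
    """Yield paths of .docx files under folder (including subfolders) whose text contains phrase."""
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in sorted(filenames):
            # Skip non-Word files and Word's temporary '~$' lock files
            if not name.lower().endswith('.docx') or name.startswith('~$'):
                continue
            path = os.path.join(dirpath, name)
            if phrase in docx_text(path):  # case-sensitive substring test
                yield path


if __name__ == '__main__':
    for path in find_files_containing('C:/Users/epicr/Desktop/Python Stuff/LAB FILES', 'VENTILATION RATIO'):
        print(path)
```

Joining the w:t pieces per paragraph matters: Word often splits a phrase like ‘VENTILATION RATIO’ across several runs (for example after a formatting change), so searching each run separately could miss it.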
User contributions licensed under: CC BY-SA