I have a folder full of text documents whose text needs to be loaded into a single list variable.
Each element of the list should be the full text of one document.
So far I have this code, but it is not working well.
    dir = os.path.join(current_working_directory, 'FolderName')
    file_list = glob.glob(dir + '/*.txt')
    corpus = []  # --> my list variable
    for file_path in file_list:
        text_file = open(file_path, 'r')
        corpus.append(text_file.readlines())
        text_file.close()
Is there a better way to do this?
Edit: Replaced the CSV reading function (read_csv) with the text reading function (readlines()).
Answer
You just need to read() each file and append the result to your corpus list, as follows:
    import glob
    import os

    file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

    corpus = []

    for file_path in file_list:
        with open(file_path) as f_input:
            corpus.append(f_input.read())

    print(corpus)
Each list entry is then the entire contents of one text file. Note that readlines() would instead give you a list of lines for each file, so corpus would become a list of lists rather than a list of strings.
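To make the difference concrete, here is a small sketch that writes a throwaway two-line file and reads it back both ways. The file name sample.txt and its contents are made up for the illustration; only read() and readlines() come from the answer above.

```python
import os
import tempfile

# Create a temporary two-line file (hypothetical sample content).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w") as f_output:
        f_output.write("first line\nsecond line\n")

    # read() returns the whole file as a single string.
    with open(sample_path) as f_input:
        as_text = f_input.read()

    # readlines() returns a list with one string per line.
    with open(sample_path) as f_input:
        as_lines = f_input.readlines()

print(repr(as_text))  # 'first line\nsecond line\n'
print(as_lines)       # ['first line\n', 'second line\n']
```

Appending as_text to corpus gives one string per document, which is what the question asks for; appending as_lines gives a nested list.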
The same thing can be written with a list comprehension:
    file_list = glob.glob(os.path.join(os.getcwd(), "FolderName", "*.txt"))

    corpus = [open(file).read() for file in file_list]
This approach may use more resources, though, because there is no with statement to close each file automatically, so files can stay open longer than necessary.
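If you want the one-line form without leaving files open, one option is pathlib's Path.read_text(), which opens, reads, and closes the file in a single call. The following is only a sketch: load_corpus is a hypothetical helper name, the demo files are made up, and the sorted() call is an addition of mine to make the order deterministic, since glob results come back in arbitrary order.

```python
import tempfile
from pathlib import Path


def load_corpus(folder):
    """Return the full text of every .txt file in folder, one string per file."""
    # Path.read_text() closes the file before returning its contents.
    # sorted() is optional; it only fixes the order of the documents.
    return [path.read_text() for path in sorted(Path(folder).glob("*.txt"))]


# Demo on a temporary folder holding two made-up files.
with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "a.txt").write_text("alpha")
    Path(tmp_dir, "b.txt").write_text("beta")
    corpus = load_corpus(tmp_dir)

print(corpus)  # ['alpha', 'beta']
```

For the folder in the question, the call would be load_corpus(Path.cwd() / "FolderName").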