
How to create a list of lists of words, split by word and by line?

I have a text file:

sample.txt

Hi I am student
I am from 

Here is what I’ve tried:

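import re
import string

def read_to_list1(filepath):
    # Read the whole file, closing it automatically afterwards
    with open(filepath, 'r') as f:
        text_as_string = f.read()
    # Strip punctuation, then split the text into lines
    x = re.sub('[' + string.punctuation + ']', '', text_as_string).split("\n")

    for i in x:
        # Split each line on whitespace and print its list of words
        x_as_string = re.sub('[' + string.punctuation + ']', '', i).split()
        print(x_as_string)

read_to_list1('sample.txt')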

This prints:

['Hi', 'I', 'am', 'student']
['I', 'am', 'from']

I want a single nested list instead:

[['Hi', 'I', 'am', 'student'], ['I', 'am', 'from']]


Answer

After opening the file, use a list comprehension to iterate over its lines. For each line, call str.split() with no arguments, which splits on runs of whitespace and ignores the trailing newline, to get the words for that sublist.

def read_to_list1(filepath):
    with open(filepath, 'r') as f_in:
        # One sublist of whitespace-separated words per line
        return [line.split() for line in f_in]

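The one-liner above does not remove punctuation, which the attempt in the question did. If that matters, here is a minimal sketch that combines both steps. It assumes that only the ASCII characters in string.punctuation should be dropped. The function name read_to_list2 and the constant PUNCT_RE are hypothetical. This version also skips blank lines instead of keeping them as empty sublists.

```python
import re
import string

# Assumption: only ASCII punctuation (string.punctuation) should be removed.
# re.escape keeps characters such as "]", "\\", "^" and "-" literal inside [...].
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def read_to_list2(filepath):
    """Return one list of words per non-blank line, with punctuation removed."""
    result = []
    with open(filepath, "r") as f_in:
        for line in f_in:
            # Remove punctuation, then split the line on whitespace
            words = PUNCT_RE.sub("", line).split()
            if words:  # skip blank lines instead of appending []
                result.append(words)
    return result
```

For sample.txt above this should return [['Hi', 'I', 'am', 'student'], ['I', 'am', 'from']]. Note that the straight apostrophe is stripped as well, so a word like "I'm" becomes "Im".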
User contributions licensed under: CC BY-SA