
Output of split is not what I was expecting

I’m just learning Python, and I’m having some problems reading a .txt file that I created. My objective: I have a txt file with a list of strings. I’m trying to read it, process it, and save every word into a new list.


example2.txt file:

[one, two, THREE, one, two, ten, eight,cat, dog, bird, fish]
[Alonso, Alicia, Bob, Lynn] , [red, blue, green, pink, cyan]


My output:

['one, two, THREE, one, two, ten, eight, cat, dog, bird, fish]\n']
['Alonso, Alicia, Bob, Lynn], [red, blue, green, pink, cyan']

What I was expecting was something like this: ['one','two','THREE','one','two','ten','eight','cat','dog','bird','fish','Alonso','Alicia','Bob','Lynn','red','blue','green','pink','cyan']

My code in Python. This is what I tried; you can ignore the comments:

import re
# Creating a variable to store later the contents of the file
list_String = []
# Reading the file
file = open(r"D:\dir\example2", "r")

for line in file:
    print(re.split('^[\s].', line.strip(' ][')))
    #list_String.append(line.strip('[]').strip("\n").split(","))
    #list_String = re.split(r'[^St.]', line)
    #print(line.split(r"S"))
    #print(line)

#print(list_String)

file.close()

I was also reading the documentation on how to use re, but I don’t know if it is just me or if it is hard to understand.

I tried experimenting with what I read, but I’m still not getting what I wanted.

I even tried this:

print(line.strip('][').strip('\n').strip(']').split(","))

Output

['one', ' two', ' THREE', ' one', ' two', ' ten', ' eight', 'cat', ' dog', ' bird', ' fish']
['Alonso', ' Alicia', ' Bob', ' Lynn] ', ' [red', ' blue', ' green', ' pink', ' cyan']

As you can see, it kind of works. However, between Lynn and red, the brackets and the comma are still there.

Thank you for the time and help


Answer

You might just find that doing an re.findall on the pattern \w+ works here:

import re

inp = "[one, two, THREE, one, two, ten, eight,cat, dog, bird, fish] [Alonso, Alicia, Bob, Lynn] , [red, blue, green, pink, cyan]"
# \w+ matches each run of letters, digits, or underscores
words = re.findall(r'\w+', inp)
print(words)

This prints:

['one', 'two', 'THREE', 'one', 'two', 'ten', 'eight', 'cat', 'dog', 'bird', 'fish',
 'Alonso', 'Alicia', 'Bob', 'Lynn', 'red', 'blue', 'green', 'pink', 'cyan']
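
To apply the same idea to the file from the question, one possible approach is to run re.findall on each line and collect the matches in a single list. This is only a rough sketch: extract_words and sample_lines are names made up for this illustration, and the two sample lines are an assumption about how example2.txt is laid out, based on the output shown in the question.

```python
import re

def extract_words(lines):
    """Collect every run of word characters from an iterable of lines."""
    words = []
    for line in lines:
        # \w+ matches letters, digits and underscores, so brackets,
        # commas, spaces and the trailing newline are all skipped.
        words.extend(re.findall(r"\w+", line))
    return words

# Hypothetical stand-in for the two lines of example2.txt. An open file
# object can be passed in the same way, since iterating it yields lines.
sample_lines = [
    "[one, two, THREE, one, two, ten, eight,cat, dog, bird, fish]\n",
    "[Alonso, Alicia, Bob, Lynn] , [red, blue, green, pink, cyan]\n",
]
list_string = extract_words(sample_lines)
print(list_string)
```

With a real file, something like `with open(path) as f: list_string = extract_words(f)` should behave the same way, where path is whatever location the file actually has. As for the earlier attempt, str.strip only removes characters from the two ends of a string, which would explain why the brackets and comma in the middle of the second line were left in place.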