I need to make a list from a string, and I need to remove the numbers and/or punctuation from each word before adding it to the list. I have written something, but it adds number-only words as empty strings (""). So I added an `elif` statement to skip all number-only words, but somehow I am uneasy about it. I can't use regex, so is there another approach, or how can I improve this code? I appreciate any help. Here is my code:
```python
my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"
new_text = my_text.split()
cleared_text = []
for word in new_text:
    if word.isalpha():
        cleared_text.append(word)
    elif word.isnumeric():
        pass  # skip number-only words such as "44"
    else:
        # keep only the letters of mixed words such as "This3"
        containeer = ""
        for letter in word:
            if letter.isalpha():
                containeer += letter
        cleared_text.append(containeer)
print(cleared_text)
```
My main goal is to store the words from the string in a dictionary and then show how many times each word is repeated. Is it better to go directly from the string's words to a dictionary, instead of going from a list to a list and then to a dictionary?
Thank you all, regards.
Answer
Here is how I would do it:
```python
my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"
new_text = my_text.split()
cleared_text = []
for word in new_text:
    # keep only the alphabetic characters of each word
    new_word = "".join([char for char in word if char.isalpha()])
    # add the word only if something is left (this skips "44" and "5")
    cleared_text += [new_word] if new_word != "" else []
print(cleared_text)
```
Alternatively, we can append every result and then use `filter()` to remove the empty strings:
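```python
cleared_text = []  # start again from an empty list
for word in new_text:
    new_word = "".join([char for char in word if char.isalpha()])
    cleared_text.append(new_word)
# filter(None, ...) drops falsy items, i.e. the empty strings
cleared_text = list(filter(None, cleared_text))
print(cleared_text)
```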
Output:
['This', 'is', 'a', 'sentence', 'that', 'I', 'have', 'written', 'to', 'see', 'if', 'this', 'code', 'works', 'so', 'that', 'it', 'makes', 'a', 'list', 'of', 'words', 'from', 'using', 'this', 'code']
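Since regex is off the table, another option worth considering (a different technique from the answer's `isalpha()` filter) is `str.translate` with a table built by `str.maketrans`, which deletes characters from the whole string before splitting. This is a sketch assuming only ASCII digits and punctuation (the contents of `string.digits` and `string.punctuation`) need removing; unlike `isalpha()`, it keeps any other character, such as non-ASCII symbols.

```python
import string

my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"

# Translation table that deletes every ASCII digit and punctuation character
remove_table = str.maketrans("", "", string.digits + string.punctuation)

# Remove the unwanted characters from the whole string, then split on whitespace.
# split() with no argument also discards the empty pieces left where "44" and "5" were.
cleared_text = my_text.translate(remove_table).split()
print(cleared_text)
```

Because `split()` without arguments never yields empty strings, no extra filtering step is needed here.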
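We can then count the number of occurrences of each word by calling `count()` once for every distinct word (a generator expression over `set(cleared_text)`) and passing the resulting pairs to `dict()`: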
```python
occurrences = dict((word, cleared_text.count(word)) for word in set(cleared_text))
print(occurrences)
```
Output (the key order comes from the set and may differ between runs):
{'I': 1, 'of': 1, 'makes': 1, 'is': 1, 'so': 1, 'sentence': 1, 'a': 2, 'this': 2, 'works': 1, 'it': 1, 'words': 1, 'code': 2, 'list': 1, 'if': 1, 'to': 1, 'This': 1, 'see': 1, 'written': 1, 'have': 1, 'from': 1, 'that': 2, 'using': 1}
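On the question of going straight to a dictionary: one possible approach, sketched below, counts while scanning, so no intermediate list is built. Calling `count()` for each distinct word rescans the whole list every time, whereas a single pass touches each word once. The sketch also shows `collections.Counter` from the standard library, which does the same counting in one call; the names `occurrences` and `counts` are just illustrative.

```python
from collections import Counter

my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"

# One pass: clean each word and update its count immediately.
occurrences = {}
for word in my_text.split():
    new_word = "".join(char for char in word if char.isalpha())
    if new_word:  # skip tokens such as "44" that contain no letters
        occurrences[new_word] = occurrences.get(new_word, 0) + 1
print(occurrences)

# The same counts with collections.Counter.
cleaned = ("".join(char for char in word if char.isalpha()) for word in my_text.split())
counts = Counter(word for word in cleaned if word)
print(counts.most_common(3))
```

Since Python 3.7, a plain dict keeps insertion order, so `occurrences` lists words in the order they first appear. Note that `'This'` and `'this'` are counted separately; applying `.lower()` to `new_word` would merge them if that is what you want.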