
Python / (no regex) How to make a list from the words of a string after removing numbers and punctuation, or how can I make a new string after removing them?

I need to make a list from a string, and I need to remove the numbers and/or punctuation from the words before adding them to the list. I have written something, but it adds words that consist only of numbers as "" empty strings. So I added an "elif" statement to skip all number-only words, but I am still uneasy about it. I can't use regex, so is there another approach, or how can I improve this code? I appreciate any help. Here is my code:

my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"

new_text = my_text.split()
cleared_text = []

for word in new_text:
    if word.isalpha():
        cleared_text.append(word)
    elif word.isnumeric():
        pass
    else:
        container = ""
        for letter in word:
            if letter.isalpha():
                container += letter
        cleared_text.append(container)

print(cleared_text)

My main goal is to store the words from the string in a dictionary and then show how many times each word is repeated. Is it better to go straight from the list to a dictionary, instead of from list to list and then to a dictionary?

Thank you all, my regards


Answer

Here is how I would do it:

my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"

new_text = my_text.split()
cleared_text = []

for word in new_text:
    new_word = "".join([char for char in word if char.isalpha()])
    if new_word:  # skip words that contained no letters, e.g. "44"
        cleared_text.append(new_word)
print(cleared_text)

We can also append every result and use filter() to remove the empty strings afterwards:

cleared_text = []  # start from an empty list again
for word in new_text:
    new_word = "".join([char for char in word if char.isalpha()])
    cleared_text.append(new_word)
cleared_text = list(filter(None, cleared_text))
print(cleared_text)

Output:

['This', 'is', 'a', 'sentence', 'that', 'I', 'have', 'written', 'to', 'see', 'if', 'this', 'code', 'works', 'so', 'that', 'it', 'makes', 'a', 'list', 'of', 'words', 'from', 'using', 'this', 'code']
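
Since the question also mentions punctuation, here is a rough alternative sketch that still avoids regex, using str.translate() with a deletion table. The function name strip_words is just something I made up for illustration. Note the assumption: this only deletes ASCII digits and the characters in string.punctuation, whereas the isalpha() version above drops every non-letter character, so the two can differ on unusual input.

```python
import string

# Translation table that deletes ASCII digits and ASCII punctuation.
STRIP_TABLE = str.maketrans("", "", string.digits + string.punctuation)

def strip_words(text):
    """Hypothetical helper: split text into words, delete digits and
    punctuation from each word, and drop words that end up empty."""
    stripped = (word.translate(STRIP_TABLE) for word in text.split())
    return [word for word in stripped if word]

my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"
print(strip_words(my_text))
```

For the sample sentence this should give the same list as the output above, because the only non-letter characters in it are ASCII digits.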

We can then count the number of occurrences of each word using a generator expression and count():

occurrences = dict((word, cleared_text.count(word)) for word in set(cleared_text))
print(occurrences)

Output:

{'I': 1, 'of': 1, 'makes': 1, 'is': 1, 'so': 1, 'sentence': 1, 'a': 2, 'this': 2, 'works': 1, 'it': 1, 'words': 1, 'code': 2, 'list': 1, 'if': 1, 'to': 1, 'This': 1, 'see': 1, 'written': 1, 'have': 1, 'from': 1, 'that': 2, 'using': 1}
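
As for going straight to a dictionary: calling count() once per distinct word rescans the whole list each time, which is fine for a sentence but gets slow for long texts. A possible one-pass sketch is below. It uses collections.Counter from the standard library (a dict subclass), and the function name count_words is my own invention, not something from the question.

```python
from collections import Counter

def count_words(text):
    """Hypothetical helper: count cleaned words in a single pass,
    without building an intermediate list first."""
    counts = Counter()
    for word in text.split():
        # Keep only the letters of each word.
        cleaned = "".join(char for char in word if char.isalpha())
        if cleaned:  # ignore words with no letters, e.g. "44"
            counts[cleaned] += 1
    return counts

my_text = " This3 is a sentence6 that1 I have written 44 to see 5 if this code works so that it makes a list of words from using this code"
word_counts = count_words(my_text)
print(dict(word_counts))
```

The counts are case-sensitive, just like in the answer above, so 'This' and 'this' are counted separately. If that is not what you want, you could call cleaned.lower() before counting.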
User contributions licensed under: CC BY-SA