Remove certain words from URL

I scraped tweet statuses and want to remove certain words from their URLs; however, my code doesn't work properly, as it seems to remove only the first string in "stopwords".

Code:

stopwords = ['/people', '/photo/1']
link_list = []
for link in links:
    for i in stopwords:
        remove = link.replace(i, "")
        link = remove
        link_list.append(link)

Output:

https://twitter.com/CultOfCurtis/status/1492292326051483648
Good you're beginning to learn, now let's discuss your DVD collection.
— A.D. (@DadOutOfStyle) February 12, 2022

Time to mount a nuke for Russia.. before Mars ;-)
— Andre 🕳 (@AndreWillemse4) February 12, 2022

This is fucked up @elonmusk https://t.co/1N6Y4So631
— Jaimee Jakobczak (@JaimeeJakobczak) February 12, 2022

https://twitter.com/consequence/status/1492245783084773383/photo/1
15 out of 23 monkeys implanted with Elon Musk’s Neuralink brain chips have reportedly died: https://t.co/WrAW6BqU75 pic.twitter.com/oh2giLblLT
— CONSEQUENCE (@consequence) February 11, 2022

Martians will be scared of this guy
— EVStyle (@EVStyle2) February 12, 2022

For people looking to get into crypto for the first time this is the best place to start if your in AUS. Bottom confirmed now is the time to load up! Great platform, low fees! Sign up today for 10$ free BITCOIN https://t.co/HDHObXqJ4w
— GOOD VIBRATIONS (@SammyMorgan) February 12, 2022

https://twitter.com/gayesian/status/1492292246456184841
LOLLL
— Kyle Jordan Maxwell (@kylejmax) February 12, 2022

Elon Musk revela detalles de Starship, la nave de SpaceX para llegar a la Luna https://t.co/DZRNg2C6np
— Mauro Sosa (@Mauro_Sosa_S) February 12, 2022

I tried several different approaches after researching, but to no avail. :/

Answer

You just need to de-indent the last line there:

stopwords = ['/people', '/photo/1']
link_list = []
for link in links:
    for i in stopwords:
        remove = link.replace(i, "")
        link = remove
    link_list.append(link)

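In its original position, the append ran once per stopword instead of once per link. It would first append the link with /people removed but /photo/1 still in place, then append the same link again with /photo/1 removed. So every link ended up in link_list twice, and the first copy still contained /photo/1.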
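To see the difference side by side, here is a minimal sketch that runs both versions on the same input. The sample URLs and the function names strip_buggy and strip_fixed are made up for illustration; the question does not show what links actually contains.

```python
# Hypothetical sample input; the real contents of `links` are not shown in the question.
links = [
    "https://twitter.com/example_user/status/1/photo/1",
    "https://twitter.com/example_user/status/2",
]
stopwords = ['/people', '/photo/1']


def strip_buggy(links, stopwords):
    """Original indentation: append sits inside the inner loop."""
    out = []
    for link in links:
        for word in stopwords:
            link = link.replace(word, "")
            out.append(link)  # runs once per stopword
    return out


def strip_fixed(links, stopwords):
    """De-indented append: runs once per link, after all stopwords are removed."""
    out = []
    for link in links:
        for word in stopwords:
            link = link.replace(word, "")
        out.append(link)
    return out


buggy = strip_buggy(links, stopwords)
fixed = strip_fixed(links, stopwords)

print(len(buggy), len(fixed))
for link in fixed:
    print(link)
```

With two stopwords, the buggy version produces two entries per link, and the first entry for the first URL still ends in /photo/1. The fixed version produces one cleaned entry per link.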

Alternatively, you could use a compiled regular expression that matches any of the stopwords and removes them all in a single pass:

import re

stopwords = ['/people', '/photo/1']
pattern = re.compile('|'.join(map(re.escape, stopwords)))
link_list = [pattern.sub('', link) for link in links]
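One thing to watch with the regex approach: alternation tries the alternatives from left to right, so if one stopword is a prefix of another, the shorter one can match first and leave a fragment behind. The sketch below sorts the stopwords longest-first before joining them. The extra '/photo' stopword, the sample URLs, and the helper name build_stopword_pattern are assumptions added for illustration and are not part of the original question.

```python
import re


def build_stopword_pattern(stopwords):
    """Compile one pattern that matches any stopword, longest first."""
    # Longest first, so '/photo/1' is tried before its prefix '/photo'.
    ordered = sorted(stopwords, key=len, reverse=True)
    return re.compile('|'.join(map(re.escape, ordered)))


# '/photo' is a hypothetical extra stopword that overlaps with '/photo/1'.
stopwords = ['/photo', '/photo/1', '/people']

# Hypothetical sample input.
links = [
    "https://twitter.com/example_user/status/1/photo/1",
    "https://twitter.com/example_user/people",
]

pattern = build_stopword_pattern(stopwords)
cleaned = [pattern.sub('', link) for link in links]

# For comparison: the same stopwords joined in their given order.
naive_pattern = re.compile('|'.join(map(re.escape, stopwords)))
naive = naive_pattern.sub('', links[0])

print(cleaned)
print(naive)
```

With the unsorted pattern, '/photo' matches first and the first URL is left with a stray '/1' at the end. With the stopwords from the question ('/people' and '/photo/1') there is no overlap, so the order does not matter and the sort is only a safeguard.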
