I scraped tweet statuses and am trying to remove certain substrings from them. However, it doesn't work properly: only the first string in "stopwords" is removed.
Code:
    stopwords = ['/people', '/photo/1']
    link_list = []
    for link in links:
        for i in stopwords:
            remove = link.replace(i, "")
            link = remove
            link_list.append(link)
Output:
    https://twitter.com/CultOfCurtis/status/1492292326051483648
    Good you're beginning to learn, now let's discuss your DVD collection.— A.D. (@DadOutOfStyle) February 12, 2022
    Time to mount a nuke for Russia.. before Mars ;-)— Andre 🕳 (@AndreWillemse4) February 12, 2022
    This is fucked up @elonmusk https://t.co/1N6Y4So631— Jaimee Jakobczak (@JaimeeJakobczak) February 12, 2022
    https://twitter.com/consequence/status/1492245783084773383/photo/1
    15 out of 23 monkeys implanted with Elon Musk’s Neuralink brain chips have reportedly died: https://t.co/WrAW6BqU75 pic.twitter.com/oh2giLblLT— CONSEQUENCE (@consequence) February 11, 2022
    Martians will be scared of this guy— EVStyle (@EVStyle2) February 12, 2022
    For people looking to get into crypto for the first time this is the best place to start if your in AUS. Bottom confirmed now is the time to load up! Great platform, low fees! Sign up today for 10$ free BITCOIN https://t.co/HDHObXqJ4w— GOOD VIBRATIONS (@SammyMorgan) February 12, 2022
    https://twitter.com/gayesian/status/1492292246456184841
    LOLLL— Kyle Jordan Maxwell (@kylejmax) February 12, 2022
    Elon Musk revela detalles de Starship, la nave de SpaceX para llegar a la Luna https://t.co/DZRNg2C6np— Mauro Sosa (@Mauro_Sosa_S) February 12, 2022
I tried several different approaches after researching, but to no avail. :/
Answer
You just need to de-indent the last line there:
    stopwords = ['/people', '/photo/1']
    link_list = []
    for link in links:
        for i in stopwords:
            remove = link.replace(i, "")
            link = remove
        # Append once per link, after every stopword has been removed
        link_list.append(link)
In its original position, the append sat inside the inner loop and so ran once per stopword: it would append the link with /people removed but before removing /photo/1, then append it again with /photo/1 removed.
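To see the difference concretely, here is a small sketch that runs both versions on two sample URLs. The first URL is taken from the question's output; the second is made up for illustration, and the function names `strip_original` and `strip_fixed` are hypothetical.

```python
stopwords = ['/people', '/photo/1']

# Sample input: the first URL appears in the question's output,
# the second is a hypothetical URL ending in /people.
links = [
    'https://twitter.com/consequence/status/1492245783084773383/photo/1',
    'https://twitter.com/example/status/1/people',
]

def strip_original(links):
    """Question's indentation: append runs inside the inner loop."""
    link_list = []
    for link in links:
        for i in stopwords:
            link = link.replace(i, "")
            link_list.append(link)  # runs once per stopword, so twice per link
    return link_list

def strip_fixed(links):
    """De-indented append: runs once per link, after all replacements."""
    link_list = []
    for link in links:
        for i in stopwords:
            link = link.replace(i, "")
        link_list.append(link)
    return link_list

original_result = strip_original(links)
fixed_result = strip_fixed(links)

print(len(original_result))  # 4: every link was appended twice
print(original_result[0])    # first link, still ending in /photo/1
print(fixed_result)          # 2 entries, both stopwords removed
```

With the original indentation, the first entry for each link is a half-processed copy, which is why it can look as though only the first stopword was applied.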
Alternatively, you could use a compiled regular expression that matches any of the stopwords:
    import re

    stopwords = ['/people', '/photo/1']
    pattern = re.compile('|'.join(map(re.escape, stopwords)))
    link_list = [pattern.sub('', link) for link in links]
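As a rough, self-contained illustration of the regex approach, the sketch below wraps it in a helper and runs it on sample input. The helper name `strip_stopwords` and the second URL are assumptions made for the example; the first URL comes from the question's output.

```python
import re

def strip_stopwords(links, stopwords):
    """Remove every occurrence of any stopword from each link in one pass."""
    # re.escape makes each stopword match literally, even if it contains
    # regex metacharacters such as '.' or '?'.
    pattern = re.compile('|'.join(map(re.escape, stopwords)))
    return [pattern.sub('', link) for link in links]

stopwords = ['/people', '/photo/1']

# First URL is from the question's output; the second is hypothetical.
links = [
    'https://twitter.com/consequence/status/1492245783084773383/photo/1',
    'https://twitter.com/example/status/1/people',
]

link_list = strip_stopwords(links, stopwords)
for link in link_list:
    print(link)
```

Because the pattern is built from the list, adding another stopword only means extending `stopwords`; no extra loop is needed.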