I scraped tweet statuses, from which I’m removing certain words; however, it doesn’t work effectively as it only removes the first string in “stopwords”.
Code:
stopwords = ['/people', '/photo/1'] link_list = [] for link in links: for i in stopwords: remove = link.replace(i, "") link = remove link_list.append(link)
Output:
https://twitter.com/CultOfCurtis/status/1492292326051483648Good you're beginning to learn, now let's discuss your DVD collection.
— A.D. (@DadOutOfStyle) February 12, 2022Time to mount a nuke for Russia.. before Mars ;-)
— Andre 🕳 (@AndreWillemse4) February 12, 2022https://twitter.com/consequence/status/1492245783084773383/photo/1This is fucked up @elonmusk https://t.co/1N6Y4So631
— Jaimee Jakobczak (@JaimeeJakobczak) February 12, 202215 out of 23 monkeys implanted with Elon Musk’s Neuralink brain chips have reportedly died: https://t.co/WrAW6BqU75 pic.twitter.com/oh2giLblLT
— CONSEQUENCE (@consequence) February 11, 2022Martians will be scared of this guy
— EVStyle (@EVStyle2) February 12, 2022https://twitter.com/gayesian/status/1492292246456184841For people looking to get into crypto for the first time this is the best place to start if your in AUS. Bottom confirmed now is the time to load up! Great platform, low fees! Sign up today for 10$ free BITCOIN https://t.co/HDHObXqJ4w
— GOOD VIBRATIONS (@SammyMorgan) February 12, 2022LOLLL
— Kyle Jordan Maxwell (@kylejmax) February 12, 2022Elon Musk revela detalles de Starship, la nave de SpaceX para llegar a la Luna https://t.co/DZRNg2C6np
— Mauro Sosa (@Mauro_Sosa_S) February 12, 2022
I tried different codes after researching, but to no avail. :/
Advertisement
Answer
You just need to de-indent the last line there:
stopwords = ['/people', '/photo/1'] link_list = [] for link in links: for i in stopwords: remove = link.replace(i, "") link = remove link_list.append(link)
In its original position, it would append the link with /people
removed but before removing /photo/1
. Then it would append again with /photo/1
removed.
You could alternatively apply this suggestion here and use a compiled regular expression:
import re stopwords = ['/people', '/photo/1'] pattern = re.compile('|'.join(map(re.escape, stopwords))) link_list = [pattern.sub('', link) for link in links]