I've been trying to replace some text in my output, but I've had no luck.
I want the output to look something like this:
```
su-n-s-e-t
https://64.media.tumblr.com/35fb46ace19cf31bf16c3655eff26fa6/bc3cfd4a41299b1e-9a/s500x750/98bbd40e71066761a4bd5983896932dd56c94427.jpg
houndsofvalinor-art
https://64.media.tumblr.com/e89e2a223d965a0351e310c829389583/ce52b6a3e76c58be-6e/s500x750/43548527d68eac8def536a88901a2ff78355ef51.jpg
amazinglybeautifulphotography
https://64.media.tumblr.com/a7d31eb63666d39d10868debbab9e27c/5be73c7f5dadb3dd-aa/s500x750/5b10d9cc0400e7b0dbabb9ea14c37e6b91e85e91.jpg
kylebonallo
https://64.media.tumblr.com/b406c3ceb50e4e09e550710b35de1310/dddef868163205f7-71/s500x750/9fec23368ed8ca5d6effae89fbdcda54554d0a68.jpg
expressions-of-nature
https://64.media.tumblr.com/e1eb3612511e21177dfa66ac02f07b98/c5b5c1fc2cbbc58d-e1/s500x750/550dbdf7568167891c5ea1af18af9cbc91cd620f.jpg
ex0skeletal-undead
https://64.media.tumblr.com/14d837eb6159b8376443393d8b1ef551/fb5d595667e75d0f-79/s500x750/cee48e58e0b1191376e20fd11904c09adbea50b3.jpg
geopsych
https://64.media.tumblr.com/a596b92db62c8ae4f68b490d172f8227/c856f013961ced0e-10/s500x750/73fb838c8065174e5ede5d93698ea386e6df1efe.jpg
jacobvanloon
https://64.media.tumblr.com/ca5f1e13bb4642de55422e74611f1df6/6f85f80cb48e73f7-e4/s500x750/12b59223056baf7733d99f210f1cd8bc397d52cd.png
amazinglybeautifulphotography
https://64.media.tumblr.com/06a1ff4abc50e80df59ddbd6e9c8c42c/3fd49bbbfb9dffd8-df/s500x750/43d3adf64f6fec58ebd37633be4988f36746e819.jpg
```
The url_list variable returns:
```
geopsych
[' https://64.media.tumblr.com/a596b92db62c8ae4f68b490d172f8227/c856f013961ced0e-10/s500x750/73fb838c8065174e5ede5d93698ea386e6df1efe.jpg 500w']
burningmine
[' https://64.media.tumblr.com/e32b99ad1de8f8cd494205982c0137a1/54985812c55123d3-99/s500x750/cbe83b505eb14ff36e2be05e171a30bfd073a41b.jpg 500w']
amazinglybeautifulphotography
[' https://64.media.tumblr.com/06a1ff4abc50e80df59ddbd6e9c8c42c/3fd49bbbfb9dffd8-df/s500x750/43d3adf64f6fec58ebd37633be4988f36746e819.jpg 500w']
```
This is what I have tried:
```python
for results in urls:
    results.replace('500w','')
```
but I still get it with 500w on the end.
Since I also want each link on its own line, without the [''] around it, I tried splitting with .split('\n') instead of .split(','), but I get an error when I use that too.
Here is the rest of the code:
```python
import requests
from bs4 import BeautifulSoup

search_term = 'landscape'
posts_scrape = requests.get(f'https://www.tumblr.com/search/{search_term}')
soup = BeautifulSoup(posts_scrape.text, 'html.parser')

articles = soup.find_all('article', class_='_2DpMA')

for article in articles:
    try:
        source = article.find('div', class_='_3QBiZ').text
        urls = []
        for imgvar in article.find_all('img', alt='Image'):
            url_list = [i for i in imgvar['srcset'].split(',') if (i.find('500w') != -1)]
            urls.append(url_list)
            for results in urls:
                results.replace('500w','')
                print (source)
                print (results)
    except AttributeError:
        continue
```
Answer
I recommend using a dictionary to store the image URLs. The keys are the image sources and the values are lists of image URLs. For example:
```python
import requests
from bs4 import BeautifulSoup

search_term = "landscape"
posts_scrape = requests.get(f"https://www.tumblr.com/search/{search_term}")
soup = BeautifulSoup(posts_scrape.text, "html.parser")

articles = soup.find_all("article", class_="_2DpMA")

# Map each source (blog name) to the list of its 500w image URLs.
data = {}
for article in articles:
    try:
        source = article.find("div", class_="_3QBiZ").text
        for imgvar in article.find_all("img", alt="Image"):
            data.setdefault(source, []).extend(
                [
                    # Drop the "500w" descriptor and surrounding whitespace.
                    i.replace("500w", "").strip()
                    for i in imgvar["srcset"].split(",")
                    if "500w" in i
                ]
            )
    except AttributeError:
        # article.find() returned None (no source div): skip this article.
        continue

for source, image_urls in data.items():
    for url in image_urls:
        print(source)
        print(url)
```
Prints:
```
leahberman
https://64.media.tumblr.com/e29c3dd39ab0e413ff6eefa0cfc973de/d6817667d3007f74-09/s500x750/2971fc9af6619f1f783bb169b104dea023f339de.gifv
leahberman
https://64.media.tumblr.com/8c61e084290ccea6fef3eab1d96204fd/d6817667d3007f74-b8/s500x750/45873681924618d179bfc97e04a02d3d6ebaac39.gifv
leahberman
https://64.media.tumblr.com/c4db8bc21289aec008219f5a4b307714/d6817667d3007f74-85/s500x750/c49d38c369ccb507d950b116e637886ac4467685.gifv
poetry-siir
https://64.media.tumblr.com/5495c24e4608688a6a0052d81da01882/d97a76eeb3edd5e9-d7/s500x750/9862a63fe430e83850ceb73f384bf2af6322db5e.jpg
poetry-siir
https://64.media.tumblr.com/9944d7a5d2d26a57118c8b391b699efb/d97a76eeb3edd5e9-ad/s500x750/7cfc69a18143d5b2a678fe0c85c431e5387a2107.jpg

and so on.
```
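As far as I can tell, the loop in the question had no effect for two reasons. First, str.replace returns a new string and leaves the original unchanged, so the result has to be assigned or used. Second, urls in the question is a list of lists, so each results is a list, which has no replace method. The sketch below shows both points offline, without a request to Tumblr. The helper name urls_with_width and the example.com URLs are made up for illustration, and the helper assumes the URLs themselves contain no commas.

```python
def urls_with_width(srcset, width="500w"):
    """Return the URLs in a srcset string whose width descriptor equals `width`.

    Hypothetical helper for illustration; assumes URLs contain no commas.
    """
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()  # e.g. ["https://.../img.jpg", "500w"]
        if len(parts) == 2 and parts[1] == width:
            urls.append(parts[0])
    return urls


# str.replace returns a new string; the original is left unchanged.
entry = " https://example.com/a.jpg 500w"
entry.replace("500w", "")  # result discarded, entry still ends with "500w"
cleaned = entry.replace("500w", "").strip()  # keep the returned value

# Made-up srcset value with two candidates.
sample = "https://example.com/a_250.jpg 250w, https://example.com/a_500.jpg 500w"

print(cleaned)
print(urls_with_width(sample))
```

Comparing the descriptor exactly, rather than testing for "500w" anywhere in the entry, avoids matching a URL that happens to contain that text in its path.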