I’m a complete beginner in Python.
I’m trying to write a web scraper for PubMed.
With my code, I want to print not only the title of each paper but also its DOI (or any other link to the paper), like this:
Title: ABCD ABCD ABCD ABCD [http:// ~~~~]
Title: ABCD ABCD ABCD ABCD [http:// ~~~~]
Title: ABCD ABCD ABCD ABCD [http:// ~~~~]
….
But at the final stage, I cannot print the title and the link together.
When I print each item separately, it works.
Also, I don’t really understand how to use ‘for’ loops.
I would really appreciate your help with my question.
Thanks.
    import requests
    from bs4 import BeautifulSoup
    from pprint import pprint

    search = str(input("Search: "))
    arttype = str(input("Is ir Review ? (y/n): "))
    perpage = str(input("How many results do you want ? (10/20/50/100/200): "))
    sort = str(input("Which options do you want ? (date/match): "))

    if arttype == "y":
        arttype_in = "&filter=pubt.review"
    else:
        arttype_in = ""

    if sort == "data":
        sort2 = "&sort=data"
    else:
        sort2 = ""

    url = "https://pubmed.ncbi.nlm.nih.gov/?term=" + search + arttype_in + "&format=abstract" + sort2 + "&size=" + perpage
    req = requests.get(url)
    html = req.text
    status = req.status_code


    if status != 200:
        print ("")
    else:
        print ("Stuck")


    soup = BeautifulSoup(html, "html.parser")

    contain_amount = soup.find ("div", {"class":"search-results"})
    specific_amount = contain_amount.find ("div", {"class":"results-amount"}).text

    print("Number of papers: " + str(specific_amount))

    list_titles = soup.find_all ("div", {"class":"short-view"})
    list_dois = soup.find_all ("a", {"class":"link-item dialog-focus"})


    for i in list_dois:
        for j in list_titles:
            titles = j.find ("h1", {"class":"heading-title"}).text
            print ("Title: " + str(titles))
            dois = i.attrs["href"]
            print ("[" + str(dois) + "]")
Answer
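Change the selectors. About half of your code is already correct. Instead of collecting titles and links in two separate lists, find the container of each search result and read both the title and the link from inside it, so the two always stay together.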
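On the ‘for’ question: when one loop is nested inside another, the inner loop runs in full for every item of the outer loop, so every title gets combined with every link. The sketch below uses made-up titles and links (not real PubMed data) to compare that with walking both lists in step using zip(). The function names nested and paired are only for illustration.

```python
# Made-up stand-ins for the two lists scraped in the question.
titles = ["Paper A", "Paper B", "Paper C"]
links = ["https://example.org/a", "https://example.org/b", "https://example.org/c"]


def nested(titles, links):
    """Nested loops: every title is combined with every link."""
    lines = []
    for link in links:
        for title in titles:
            lines.append(f"Title: {title} [{link}]")
    return lines


def paired(titles, links):
    """zip() walks both lists in step: first with first, second with second."""
    return [f"Title: {title} [{link}]" for title, link in zip(titles, links)]


print(len(nested(titles, links)))  # 3 titles x 3 links = 9 lines
for line in paired(titles, links):
    print(line)
```

zip() only pairs things correctly when both lists have the same length and order, which is not guaranteed when titles and links are scraped separately. The full script below avoids that risk by looping once over each result's container.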
    import requests
    from bs4 import BeautifulSoup

    # input() already returns a string, so no str() wrapper is needed.
    search = input("Search: ")
    arttype = input("Is ir Review ? (y/n): ")
    perpage = input("How many results do you want ? (10/20/50/100/200): ")
    sort = input("Which options do you want ? (date/match): ")

    # Optional filter: restrict the search to review articles.
    if arttype == "y":
        arttype_in = "&filter=pubt.review"
    else:
        arttype_in = ""

    # The prompt offers "date", so compare against "date" (the original
    # compared against "data"). "sort=date" is assumed to be the intended value.
    if sort == "date":
        sort2 = "&sort=date"
    else:
        sort2 = ""

    url = "https://pubmed.ncbi.nlm.nih.gov/?term=" + search + arttype_in + "&format=abstract" + sort2 + "&size=" + perpage
    print(url)
    req = requests.get(url)
    html = req.text
    status = req.status_code


    # Warn if the request did not succeed.
    if status != 200:
        print("Stuck")


    soup = BeautifulSoup(html, "html.parser")

    # Each search result sits in its own container, so the title and the
    # link read from inside it always belong to the same paper.
    search_divs = soup.find_all("div", class_="results-article")

    for div in search_divs:
        print("Title - {}".format(div.find("h1", class_="heading-title").get_text(strip=True)))
        # Note: div.find("a") returns the first link in the container. In the
        # sample run this is the journal search link, not the paper's DOI.
        print("Link - {}".format("https://pubmed.ncbi.nlm.nih.gov" + div.find("a")["href"]))
        print("---" * 25)

    print("Number of papers - {}".format(soup.find("div", class_="results-amount").get_text(strip=True)))
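The script builds the URL by joining strings. That works for a simple term, but a term containing characters such as & or = would be misread as extra parameters. A minimal sketch of an alternative using the standard library's urlencode follows. The helper name build_search_url and its keyword arguments are made up for this illustration, and sort=date is an assumption about the intended parameter; the other parameter names are taken from the script.

```python
from urllib.parse import urlencode

BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def build_search_url(term, review_only=False, sort_by_date=False, size="20"):
    """Build the search URL with every value percent-encoded.

    Hypothetical helper; parameters are added in the same order as in the script.
    """
    params = {"term": term}
    if review_only:
        params["filter"] = "pubt.review"
    params["format"] = "abstract"
    if sort_by_date:
        params["sort"] = "date"  # assumed value, see the note above
    params["size"] = size
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("corona"))
print(build_search_url("protein corona", review_only=True))
```

For the one-word term used in the sample run, this produces the same URL as the posted script; the difference only appears with terms that contain spaces or reserved characters.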
Output:
    Search: corona
    Is ir Review ? (y/n): n
    How many results do you want ? (10/20/50/100/200): 20
    Which options do you want ? (date/match): match
    https://pubmed.ncbi.nlm.nih.gov/?term=corona&format=abstract&size=20
    Title - The history and epidemiology of Middle East respiratory syndrome corona virus
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Multidiscip+Respir+Med%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Personalized protein corona on nanoparticles and its clinical implications
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Biomater+Sci%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Nanoparticle-Protein Interaction: The Significance and Role of Protein Corona
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Adv+Exp+Med+Biol%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Gold nanoparticle should understand protein corona for being a clinical nanomaterial
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Control+Release%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - The impact of protein corona on the behavior and targeting capability of nanoparticle-based delivery system
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Pharm%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Liposome protein corona characterization as a new approach in nanomedicine
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Anal+Bioanal+Chem%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Shell-corona microgels from double interpenetrating networks
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Soft+Matter%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Protein corona: Opportunities and challenges
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Biochem+Cell+Biol%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Biomolecular Corona Dictates Aβ Fibrillation Process
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22ACS+Chem+Neurosci%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - A health concern regarding the protein corona, aggregation and disaggregation
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Biochim+Biophys+Acta+Gen+Subj%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Formation and Characterization of Protein Corona Around Nanoparticles: A Review
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Nanosci+Nanotechnol%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Silver nanoparticle protein corona and toxicity: a mini-review
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Nanobiotechnology%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - The prevalence and morphology of the corona mortis (Crown of death): A meta-analysis with implications in abdominal wall and pelvic surgery
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Injury%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Possibilities and Limitations of Different Separation Techniques for the Analysis of the Protein Corona
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Angew+Chem+Int+Ed+Engl%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Translating Current Bioanalytical Techniques for Studying Corona Activity
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Trends+Biotechnol%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - The Crown and the Scepter: Roles of the Protein Corona in Nanomedicine
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Adv+Mater%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Protein corona - from molecular adsorption to physiological complexity
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Beilstein+J+Nanotechnol%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Understanding the nanoparticle-protein corona complexes using computational and experimental methods
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Biochem+Cell+Biol%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - Structure of corona radiata and tapetum fibers in ventricular surgery
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Clin+Neurosci%22%5Bjour%5D
    ---------------------------------------------------------------------------
    Title - A protein corona primer for physical chemists
    Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Chem+Phys%22%5Bjour%5D
    ---------------------------------------------------------------------------

    Number of papers - 954results
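The count in the last line comes out as 954results because get_text(strip=True) joins the number and the word without a space. If the number is needed as an integer, a small regular expression can pull it out. This is only a sketch; parse_result_count is a hypothetical helper, not part of the original script.

```python
import re


def parse_result_count(text):
    """Return the first number in text as an int, or None if there is none.

    Commas used as thousands separators (e.g. "1,234") are dropped.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


print(parse_result_count("954results"))  # 954
```

If only a readable label is wanted, passing a separator to Beautiful Soup, as in get_text(" ", strip=True), should keep the number and the word apart, assuming they sit in separate text nodes as the joined output suggests.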