I’m a complete beginner in Python.
I tried to make a web scraper for PubMed.
With my code, I want to print results that include not only the title of each paper but also its DOI (or any other accession link for the paper), like below:
Title: ABCD ABCD ABCD ABCD [http:// ~~~~]
Title: ABCD ABCD ABCD ABCD [http:// ~~~~]
Title: ABCD ABCD ABCD ABCD [http:// ~~~~]
….
But in the final step, I can’t show the title and the link together.
When I print each one separately, it works.
Also, I’m not sure exactly how to use ‘for’ loops.
I really appreciate your consideration of my question.
Thanks.
import requests
from bs4 import BeautifulSoup
from pprint import pprint
search = str(input("Search: "))
arttype = str(input("Is ir Review ? (y/n): "))
perpage = str(input("How many results do you want ? (10/20/50/100/200): "))
sort = str(input("Which options do you want ? (date/match): "))
if arttype == "y":
    arttype_in = "&filter=pubt.review"
else:
    arttype_in = ""
if sort == "data":
    sort2 = "&sort=data"
else:
    sort2 = ""
url = "https://pubmed.ncbi.nlm.nih.gov/?term=" + search + arttype_in + "&format=abstract" + sort2 + "&size=" + perpage
req = requests.get(url)
html = req.text
status = req.status_code
if status != 200:
    print ("")
else:
    print ("Stuck")
soup = BeautifulSoup(html, "html.parser")
contain_amount = soup.find ("div", {"class":"search-results"})
specific_amount = contain_amount.find ("div", {"class":"results-amount"}).text
print("Number of papers: " + str(specific_amount))
list_titles = soup.find_all ("div", {"class":"short-view"})
list_dois = soup.find_all ("a", {"class":"link-item dialog-focus"})
for i in list_dois:
    for j in list_titles:
        titles = j.find ("h1", {"class":"heading-title"}).text
        print ("Title: " + str(titles))
        dois = i.attrs["href"]
        print ("[" + str(dois) + "]")
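The nested loop above is what mixes things up: for every link, the inner loop walks through all of the titles again, so each title is printed once per link instead of once overall. A minimal offline sketch, using placeholder titles and links (not real PubMed data), contrasts that with zip, which pairs two lists position by position. Note that zip only gives correct pairs if both lists have the same length and the same order, which is an assumption about the page that may not hold.

```python
# Placeholder data standing in for the scraped titles and links (hypothetical values).
titles = ["Paper A", "Paper B", "Paper C"]
links = ["https://example.org/a", "https://example.org/b", "https://example.org/c"]


def nested_pairs(titles, links):
    """Mimic the nested loop: the inner loop runs in full for every link."""
    pairs = []
    for link in links:
        for title in titles:
            pairs.append((title, link))
    return pairs


def zipped_pairs(titles, links):
    """Pair the i-th title with the i-th link; zip stops at the shorter list."""
    return list(zip(titles, links))


if __name__ == "__main__":
    print(len(nested_pairs(titles, links)))  # 9 pairs: every title with every link
    for title, link in zipped_pairs(titles, links):
        print("Title: " + title + " [" + link + "]")
```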
Answer
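Change the selectors; half of your code is already correct. The problem is the nested ‘for’: it runs the inner loop over all the titles once for every link, so titles and links get mixed up. Instead, loop over each search result block and take the title and the link from that same block: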
import requests
from bs4 import BeautifulSoup
from pprint import pprint
search = str(input("Search: "))
arttype = str(input("Is ir Review ? (y/n): "))
perpage = str(input("How many results do you want ? (10/20/50/100/200): "))
sort = str(input("Which options do you want ? (date/match): "))
if arttype == "y":
    arttype_in = "&filter=pubt.review"
else:
    arttype_in = ""
# The prompt offers "date", so compare against "date" (not "data")
if sort == "date":
    sort2 = "&sort=date"
else:
    sort2 = ""
url = "https://pubmed.ncbi.nlm.nih.gov/?term=" + search + arttype_in + "&format=abstract" + sort2 + "&size=" + perpage
print(url)
req = requests.get(url)
html = req.text
status = req.status_code
if status != 200:
    print("Stuck")
soup = BeautifulSoup(html, "html.parser")
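# Each search result is its own "results-article" block, so the title and
# the link are read from the same block and stay paired.
search_divs = soup.find_all("div", class_="results-article")
for div in search_divs:
    print("Title - {}".format(div.find("h1", class_="heading-title").get_text(strip=True)))
    # div.find("a") returns the first link in the block; in the output below
    # that is a journal search link rather than the paper's DOI.
    print("Link - {}".format("https://pubmed.ncbi.nlm.nih.gov" + div.find("a")["href"]))
    print("---" * 25)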
print("Number of papers - {}".format(soup.find("div", class_="results-amount").get_text(strip=True)))
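Building the URL by string concatenation, as the code above does, can go wrong when the search term contains characters such as & or +, which have special meaning in a query string. A minimal sketch using the standard library's urllib.parse.urlencode, reusing the same query parameters the code above sends (term, filter, format, sort, size). build_search_url is a hypothetical helper name, and treating sort=date as the value for date ordering is an assumption based on the (date/match) prompt.

```python
from urllib.parse import urlencode

BASE_URL = "https://pubmed.ncbi.nlm.nih.gov/"


def build_search_url(term, review_only=False, per_page="20", sort_by="match"):
    """Build the search URL with the same parameters, in the same order, as above.

    urlencode percent-encodes each value, so spaces, & and + in the term are safe.
    """
    params = {"term": term}
    if review_only:
        params["filter"] = "pubt.review"
    params["format"] = "abstract"
    if sort_by == "date":
        params["sort"] = "date"  # assumption: PubMed's value for date ordering
    params["size"] = per_page
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("corona", per_page="20"))
    print(build_search_url("protein corona", review_only=True, sort_by="date"))
```

With requests you can get the same encoding by passing the dictionary as params= to requests.get instead of building the string yourself.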
Output:
Search: corona
Is ir Review ? (y/n): n
How many results do you want ? (10/20/50/100/200): 20
Which options do you want ? (date/match): match
https://pubmed.ncbi.nlm.nih.gov/?term=corona&format=abstract&size=20
Title - The history and epidemiology of Middle East respiratory syndrome corona virus
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Multidiscip+Respir+Med%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Personalized protein corona on nanoparticles and its clinical implications
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Biomater+Sci%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Nanoparticle-Protein Interaction: The Significance and Role of Protein Corona
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Adv+Exp+Med+Biol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Gold nanoparticle should understand protein corona for being a clinical nanomaterial
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Control+Release%22%5Bjour%5D
---------------------------------------------------------------------------
Title - The impact of protein corona on the behavior and targeting capability of nanoparticle-based delivery system
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Pharm%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Liposome protein corona characterization as a new approach in nanomedicine
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Anal+Bioanal+Chem%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Shell-corona microgels from double interpenetrating networks
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Soft+Matter%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Protein corona: Opportunities and challenges
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Biochem+Cell+Biol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Biomolecular Corona Dictates Aβ Fibrillation Process
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22ACS+Chem+Neurosci%22%5Bjour%5D
---------------------------------------------------------------------------
Title - A health concern regarding the protein corona, aggregation and disaggregation
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Biochim+Biophys+Acta+Gen+Subj%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Formation and Characterization of Protein Corona Around Nanoparticles: A Review
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Nanosci+Nanotechnol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Silver nanoparticle protein corona and toxicity: a mini-review
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Nanobiotechnology%22%5Bjour%5D
---------------------------------------------------------------------------
Title - The prevalence and morphology of the corona mortis (Crown of death): A meta-analysis with implications in abdominal wall and pelvic surgery
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Injury%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Possibilities and Limitations of Different Separation Techniques for the Analysis of the Protein Corona
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Angew+Chem+Int+Ed+Engl%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Translating Current Bioanalytical Techniques for Studying Corona Activity
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Trends+Biotechnol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - The Crown and the Scepter: Roles of the Protein Corona in Nanomedicine
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Adv+Mater%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Protein corona - from molecular adsorption to physiological complexity
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Beilstein+J+Nanotechnol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Understanding the nanoparticle-protein corona complexes using computational and experimental methods
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Biochem+Cell+Biol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Structure of corona radiata and tapetum fibers in ventricular surgery
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Clin+Neurosci%22%5Bjour%5D
---------------------------------------------------------------------------
Title - A protein corona primer for physical chemists
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Chem+Phys%22%5Bjour%5D
---------------------------------------------------------------------------
Number of papers - 954results
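The last line of the output shows the count glued to the word (954results). That happens because get_text(strip=True) strips each piece of text and joins the pieces with an empty separator; passing a separator, as in get_text(" ", strip=True), keeps a space between them. If you need the count as a number, a small regex helper is one option. parse_result_count is a hypothetical name, and allowing thousands separators such as 1,234 is an assumption about how larger counts are displayed.

```python
import re


def parse_result_count(text):
    """Pull the integer out of text like '954results' or '1,234 results'.

    Returns None if the text contains no digits.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop thousands separators before converting
    return int(match.group().replace(",", ""))


if __name__ == "__main__":
    print(parse_result_count("954results"))  # 954
```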