
How can I fix my Python web scraper based on BeautifulSoup?

I’m a complete beginner in Python.

I’m trying to write a web scraper (specifically for PubMed).

With my code, I want to print results that include not only the title of each paper but also its DOI (or any other accession link), like below:

Title: ABCD ABCD ABCD ABCD [http:// ~~~~]

Title: ABCD ABCD ABCD ABCD [http:// ~~~~]

Title: ABCD ABCD ABCD ABCD [http:// ~~~~]

….

But in the final step, I can’t print the titles and links together.

When I print each of them separately, it works.

Also, I’m not sure exactly how to use ‘for’ loops.

I really appreciate you taking the time to look at my question.

Thanks.

import requests
from bs4 import BeautifulSoup
from pprint import pprint

search = input("Search: ")
arttype = input("Is it a review? (y/n): ")
perpage = input("How many results do you want? (10/20/50/100/200): ")
sort = input("Which sort option do you want? (date/match): ")

if arttype == "y":
    arttype_in = "&filter=pubt.review"
else:
    arttype_in = ""

if sort == "date":
    sort2 = "&sort=date"
else:
    sort2 = ""

url = "https://pubmed.ncbi.nlm.nih.gov/?term=" + search + arttype_in + "&format=abstract" + sort2 + "&size=" + perpage
req = requests.get(url)
html = req.text
status = req.status_code


if status != 200:
    print("Stuck")
    

soup = BeautifulSoup(html, "html.parser")

contain_amount = soup.find("div", {"class": "search-results"})
specific_amount = contain_amount.find("div", {"class": "results-amount"}).text

print("Number of papers: " + str(specific_amount))

list_titles = soup.find_all("div", {"class": "short-view"})
list_dois = soup.find_all("a", {"class": "link-item dialog-focus"})


for i in list_dois:
    for j in list_titles:
        titles = j.find("h1", {"class": "heading-title"}).text
        print("Title: " + str(titles))
    dois = i.attrs["href"]
    print("[" + str(dois) + "]")


Answer

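Change the selectors: half of your code is already correct. Instead of collecting all titles and all links into two separate lists and nesting one loop inside the other, loop over each search result (div.results-article) and read the title and the link from inside that same result, so each title stays paired with its own link.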
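The original output gets mixed up because its loops are nested: for every link in list_dois, the inner loop prints every title again. As a minimal standalone sketch (the lists below are made-up placeholders, not scraped data, and nested/paired are illustrative helper names), compare that nested loop with zip(), which walks two lists side by side:

```python
# Hypothetical stand-ins for the scraped lists (not real PubMed data)
titles = ["Paper A", "Paper B", "Paper C"]
links = ["https://example.org/a", "https://example.org/b", "https://example.org/c"]


def nested(titles, links):
    """Mimic the question's loop: the inner loop repeats every title for each link."""
    lines = []
    for link in links:
        for title in titles:
            lines.append("Title: " + title)
        lines.append("[" + link + "]")
    return lines


def paired(titles, links):
    """zip() yields one (title, link) pair per position, so each title appears once."""
    return ["Title: {} [{}]".format(title, link) for title, link in zip(titles, links)]


print("\n".join(nested(titles, links)))  # 12 lines: 3 titles repeated for each of 3 links, plus the links
print("---")
print("\n".join(paired(titles, links)))  # 3 lines, one title with one link each

# If one paper has no link, zip() silently shifts the pairs (Paper B gets Paper C's link):
print(paired(["Paper A", "Paper B", "Paper C"], ["https://example.org/a", "https://example.org/c"]))
```

Note that zip() pairs items purely by position and stops at the shorter list, so a single missing link makes every later title show the wrong link, as the last call above shows. Reading the title and link from the same result block, as the answer's code does, avoids that.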

import requests
from urllib.parse import urlencode
from bs4 import BeautifulSoup

search = input("Search: ")
arttype = input("Is it a review? (y/n): ")
perpage = input("How many results do you want? (10/20/50/100/200): ")
sort = input("Which sort option do you want? (date/match): ")

# Collect the query parameters; urlencode() escapes spaces and special characters in the search term
params = {"term": search}
if arttype == "y":
    params["filter"] = "pubt.review"
params["format"] = "abstract"
if sort == "date":
    params["sort"] = "date"
params["size"] = perpage

url = "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode(params)
print(url)
req = requests.get(url)
html = req.text
status = req.status_code

# Stop instead of trying to parse an error page
if status != 200:
    raise SystemExit("Stuck: request returned status " + str(status))

soup = BeautifulSoup(html, "html.parser")

# Each search result lives in its own div, so a title and link read from the same div belong together
search_divs = soup.find_all("div", class_="results-article")

for div in search_divs:
    print("Title - {}".format(div.find("h1", class_="heading-title").get_text(strip=True)))
    # find("a") returns the first <a> inside the result block
    print("Link - {}".format("https://pubmed.ncbi.nlm.nih.gov" + div.find("a")["href"]))
    print("---" * 25)

print("Number of papers - {}".format(soup.find("div", class_="results-amount").get_text(strip=True)))

Output:

Search: corona
Is it a review? (y/n): n
How many results do you want? (10/20/50/100/200): 20
Which sort option do you want? (date/match): match
https://pubmed.ncbi.nlm.nih.gov/?term=corona&format=abstract&size=20
Title - The history and epidemiology of Middle East respiratory syndrome corona virus
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Multidiscip+Respir+Med%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Personalized protein corona on nanoparticles and its clinical implications
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Biomater+Sci%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Nanoparticle-Protein Interaction: The Significance and Role of Protein Corona
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Adv+Exp+Med+Biol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Gold nanoparticle should understand protein corona for being a clinical nanomaterial
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Control+Release%22%5Bjour%5D
---------------------------------------------------------------------------
Title - The impact of protein corona on the behavior and targeting capability of nanoparticle-based delivery system
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Pharm%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Liposome protein corona characterization as a new approach in nanomedicine
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Anal+Bioanal+Chem%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Shell-corona microgels from double interpenetrating networks
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Soft+Matter%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Protein corona: Opportunities and challenges
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Biochem+Cell+Biol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Biomolecular Corona Dictates Aβ Fibrillation Process
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22ACS+Chem+Neurosci%22%5Bjour%5D
---------------------------------------------------------------------------
Title - A health concern regarding the protein corona, aggregation and disaggregation
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Biochim+Biophys+Acta+Gen+Subj%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Formation and Characterization of Protein Corona Around Nanoparticles: A Review
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Nanosci+Nanotechnol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Silver nanoparticle protein corona and toxicity: a mini-review
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Nanobiotechnology%22%5Bjour%5D
---------------------------------------------------------------------------
Title - The prevalence and morphology of the corona mortis (Crown of death): A meta-analysis with implications in abdominal wall and pelvic surgery
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Injury%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Possibilities and Limitations of Different Separation Techniques for the Analysis of the Protein Corona
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Angew+Chem+Int+Ed+Engl%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Translating Current Bioanalytical Techniques for Studying Corona Activity
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Trends+Biotechnol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - The Crown and the Scepter: Roles of the Protein Corona in Nanomedicine
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Adv+Mater%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Protein corona - from molecular adsorption to physiological complexity
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Beilstein+J+Nanotechnol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Understanding the nanoparticle-protein corona complexes using computational and experimental methods
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22Int+J+Biochem+Cell+Biol%22%5Bjour%5D
---------------------------------------------------------------------------
Title - Structure of corona radiata and tapetum fibers in ventricular surgery
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Clin+Neurosci%22%5Bjour%5D
---------------------------------------------------------------------------
Title - A protein corona primer for physical chemists
Link - https://pubmed.ncbi.nlm.nih.gov/?term=%22J+Chem+Phys%22%5Bjour%5D
---------------------------------------------------------------------------

Number of papers - 954results
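In the output above, every Link points to a journal search (?term=...[jour]) rather than to the paper itself, because div.find("a") returns the first anchor in each result. A minimal sketch, assuming the result markup resembles the simplified, hypothetical snippet below (the class names are taken from the question and the answer, but the live PubMed page is larger and may differ), keeps the per-result loop but looks up the question's own a.link-item.dialog-focus selector, which the asker reports works when printed on its own, inside each result. It also passes a separator to get_text() so the count does not run together like "954results". SAMPLE_HTML, extract_results and result_count are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. The class names come from the question and the
# answer above; the live PubMed page is much larger and its markup may differ.
SAMPLE_HTML = """
<div class="search-results">
  <div class="results-amount"><span class="value">2</span> results</div>
  <div class="results-article">
    <a href="/?term=%22Journal+A%22%5Bjour%5D">Search in PubMed</a>
    <h1 class="heading-title"> First paper title </h1>
    <a class="link-item dialog-focus" href="https://example.org/full-text-a">Full text</a>
  </div>
  <div class="results-article">
    <a href="/?term=%22Journal+B%22%5Bjour%5D">Search in PubMed</a>
    <h1 class="heading-title"> Second paper title </h1>
  </div>
</div>
"""


def extract_results(html):
    """Return (title, link) pairs, reading both values from the same result block."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for article in soup.select("div.results-article"):
        heading = article.select_one("h1.heading-title")
        # Look for the question's full-text link selector instead of taking the
        # first <a>, which in this sample is the journal search link.
        link_tag = article.select_one("a.link-item.dialog-focus")
        title = heading.get_text(strip=True) if heading else "(no title)"
        link = link_tag["href"] if link_tag else None  # None when a result has no such link
        pairs.append((title, link))
    return pairs


def result_count(html):
    """Join the count's text pieces with a space, e.g. '2 results'."""
    soup = BeautifulSoup(html, "html.parser")
    amount = soup.select_one("div.results-amount")
    return amount.get_text(" ", strip=True) if amount else "unknown"


for title, link in extract_results(SAMPLE_HTML):
    print("Title: {} [{}]".format(title, link or "no link found"))
print("Number of papers: " + result_count(SAMPLE_HTML))
```

Against the sample it prints "Title: First paper title [https://example.org/full-text-a]", "Title: Second paper title [no link found]" and "Number of papers: 2 results". On the real page you would pass req.text instead of SAMPLE_HTML; because the title and link come from the same result block, a paper without a link simply shows "no link found" instead of shifting every later link.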
User contributions licensed under: CC BY-SA