Python web scraping issues with mechanize

Question

I am trying to scrape search results from the website https://promedmail.org/promed-posts/. I have tried BeautifulSoup, MechanicalSoup, and mechanize, but so far I am unable to scrape the search results. The content I get back does not contain the search results that appear when "US" is typed into the search box. Any idea what I am doing wrong here?

Accepted Answer

Since you mention bs4, you can mimic the POST request the page makes. Extract the JSON item that contains the HTML the page would have been updated with (this holds the results), parse it into a BeautifulSoup object, then reconstruct the results table as a DataFrame:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup as bs

    headers = {'user-agent': 'Mozilla/5.0'}

    # Form fields sent by the search page; the empty date values are left out.
    data = {
        'action': 'get_promed_search_content',
        'query[0][name]': 'kwby1',
        'query[0][value]': 'summary',
        'query[1][name]': 'search',
        'query[1][value]': 'US',
        'query[2][name]': 'date1',
        # 'query[2][value]': '',
        'query[3][name]': 'date2',
        # 'query[3][value]': '',
        'query[4][name]': 'feed_id',
        'query[4][value]': '1'
    }

    # The endpoint answers with JSON; its 'results' item holds the HTML fragment.
    r = requests.post('https://promedmail.org/wp-admin/admin-ajax.php',
                      headers=headers, data=data).json()
    soup = bs(r['results'], 'lxml')

    # Each <li> starts with the date text, followed by a link whose id
    # attribute carries the post id prefixed with 'id'.
    df = pd.DataFrame(
        [(i.find_next(string=True),
          i.a.text,
          f"https://promedmail.org/promed-post/?id={i.a['id'].replace('id', '')}")
         for i in soup.select('li')],
        columns=['Date', 'Title', 'Link'])
    print(df)
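The form fields in the request follow an indexed pattern, `query[i][name]` paired with `query[i][value]`. To search for a different term or feed, it may be easier to generate that dictionary than to edit it by hand. The helper below is a minimal sketch of that idea; `build_search_payload` is a hypothetical name, not part of the site or of any library, and it assumes the endpoint accepts the same field layout as the request shown in the answer. Fields whose value is `None` are sent without a `value` entry, matching the commented-out date lines.

```python
def build_search_payload(fields, action="get_promed_search_content"):
    """Build the form data dict from (name, value) pairs.

    Hypothetical helper: the key layout mirrors the request in the answer,
    and nothing else about the endpoint is assumed.
    """
    payload = {"action": action}
    for index, (name, value) in enumerate(fields):
        payload[f"query[{index}][name]"] = name
        # Skip the value key entirely when no value is given,
        # as the answer does for the two date fields.
        if value is not None:
            payload[f"query[{index}][value]"] = value
    return payload


payload = build_search_payload([
    ("kwby1", "summary"),
    ("search", "US"),
    ("date1", None),
    ("date2", None),
    ("feed_id", "1"),
])
```

The resulting `payload` can be passed as `data=payload` to `requests.post` in place of the hand-written dictionary.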


Answer