Hello everybody out there! I have been using BeautifulSoup for my scraping projects, and I'm currently learning Scrapy. I wrote BeautifulSoup code that uses a for loop to go over multiple pages of a single website. It loops over 10 pages and fetches the URLs of the blog posts listed on those pages. I want to do the same thing in Scrapy but can't figure out how. Can the same approach be used with Scrapy? Here is the BeautifulSoup code:
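```python
import re

import requests
from bs4 import BeautifulSoup

URL = 'https://www.brookings.edu/topic/environment/page/'
lis = []
for page in range(1, 11):  # pages 1-10; range(1, 10) would stop at page 9
    req = requests.get(URL + str(page) + '/?type=posts')
    soup = BeautifulSoup(req.text, 'lxml')
    # Keep only links that point to blog posts
    links = [link['href'] for link in soup.find_all('a', href=re.compile('^(https://www.brookings.edu/blog/)'))]
    links = list(set(links))  # drop duplicates found on the same page
    lis.extend(links)  # extend (not append) so lis stays a flat list of URLs
```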
This code fetches the blog-post links from 10 pages of the website and stores them in the list named lis, which is defined outside the for loop. Then, in a second for loop over that list, I extract the text from each blog post.
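The page arithmetic is easy to get off by one: range(1, 10) covers pages 1 to 9 only, so ten pages need range(1, 11). Also, list(set(...)) removes duplicates within one page but not across pages, and it loses the original order. The sketch below pulls those two pieces out into plain functions that don't touch the network; page_urls and blog_links are hypothetical helper names, and the URL pattern is the one from the question (with the dots escaped).

```python
import re

# Same filter as the question's regex: keep only blog-post URLs.
BLOG_LINK = re.compile(r"^https://www\.brookings\.edu/blog/")

BASE_URL = "https://www.brookings.edu/topic/environment/page/"


def page_urls(base_url, first, last):
    """Build listing-page URLs for pages first..last (both inclusive)."""
    return [f"{base_url}{page}/?type=posts" for page in range(first, last + 1)]


def blog_links(hrefs):
    """Return blog-post hrefs with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if BLOG_LINK.match(href) and href not in seen:
            seen.add(href)
            result.append(href)
    return result
```

Calling page_urls(BASE_URL, 1, 10) gives exactly ten listing URLs, and passing every page's hrefs through a single blog_links call deduplicates across all pages at once.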
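Answer

Yes, the same loop works in Scrapy. Put it in start_requests and yield one scrapy.Request per page. Scrapy downloads each page and passes the response to parse, which yields the links: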
```python
import scrapy


class BrSpider(scrapy.Spider):
    name = 'br'
    allowed_domains = ['brookings.edu']

    def start_requests(self):
        # One request per listing page, pages 1 through 10
        for page in range(1, 11):
            yield scrapy.Request(f'https://www.brookings.edu/topic/environment/page/{page}/?type=posts')

    def parse(self, response):
        # Yield one item per post link found inside the listing's .title elements
        for i in response.css('.title a::attr(href)'):
            yield {
                'Link': i.get()
            }
```
Output:
```json
[
{"Link": "https://www.brookings.edu/blog/up-front/2018/12/10/3-big-societal-problems-to-fix-in-2019/"},
{"Link": "https://www.brookings.edu/blog/brookings-now/2018/10/31/highlights-an-energy-industry-view-on-moving-toward-a-lower-carbon-future/"},
{"Link": "https://www.brookings.edu/blog/brown-center-chalkboard/2018/10/30/climate-confusion-content-and-strategies-not-controversy-are-the-biggest-challenges-for-science-teachers/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2018/10/25/the-economics-and-politics-of-carbon-pricing/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2018/10/11/climate-reality-requires-starting-at-home-weaning-from-fossil-fuels/"},
{"Link": "https://www.brookings.edu/blog/techtank/2018/10/05/sharing-digitized-dna-sequences-must-balance-scientific-progress-with-fair-use/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2018/07/21/africa-in-the-news-eac-trade-statistics-nelson-mandela-day-and-conservation-updates/"},
{"Link": "https://www.brookings.edu/blog/future-development/2018/07/06/the-sustainable-development-goals-and-climate-finance-catalytic-agent-or-empty-vessel/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2018/06/14/enhancing-the-attractiveness-of-private-investment-in-hydropower-in-africa/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2018/06/01/trump-tried-to-kill-the-paris-agreement-but-the-effect-has-been-the-opposite/"},
{"Link": "https://www.brookings.edu/blog/up-front/2021/02/23/transforming-natural-resource-governance-break-silos-sharpen-politics/"},
{"Link": "https://www.brookings.edu/blog/future-development/2021/02/09/its-critical-that-we-invest-in-better-global-weather-and-climate-observations/"},
{"Link": "https://www.brookings.edu/blog/future-development/2021/02/04/secular-stagnation-climate-action-and-the-natural-rate-of-interest/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/01/27/figures-of-the-week-carbon-taxes-can-fuel-green-economic-recovery-and-reduce-income-inequality/"},
{"Link": "https://www.brookings.edu/blog/future-development/2021/01/25/to-support-climate-action-growth-measures-should-count-planetary-damages/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2021/01/25/the-national-security-imperative-to-tackle-illegal-unreported-and-unregulated-fishing/"},
{"Link": "https://www.brookings.edu/blog/up-front/2021/01/15/time-to-pivot-the-role-of-the-energy-transition-and-investors-in-forging-resilient-resource-rich-country-outcomes/"},
{"Link": "https://www.brookings.edu/blog/education-plus-development/2019/12/10/national-climate-strategies-are-forgetting-about-girls-children-and-youth/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/05/15/africa-in-the-news-wildlife-horn-of-africa-and-infrastructure-updates/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2021/05/10/barriers-to-achieving-us-climate-goals-are-more-political-than-technical/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/12/15/the-trump-administrations-major-environmental-deregulations/"},
{"Link": "https://www.brookings.edu/blog/up-front/2021/01/14/disrupting-the-waste-management-industry-through-technology-insights-from-rubicon/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2020/12/28/who-is-and-isnt-represented-in-environmental-oversight-in-congress/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/12/17/regulating-autonomous-vehicles-and-ridesharing-lessons-from-california/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2019/12/09/building-an-ambitious-us-climate-policy-from-the-bottom-up/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/12/02/top-emitters-must-commit-to-a-u-turn-at-cop25/"},
{"Link": "https://www.brookings.edu/blog/brown-center-chalkboard/2019/11/20/how-exposure-to-pollution-affects-educational-outcomes-and-inequality/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2019/10/03/a-conversation-with-guinean-president-conde-on-natural-resource-management-in-africa/"},
{"Link": "https://www.brookings.edu/blog/the-avenue/2019/09/24/how-a-scrappy-federal-it-program-can-be-a-model-for-us-climate-action/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2019/09/17/success-from-the-un-climate-summit-will-hinge-on-new-ways-to-build-national-action/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/09/16/the-invisible-water-crisis/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2021/05/10/republicans-in-congress-are-out-of-step-with-the-american-public-on-climate/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/05/05/figures-of-the-week-africas-renewable-energy-potential/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2021/04/26/will-cannabis-legalization-reduce-crime-in-mexico-has-it-in-the-us/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/04/10/africa-in-the-news-updates-on-natural-resources-and-politics-in-niger-djibouti-benin-and-chad/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/03/26/africas-green-bond-market-trails-behind-other-regions/"},
{"Link": "https://www.brookings.edu/blog/up-front/2021/03/04/understanding-and-mitigating-climate-change-risks/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/12/09/business-as-usual-is-not-an-option-the-future-of-natural-resource-governance/"},
{"Link": "https://www.brookings.edu/blog/future-development/2020/11/25/delhi-the-worlds-most-air-polluted-capital-fights-back/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2020/11/23/around-the-halls-what-should-the-biden-administration-prioritize-in-its-climate-policy/"},
{"Link": "https://www.brookings.edu/blog/future-development/2020/11/19/to-ride-covid-19s-green-wave-governments-must-slash-fossil-fuel-subsidies/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2020/11/12/figure-of-the-week-africas-used-vehicle-market-and-the-environment/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/08/23/for-growth-and-well-being-climate-crisis-overshadows-all-else/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2020/06/28/oil-gas-and-mining-corruption-is-it-inevitable/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/09/28/global-warming-fires-and-crime-in-mexico-and-beyond/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2019/09/16/the-fight-to-contain-climate-change-implementing-paris-mobilizing-action/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2019/09/13/campaign-2020-what-candidates-are-saying-on-climate-change/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2018/06/01/one-year-since-trumps-withdrawal-from-the-paris-climate-agreement/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/03/01/recipe-for-a-green-recovery-carbon-taxes/"},
{"Link": "https://www.brookings.edu/blog/up-front/2021/02/25/seizing-opportunities-for-fuel-subsidy-reform/"},
{"Link": "https://www.brookings.edu/blog/brown-center-chalkboard/2020/10/28/the-importance-of-clean-air-in-classrooms-during-the-pandemic-and-beyond/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/10/26/not-dried-up-us-mexico-water-cooperation/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/10/12/saving-the-vaquita-marina-and-the-urgency-of-this-fall/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/10/06/using-extractive-industries-data-for-better-governance/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/07/10/to-save-forests-think-beyond-the-trees/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2019/07/08/the-politics-of-methane/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2019/06/08/africa-in-the-news-new-environmental-policies-on-the-continent-zimbabwes-imf-stabilization-program-and-sudan-update/"},
{"Link": "https://www.brookings.edu/blog/up-front/2019/05/17/india-2024-a-green-india/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/04/25/the-critical-frontier-reducing-emissions-from-chinas-belt-and-road/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/04/24/new-data-on-governance-of-national-oil-companies-why-transparency-and-oversight-matter/"},
{"Link": "https://www.brookings.edu/blog/education-plus-development/2019/03/28/why-captain-planet-should-have-been-a-woman/"},
{"Link": "https://www.brookings.edu/blog/techtank/2021/11/16/how-technology-can-help-with-methane-regulation/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/06/16/reopening-the-world-to-prevent-zoogenic-pandemics-regulate-wildlife-trade-and-food-production/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/06/08/play-the-game-a-presidents-climate-quandary/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/05/11/wildlife-trade-in-mexico-conservation-and-pandemics/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/04/30/six-covid-related-deregulations-to-watch/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2020/04/23/covid-19-and-climate-your-questions-our-answers/"},
{"Link": "https://www.brookings.edu/blog/future-development/2020/04/22/global-solutions-to-global-bads-2-practical-proposals-to-help-developing-countries-deal-with-the-covid-19-pandemic/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/09/14/illegal-fishing-in-mexico-and-policy-responses/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2020/09/05/africa-in-the-news-mali-coup-mauritius-oil-spill-and-covid-19-updates/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2020/08/19/amid-covid-19-dont-ignore-the-links-between-poor-air-quality-and-public-health/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/08/07/uncommon-ground-the-impact-of-natural-resource-corruption-on-indigenous-peoples/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2020/08/05/the-controversy-over-the-grand-ethiopian-renaissance-dam/"},
{"Link": "https://www.brookings.edu/blog/techtank/2018/05/31/catastrophic-risk-to-ecosystems-puts-biotechnology-fixes-on-the-table/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2018/05/29/transition-to-electric-vehicles-in-karnataka-and-india-whats-real-possible-and-missing-in-the-ecosystem/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2018/05/16/young-republicans-diverge-on-climate-policy/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2018/05/10/figures-of-the-week-access-to-affordable-sustainable-and-modern-energy-in-africa/"},
{"Link": "https://www.brookings.edu/blog/social-mobility-memos/2018/04/21/earth-day-it-is-about-equity-as-well-as-the-environment/"},
{"Link": "https://www.brookings.edu/blog/brookings-now/2018/04/20/on-earth-day-5-facts-about-environmental-policy-and-research/"},
{"Link": "https://www.brookings.edu/blog/future-development/2019/01/28/the-deforestation-risks-of-chinas-belt-and-road-initiative/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2018/12/20/what-frances-yellow-vest-protests-reveal-about-the-future-of-climate-action/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2021/11/10/infrastructure-in-the-developing-world-is-a-planetary-furnace-heres-how-to-cool-it/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2021/10/25/net-zero-carbon-pledges-have-good-intentions-but-they-are-not-enough/"},
{"Link": "https://www.brookings.edu/blog/future-development/2021/09/28/the-risks-of-us-eu-divergence-on-corporate-sustainability-disclosure/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2021/07/26/a-porpoise-to-serve-rescuing-the-vaquita-and-the-us-mexico-relationship/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2021/07/23/addressing-africas-extreme-water-insecurity/"},
{"Link": "https://www.brookings.edu/blog/future-development/2021/07/07/transnational-governance-of-natural-resources-for-the-21st-century/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/04/21/how-to-reduce-emissions-as-much-as-possible-at-the-lowest-cost/"},
{"Link": "https://www.brookings.edu/blog/the-avenue/2020/04/14/weakening-environmental-reviews-for-transportation-infrastructure-is-a-bridge-too-far/"},
{"Link": "https://www.brookings.edu/blog/africa-in-focus/2020/02/18/why-ethiopia-egypt-and-sudan-should-ditch-a-rushed-washington-brokered-nile-treaty/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/07/30/the-evolution-of-the-eiti-and-next-steps-for-tackling-extractive-industries-corruption/"},
{"Link": "https://www.brookings.edu/blog/up-front/2020/07/23/a-master-class-in-corruption-the-luanda-leaks-across-the-natural-resource-value-chain/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2020/07/20/the-damage-trumps-wall-causes-in-mexico/"},
{"Link": "https://www.brookings.edu/blog/future-development/2020/07/10/what-the-pandemic-reveals-about-governance-state-capture-and-natural-resources/"},
{"Link": "https://www.brookings.edu/blog/up-front/2017/12/20/estimating-the-rising-cost-of-a-surprising-tax-shelter-the-syndicated-conservation-easement/"},
{"Link": "https://www.brookings.edu/blog/future-development/2021/07/02/protecting-forests-are-early-warning-systems-effective/"},
{"Link": "https://www.brookings.edu/blog/fixgov/2021/06/24/when-climate-policy-works-hfcs-and-the-case-of-short-lived-climate-pollutants/"},
{"Link": "https://www.brookings.edu/blog/brown-center-chalkboard/2021/05/19/now-is-the-time-to-invest-in-school-infrastructure/"},
{"Link": "https://www.brookings.edu/blog/planetpolicy/2017/12/08/fill-the-gaps-in-the-tax-bill-with-a-carbon-tax-and-expanded-benefits-for-working-families/"},
{"Link": "https://www.brookings.edu/blog/order-from-chaos/2017/11/27/on-the-vices-and-virtues-of-trophy-hunting/"}
]
```
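A list in this shape is what Scrapy's JSON feed export writes, for example with scrapy crawl br -O links.json (-O overwrites the file and needs Scrapy 2.4 or later; -o appends, which breaks a JSON file that already exists). If you would rather keep the question's two-stage workflow, the saved file can be read back with plain Python. The sketch below assumes the file holds a list of {"Link": ...} objects as shown; load_links is a hypothetical helper name.

```python
import json


def load_links(path):
    """Read a Scrapy JSON feed (a list of {"Link": ...} objects) and return the URLs."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    # dict.fromkeys drops duplicate URLs while keeping their original order.
    return list(dict.fromkeys(item["Link"] for item in items))
```

The returned list can then drive the same second for loop the question uses for extracting post text, although doing that step inside the spider with a callback avoids the intermediate file.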