I am trying to web-scrape stock market caps using the code below.
At first I tried the conventional approach of fetching the list of market cap values with bs4.
When I used print(x.find('span',{'class': 'Trsdu(0.3s)'}).text) to do this, I got the error AttributeError: 'NoneType' object has no attribute 'text'.
    for x in marketCapArray:
        print(x.find('span',{'class': 'Trsdu(0.3s)'}).text)
I did not know how to resolve this error in my code, so as an alternative I used regex to extract the required values, as shown below.
Main Code
    import bs4
    import re
    import requests
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    def pickTopGainers():
        url = 'https://in.finance.yahoo.com/gainers?offset=0&count=100'
        page = urlopen(url)
        soup = bs4.BeautifulSoup(page, "html.parser")
        marketCapArray = soup.find_all(
            'td',
            {'class': 'Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)',
             'aria-label': 'Market cap'})
        print(str(marketCapArray))
        xi = re.findall("........</span>", str(marketCapArray))  # regex-use-1
        # Single quotes around the pattern so the literal " inside it is valid
        pi = re.sub('(</span>|....>N/A|>|")', "", str(xi))
        print(pi)

    pickTopGainers()
Results
This is what print(str(marketCapArray)) outputs (only part of it is pasted here):
[<td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="93"><span class="Trsdu(0.3s)" data-reactid="94">159.404M</span></td>, <td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="119"><span class="Trsdu(0.3s)" data-reactid="120">533.97M</span></td>, <td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="145"><span data-reactid="146">N/A</span></td>, <td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="171"><span class="Trsdu(0.3s)" data-reactid="172">2.952B</span></td>, <td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="197"><span class="Trsdu(0.3s)" data-reactid="198">9.223B</span></td>, <td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="223"><span data-reactid="224">N/A</span></td>]
This is the output of print(pi), which is also the final output:
['159.404M', '533.97M', '', '2.952B', '9.223B', '']
Question
How can I avoid using regex replace (re.sub) in the Main Code above and still get the final output pi?
Alternatively, please suggest the right approach for this. My regex feels clumsy.
Answer
You can iterate row by row inside the <table>, where all the information is stored. For example:
    import requests
    from bs4 import BeautifulSoup

    url = 'https://in.finance.yahoo.com/gainers?offset=0&count=100'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    fmt_string = '{:<15} {:<60} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10}'
    print(fmt_string.format('Symbol', 'Name', 'Price(int)', 'Change', '% change', 'Volume', 'AvgVol(3M)', 'Market Cap', 'PE ratio'))

    # Body rows of the table that contains links to /quote/ pages
    for row in soup.select('table:has(a[href*="/quote/"]) > tbody > tr'):
        # Text of every cell in the row; the last cell is not printed
        cells = [td.get_text(strip=True) for td in row.select('td')]
        print(fmt_string.format(*cells[:-1]))
Prints:
    Symbol          Name                                                         Price(int) Change     % change   Volume     AvgVol(3M) Market Cap PE ratio
    CCCL.NS         Consolidated Construction Consortium Limited                 0.2000     +0.0500    +33.33%    57,902     290,154    159.404M   N/A
    KSERASERA.NS    KSS Limited                                                  0.2500     +0.0500    +25.00%    1.607M     2.601M     533.97M    N/A
    BONLON.BO       BONLON INDUSTRIES LIMITED                                    21.60      +3.60      +20.00%    16,000     N/A        N/A        N/A
    MENONBE.NS      Menon Bearings Limited                                       52.80      +8.80      +20.00%    2.334M     65,713     2.952B     25.05
    RPOWER.NS       Reliance Power Limited                                       3.3000     +0.5500    +20.00%    127.814M   18.439M    9.223B     N/A
    11DPD.BO        Nippon India Mutual Fund                                     0.0600     +0.0100    +20.00%    190        N/A        N/A        N/A
    ABFRLPP-E1.NS   Aditya Birla Rs.5 ppd up                                     105.65     +17.60     +19.99%    1.238M     N/A        N/A        N/A
    500110.BO       Chennai Petroleum Corporation Limited                        64.55      -0.15      -0.23%     42,765     61,584     9.612B     N/A
    ABFRLPP.BO      Aditya Birla Fashion and Retai                               106.05     +17.65     +19.97%    387,703    N/A        N/A        N/A
    RADIOCITY.NS    Music Broadcast Limited                                      21.35      +3.55      +19.94%    12.657M    1.013M     7.38B      124.13
    RADIOCITY.BO    Music Broadcast Limited                                      21.35      +3.55      +19.94%    898,070    90,236     7.38B      124.13
    MENONBE.BO      Menon Bearings Limited                                       52.65      +8.75      +19.93%    137,065    8,648      2.951B     24.98
    MTNL.BO         Mahanagar Telephone Nigam Limited                            10.72      +1.78      +19.91%    1.142M     156,275    6.754B     N/A
...and so on.
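If only the market cap column is needed, the same get_text idea can be applied directly to the cells the question already selects, with no regex at all. The printed marketCapArray in the question also suggests why the original find call failed: the N/A cells contain a <span> without the 'Trsdu(0.3s)' class, so find returns None for them and .text raises the AttributeError. The sketch below is only an illustration under some assumptions: it runs on a shortened copy of the HTML pasted in the question (the long class attribute on <td> is omitted), the names SAMPLE_HTML and market_caps are made up for this example, and the live page's markup may differ.

```python
from bs4 import BeautifulSoup

# Shortened copy of the cells printed in the question (hypothetical sample,
# not fetched from the live page).
SAMPLE_HTML = """
<table><tbody><tr>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">159.404M</span></td>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">533.97M</span></td>
<td aria-label="Market cap"><span>N/A</span></td>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">2.952B</span></td>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">9.223B</span></td>
<td aria-label="Market cap"><span>N/A</span></td>
</tr></tbody></table>
"""

def market_caps(html):
    """Return the text of every 'Market cap' cell, with N/A mapped to ''."""
    soup = BeautifulSoup(html, "html.parser")
    caps = []
    for td in soup.find_all("td", {"aria-label": "Market cap"}):
        # get_text works whether or not the inner <span> has a class,
        # so there is no None to guard against.
        text = td.get_text(strip=True)
        caps.append("" if text == "N/A" else text)
    return caps

print(market_caps(SAMPLE_HTML))
# ['159.404M', '533.97M', '', '2.952B', '9.223B', '']
```

To use it against the real page, pass the downloaded HTML (for example requests.get(url).content) instead of SAMPLE_HTML. Matching on the aria-label alone, rather than the full class string, is a design choice made here to keep the selector short.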