I can successfully get the data from this table for THRIVEN:
But as you can see, in the Net% column the sign of each value (negative/positive) is not in the text. I believe it is determined by some CSS, but I couldn't find where it is defined.
How can I extract those values and put them into my Excel file as negative/positive numbers? Below is my current code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

lwb = load_workbook(filename='THRIVEN.xlsx')
lws = lwb['THRI']

klseLink = 'https://www.klsescreener.com/v2/stocks/view/7889'
# Fetch the page first; the URL string itself has no .text attribute
klseResponse = requests.get(klseLink)
klseParser = BeautifulSoup(klseResponse.text, 'html.parser')
currentQuarterReportTable = klseParser.find('table', {'class': 'financial_reports table table-hover table-sm table-theme'}).findAll('tr', limit=5)
currentQuarterReportSelectedRow = []

print("")
print("==================== CURRENT QUARTER REPORT =====================")
print("")

try:
    for currentQuarterReportRow in currentQuarterReportTable[1:]:
        navigatedCurrentQuarterReportColumn = [td.text.strip() for td in currentQuarterReportRow.findAll("td")]
        # Drop the columns that are not shown in the summary table
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(4)
        navigatedCurrentQuarterReportColumn.pop(6)
        currentQuarterReportSelectedRow.append(navigatedCurrentQuarterReportColumn)

    currentQuarterReportLimitedTable = pd.DataFrame(currentQuarterReportSelectedRow, columns=['Revenue', 'Profit/Loss', 'Quarter', 'Quarter Date', 'Announced Date', 'Net'])
    currentQuarterReportLimitedTable = currentQuarterReportLimitedTable.rename(index={0: '1', 1: '2', 2: '3', 3: '4'})
    print(currentQuarterReportLimitedTable)

    i = 0
    for currentQuarterReportRow in currentQuarterReportTable[1:]:
        i += 1
        selectedColumn = [td.text.strip() for td in currentQuarterReportRow.findAll("td")]
        quarter = selectedColumn[5]
        quarterDate = selectedColumn[6]
        announcedDate = selectedColumn[8]
        # Strip the "k" suffix and thousands separators before converting
        revenue = (selectedColumn[3].replace("k", "")).replace(",", "")
        profitloss = (selectedColumn[4].replace("k", "")).replace(",", "")
        # The Net% text carries no sign, so this value is always positive
        net = selectedColumn[9].replace("%", "")

        lws.cell(18 + int(i), 3).value = int(quarter)
        lws.cell(18 + int(i), 5).value = quarterDate
        lws.cell(18 + int(i), 7).value = announcedDate
        lws.cell(18 + int(i), 9).value = int(revenue)
        lws.cell(18 + int(i), 11).value = int(profitloss)
        lws.cell(18 + int(i), 13).value = float(net)
except IndexError:
    print("No Quarterly Report from KLScreener")

lwb.save('THRIVEN.xlsx')
This gives me:
Note that the Revenue and Profit/Loss colors come from conditional formatting in Excel itself.
EDIT:
I finally got this working with:
for currentQuarterReportRow in currentQuarterReportTable[1:]:
    currentQuarterReportRow = currentQuarterReportRow.find_all('td')[-2]
    if currentQuarterReportRow.find('span', {'class': 'btn-sm btn-danger'}):
        print(float(currentQuarterReportRow.get_text().replace('%', '')) * -1)
    else:
        print(float(currentQuarterReportRow.get_text().replace('%', '')))
Thanks to @HedgeHog for suggesting the solution! :D
Answer
Check the class of the button to tell a positive value from a negative one:
if net.select_one('.btn-danger'):
    print(float(net.get_text().replace('%','')) * -1)
else:
    print(float(net.get_text().replace('%','')))
Example
from bs4 import BeautifulSoup

html = '''
<tr class="table-alternate">
    <td class="number">-1.20</td>
    <td class="number">0.000</td>
    <td class="number">0.3400</td>
    <td class="number">34,780k</td>
    <td class="number">-6,537k</td>
    <td class="text-center">4</td>
    <td><span style="white-space: nowrap">2020-12-31</span></td>
    <td><span style="white-space: nowrap">31 Dec, 2020</span></td>
    <td><span style="white-space: nowrap">2021-02-25</span></td>
    <td class="number"><span class="btn-sm btn-danger">20%</span></td>
    <td><a href="/v2/stocks/financial-report/7889/2020-12-31" target="_blank">View</a> </td>
</tr>
<tr class="table-alternate">
    <td class="number">1.27</td>
    <td class="number">0.000</td>
    <td class="number">0.3500</td>
    <td class="number">49,244k</td>
    <td class="number">6,959k</td>
    <td class="text-center">3</td>
    <td><span style="white-space: nowrap">2020-09-30</span></td>
    <td><span style="white-space: nowrap">31 Dec, 2020</span></td>
    <td><span style="white-space: nowrap">2020-11-20</span></td>
    <td class="number"><span class="btn-sm btn-success">35%</span></td>
    <td><a href="/v2/stocks/financial-report/7889/2020-09-30" target="_blank">View</a> </td>
</tr>
'''

soup = BeautifulSoup(html, 'html.parser')

for currentQuarterReportRow in soup.find_all('tr'):
    # The Net% cell is the second-to-last column of each row
    net = currentQuarterReportRow.find_all('td')[-2]
    if net.select_one('.btn-danger'):
        print(float(net.get_text().replace('%','')) * -1)
    else:
        print(float(net.get_text().replace('%','')))
Output
-20.0
35.0
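Since the goal is to store the number rather than print it, the class check can be wrapped in a small helper that returns the signed float. This is only a sketch: `signed_percent` is a hypothetical name, and it assumes `btn-danger` always marks a negative value, as it does in the sample rows. It takes the cell text and the span's class list, so it does not depend on BeautifulSoup itself.

```python
def signed_percent(text, css_classes):
    """Convert cell text such as '20%' to a float, negative when the
    span carries the 'btn-danger' class (assumed here to mark a loss)."""
    value = float(text.strip().replace('%', '').replace(',', ''))
    # -abs() avoids flipping the sign back if the text already has a minus
    if 'btn-danger' in css_classes:
        return -abs(value)
    return value


# With BeautifulSoup, where span is the <span> inside the Net% cell:
#     signed_percent(span.get_text(), span.get('class', []))
print(signed_percent('20%', ['btn-sm', 'btn-danger']))   # -20.0
print(signed_percent('35%', ['btn-sm', 'btn-success']))  # 35.0
```

The returned float can be assigned to the Net% cell of the worksheet in place of `float(net)`, so Excel receives a real negative or positive number.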