
BeautifulSoup extract conditioned digit coloured by css

I can successfully scrape the data from this table for THRIVEN: [image]

But as you can see, in the Net% column, whether a value is negative or positive seems to be indicated only by CSS styling (I believe), and I couldn't find where that styling is defined.

How can I extract those values and write them to my Excel file as negative/positive numbers? Below is my current code:

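import pandas as pd
import requests
from bs4 import BeautifulSoup
from openpyxl import load_workbook

lwb = load_workbook(filename='THRIVEN.xlsx')
lws = lwb['THRI']

# Fetch the page first; BeautifulSoup needs the response text, not the URL string
klseLink = requests.get('https://www.klsescreener.com/v2/stocks/view/7889')
klseParser = BeautifulSoup(klseLink.text, 'html.parser')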

currentQuarterReportTable = klseParser.find('table', {'class': 'financial_reports table table-hover table-sm table-theme'}).findAll('tr', limit=5)
currentQuarterReportSelectedRow = []

print("")
print("==================== CURRENT QUARTER REPORT =====================")
print("")

try:
    for currentQuarterReportRow in currentQuarterReportTable[1:]:
        navigatedCurrentQuarterReportColumn = [td.text.strip() for td in currentQuarterReportRow.findAll("td")]
    
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(0)
        navigatedCurrentQuarterReportColumn.pop(4)
        navigatedCurrentQuarterReportColumn.pop(6)
        currentQuarterReportSelectedRow.append(navigatedCurrentQuarterReportColumn)
    
    currentQuarterReportLimitedTable = pd.DataFrame(currentQuarterReportSelectedRow, columns=['Revenue', 'Profit/Loss', 'Quarter', 'Quarter Date', 'Announced Date', 'Net'])
    currentQuarterReportLimitedTable = currentQuarterReportLimitedTable.rename(index={0: '1', 1: '2', 2: '3', 3: '4'})
    print(currentQuarterReportLimitedTable)

    i = 0
    for currentQuarterReportRow in currentQuarterReportTable[1:]:
        i += 1
        selectedColumn = [td.text.strip() for td in currentQuarterReportRow.findAll("td")]
        quarter = selectedColumn[5]
        quarterDate = selectedColumn[6]
        announcedDate = selectedColumn[8]
        revenue = (selectedColumn[3].replace("k", "")).replace(",", "")
        profitloss = (selectedColumn[4].replace("k", "")).replace(",", "")
        net = selectedColumn[9].replace("%", "")
    
        lws.cell(18 + int(i), 3).value = int(quarter)
        lws.cell(18 + int(i), 5).value = quarterDate
        lws.cell(18 + int(i), 7).value = announcedDate
        lws.cell(18 + int(i), 9).value = int(revenue)
        lws.cell(18 + int(i), 11).value = int(profitloss)
        lws.cell(18 + int(i), 13).value = float(net)

except IndexError:
    print("No Quarterly Report from KLScreener")
    
lwb.save('THRIVEN.xlsx')

This gives me: [image]

Note that the Revenue and Profit/Loss colours come from conditional formatting in Excel itself.

EDIT:

I finally got this working with:

for currentQuarterReportRow in currentQuarterReportTable[1:]:
    # The Net% cell is the second-to-last <td> in each row
    netCell = currentQuarterReportRow.find_all('td')[-2]
    if netCell.find('span', {'class': 'btn-sm btn-danger'}):
        # btn-danger marks a negative value
        print(float(netCell.get_text().replace('%', '')) * -1)
    else:
        print(float(netCell.get_text().replace('%', '')))

Thanks to @HedgeHog for suggesting the solution! :D

Answer

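The sign is not in the text but in the class of the span inside the cell: btn-danger marks a negative value and btn-success a positive one. Check that class to tell them apart: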

if net.select_one('.btn-danger'):
    print(float(net.get_text().replace('%',''))*-1)
else:
    print(float(net.get_text().replace('%','')))
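
If it is unclear which class marks a negative value, printing the class list of each Net% span is a quick way to find out. This is only a minimal sketch: the two-cell fragment is a hypothetical stand-in modelled on the page's markup, and the name net_classes is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the Net% cells of the page
html = '''
<table><tr>
<td class="number"><span class="btn-sm btn-danger">20%</span></td>
<td class="number"><span class="btn-sm btn-success">35%</span></td>
</tr></table>
'''
soup = BeautifulSoup(html, 'html.parser')

# BeautifulSoup returns the class attribute as a list of class names
net_classes = [
    (span.get_text(strip=True), span.get('class'))
    for span in soup.select('td.number span')
]

for text, classes in net_classes:
    print(text, classes)
```

The cell that should be negative carries btn-danger in its class list, while the positive one carries btn-success.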

Example

from bs4 import BeautifulSoup

html='''
<tr class="table-alternate">
<td class="number">-1.20</td>
<td class="number">0.000</td>
<td class="number">0.3400</td>
<td class="number">34,780k</td>
<td class="number">-6,537k</td>
<td class="text-center">4</td>
<td><span style="white-space: nowrap">2020-12-31</span></td>
<td><span style="white-space: nowrap">31 Dec, 2020</span></td>
<td><span style="white-space: nowrap">2021-02-25</span></td>
<td class="number"><span class="btn-sm btn-danger">20%</span></td>
<td><a href="/v2/stocks/financial-report/7889/2020-12-31" target="_blank">View</a>                                </td>
</tr>
<tr class="table-alternate">
<td class="number">1.27</td>
<td class="number">0.000</td>
<td class="number">0.3500</td>
<td class="number">49,244k</td>
<td class="number">6,959k</td>
<td class="text-center">3</td>
<td><span style="white-space: nowrap">2020-09-30</span></td>
<td><span style="white-space: nowrap">31 Dec, 2020</span></td>
<td><span style="white-space: nowrap">2020-11-20</span></td>
<td class="number"><span class="btn-sm btn-success">35%</span></td>
<td><a href="/v2/stocks/financial-report/7889/2020-09-30" target="_blank">View</a>                                </td>
</tr>

'''
soup = BeautifulSoup(html,'html.parser')

for currentQuarterReportRow in soup.find_all('tr'):
    net = currentQuarterReportRow.find_all('td')[-2]
    if net.select_one('.btn-danger'):
        print(float(net.get_text().replace('%',''))*-1)
    else:
        print(float(net.get_text().replace('%','')))
        

Output

-20.0
35.0
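
To write the signed values to Excel instead of printing them, the same check can be wrapped in a small helper. The sketch below is one possible way to do it, not the only one: signed_net is a hypothetical helper name, the HTML is a trimmed stand-in for the real rows, and an in-memory Workbook replaces THRIVEN.xlsx. The target cells (row 18 + i, column 13) mirror the layout used in the question.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook


def signed_net(td):
    """Return the Net% cell as a float, negated when it is styled with btn-danger."""
    value = float(td.get_text(strip=True).replace('%', ''))
    return -value if td.select_one('.btn-danger') else value


# Trimmed, hypothetical stand-in for the table rows on the page
html = '''<table>
<tr><td class="number"><span class="btn-sm btn-danger">20%</span></td><td>View</td></tr>
<tr><td class="number"><span class="btn-sm btn-success">35%</span></td><td>View</td></tr>
</table>'''
soup = BeautifulSoup(html, 'html.parser')

wb = Workbook()   # in-memory stand-in for load_workbook('THRIVEN.xlsx')
ws = wb.active

for i, row in enumerate(soup.find_all('tr'), start=1):
    net = row.find_all('td')[-2]                 # Net% is the second-to-last cell
    ws.cell(18 + i, 13).value = signed_net(net)  # same cells as in the question

print(ws.cell(19, 13).value, ws.cell(20, 13).value)
```

In the original script, the line assigning float(net) to column 13 could call such a helper on the td element itself rather than on its stripped text, since the class information is lost once only the text is kept.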
User contributions licensed under: CC BY-SA