I’m a noob trying to learn Python by scraping a website to track fund parameters. So far, the following code isolates and shows the data that I need:
    from bs4 import BeautifulSoup
    import requests

    source = requests.get('https://www.fundaggregatorurl.com/path/to/fund').text
    soup = BeautifulSoup(source, 'lxml')

    # print(soup.prettify())

    print("\n1Y growth rate vs S&P BSE 500 TRI\n")
    # Pinpoints the 1Y growth rate of the scheme and the S&P BSE 500 TRI
    for snippet in soup.find_all('div', class_='scheme_per_amt prcntreturn 1Y'):
        print(snippet.text.lstrip())

    print("\nNAV, AUM and Expense Ratio\n")
    # Pinpoints NAV, AUM and Expense Ratio
    for snippet in soup.find_all('span', class_='amt'):
        print(snippet.text)

    # Get the risk analysis data
    source = requests.get('https://www.fundaggregatorurl.com/path/to/fund/riskanalysis').text
    soup = BeautifulSoup(source, 'lxml')

    print("\nRisk Ratios\n")
    # Pinpoints the risk ratios (fund vs category)
    for snippet in soup.find_all('div', class_='percentage'):
        split_data = snippet.text.split('vs')
        print(*split_data, sep=" ")

    print()
This code shows the following data:
    1Y growth rate vs S&P BSE 500 TRI

    68.83%
    50.85%

    NAV, AUM and Expense Ratio

    185.9414
    2704.36
    1.5%

    Risk Ratios

    19.76 17.95
    0.89 0.93
    0.77 0.72
    0.17 0.14
    4.59 2.32
How can I write this data to a CSV with the following headers?
    Fund growth,Category Growth,Current NAV,AUM,Expense Ratio,Fund std dev,Category std dev,Fund beta,Category beta,Fund Sharpe ratio,Category Sharpe ratio,Fund Treynor's ratio,Category Treynor's Ratio,Fund Jensen's Alpha,Category Jensen's Alpha
    68.83%,50.85%,185.9414,2704.36,1.5%,19.76,17.95,0.89,0.93,0.77,0.72,0.17,0.14,4.59,2.32
This is for a single fund and I need to get this data for about 100 more funds. I will experiment more and any issues there are perhaps for another Q at a later time :) Since I’m a newbie, any other improvements and why you’d do those would also be appreciated!
Answer
Assemble the data for each fund in a list so that you can easily write it out in CSV format using Python’s built-in csv module:
    import csv

    import requests
    from bs4 import BeautifulSoup

    funds = ['fund1', 'fund2']
    # the header should match the number of data items
    header = ['Fund growth', 'Category Growth', 'Current NAV', 'AUM']

    with open('funds.csv', 'w', newline='') as csvfile:
        fund_writer = csv.writer(csvfile)
        fund_writer.writerow(header)
        for fund in funds:
            # collect every value for this fund into a single row
            fund_data = []
            source = requests.get('https://www.fundaggregatorurl.com/path/to/' + fund).text
            soup = BeautifulSoup(source, 'lxml')

            for snippet in soup.find_all('div', class_='scheme_per_amt prcntreturn 1Y'):
                fund_data.append(snippet.text.lstrip())

            # do remaining parsing...

            fund_writer.writerow(fund_data)
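To make the "one list per fund" idea concrete, here is a rough sketch of how the three groups of scraped strings could be flattened into a single 15-column row. It runs offline on the sample values from the question, so no request is made. The `build_row` helper and the `HEADER` constant are hypothetical names, not part of the original code, and the exact text of each risk snippet ("fund vs category") is an assumption based on the `split('vs')` call in the question. An `io.StringIO` buffer stands in for the real `funds.csv` file.

```python
import csv
import io

# Column names follow the header requested in the question.
HEADER = [
    'Fund growth', 'Category Growth', 'Current NAV', 'AUM', 'Expense Ratio',
    'Fund std dev', 'Category std dev', 'Fund beta', 'Category beta',
    'Fund Sharpe ratio', 'Category Sharpe ratio',
    "Fund Treynor's ratio", "Category Treynor's Ratio",
    "Fund Jensen's Alpha", "Category Jensen's Alpha",
]


def build_row(growth_texts, amount_texts, risk_texts):
    """Flatten the three groups of scraped strings into one CSV row.

    Hypothetical helper: each argument is the list of snippet.text
    values collected from one of the three find_all() loops.
    """
    row = [text.strip() for text in growth_texts]
    row += [text.strip() for text in amount_texts]
    for text in risk_texts:
        # Assumption: each risk snippet reads "<fund> vs <category>".
        row += [part.strip() for part in text.split('vs')]
    # Fail early if the page layout changed and values went missing.
    if len(row) != len(HEADER):
        raise ValueError(f'expected {len(HEADER)} values, got {len(row)}')
    return row


# Sample values copied from the output shown in the question.
row = build_row(
    ['  68.83%', '  50.85%'],
    ['185.9414', '2704.36', '1.5%'],
    ['19.76 vs 17.95', '0.89 vs 0.93', '0.77 vs 0.72',
     '0.17 vs 0.14', '4.59 vs 2.32'],
)

buffer = io.StringIO()  # stands in for open('funds.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(HEADER)
writer.writerow(row)
print(buffer.getvalue())
```

Checking the row length against the header before writing is a cheap way to notice when one of the ~100 fund pages is missing a value, instead of silently producing a misaligned CSV.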