# Import libs
import pandas as pd
import requests
from bs4 import BeautifulSoup
import json

# Form Data for passing to the request body
formdata = {'objid': '14'}

# URL
url = "https://www.sec.kerala.gov.in/public/getalllbcmp/byd"

# Query
for i in range(1, 15):
    formdata["objid"] = str(i)
    response = requests.request("POST", url, data=formdata, timeout=1500)
    out = response.content
    soup = BeautifulSoup(out,"html.parser")
    bat = json.loads(soup.text)
    df = pd.DataFrame(bat["ops1"])
    df.to_csv(str(i) + ".csv")
Right now this script creates 14 CSV files, one per request. What I want is for the loop to drop the repeated header row and append each response's data to a DataFrame created outside the loop, so that I end up with a single CSV file.
I am using BeautifulSoup and pandas.
Answer
This is one way of achieving your goal:
# Import libs
import pandas as pd
import requests
from tqdm import tqdm ## if using jupyter: from tqdm.notebook import tqdm

final_df = pd.DataFrame()
# URL
url = "https://www.sec.kerala.gov.in/public/getalllbcmp/byd"

# Query
for i in tqdm(range(1, 15)):
    formdata = {'objid': i}
    r = requests.post(url, data=formdata)
    df = pd.json_normalize(r.json()["ops1"])
    final_df = pd.concat([final_df, df], axis=0, ignore_index=True)
final_df.to_csv('some_data_saved.csv')
print(final_df)
The data is saved to a CSV file and also printed in the terminal:
100%
14/14 [00:14<00:00, 1.05s/it]
           value                  text
0     8o7LEdvX2e      G14001-Kumbadaje
1     jw2XOQyZ4K         G14002-Bellur
2     0lMB1O4LbV        G14003-Karadka
3     zodLro2Z39        G14004-Muliyar
4     dWxLYn8ZME      G14005-Delampady

1029  Qy6Z09bBKE         G01073-Ottoor
1030  ywoXG8wLxV  M01001-Neyyattinkara
1031  Kk8Xvz7XO9     M01002-Nedumangad
1032  r7eXQYgX8m       M01003-Attingal
1033  b3KXlO2B8g        M01004-Varkala
1034 rows × 2 columns
Requests can decode a JSON response directly via r.json(), so you don't need to import bs4 or json.
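Since each response decodes to plain Python data, the repeated-header concern from the question mostly disappears: column names belong to the DataFrame rather than to its rows, so combining frames never duplicates them. Here is a minimal sketch of that idea, assuming the payload shape implied by the output above (an "ops1" list of records with "value" and "text" fields). The names `sample_payloads` and `combine_payloads` are hypothetical, and the in-memory payloads stand in for real `r.json()` results so the example runs without network access. Collecting the frames in a list and concatenating once is a variation on the per-iteration `pd.concat` used in the answer, not the answer's own method.

```python
import pandas as pd

# Hypothetical stand-ins for r.json() from two separate requests.
# The "ops1" key and the value/text fields mirror the printed output;
# the grouping of rows into payloads is an assumption for illustration.
sample_payloads = [
    {"ops1": [{"value": "8o7LEdvX2e", "text": "G14001-Kumbadaje"},
              {"value": "jw2XOQyZ4K", "text": "G14002-Bellur"}]},
    {"ops1": [{"value": "ywoXG8wLxV", "text": "M01001-Neyyattinkara"},
              {"value": "Kk8Xvz7XO9", "text": "M01002-Nedumangad"}]},
]


def combine_payloads(payloads):
    """Build one DataFrame from a list of decoded JSON payloads."""
    # One frame per payload, collected in a list...
    frames = [pd.json_normalize(p["ops1"]) for p in payloads]
    # ...then a single concat; ignore_index renumbers rows from 0.
    return pd.concat(frames, ignore_index=True)


combined = combine_payloads(sample_payloads)

# With no path given, to_csv returns the CSV text; index=False omits
# the row-number column, and the header line is written only once.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

In the real loop, the list would be filled with `pd.json_normalize(r.json()["ops1"])` for each request, and `to_csv` would be given a file name instead of returning a string.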
For TQDM, please see https://pypi.org/project/tqdm/
For pandas documentation, visit https://pandas.pydata.org/docs/
For Requests, see https://requests.readthedocs.io/en/latest/