I’m trying to scrape historical Bitcoin data from coinmarketcap.com to get the close, volume, date, high and low values from the beginning of the year until Sep 30, 2021. I’m new to scraping with Python, and after going through threads and videos for hours I still can’t tell what my mistake is (or is there something about the website that I’m not detecting?). The following is my code:
from bs4 import BeautifulSoup
import requests
import pandas as pd

closeList = []
volumeList = []
dateList = []
highList = []
lowList = []

website = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'
r = requests.get(website)
r = requests.get(website)
soup = BeautifulSoup(r.text, 'lxml')
tr = soup.find_all('tr')

FullData = []
for item in tr:
    closeList.append(item.find_all('td')[4].text)
    volumeList.append(item.find_all('td')[5].text)
    dateList.append(item.find('td',{'style':'text-align: left;'}).text)
    highList.append(item.find_all('td')[2].text)
    lowList.append(item.find_all('td')[3].text)
    FullData.append([closeList,volumeList,dateList,highList,lowList])

df_columns = ["close", "volume", "date", "high", "low"]
df = pd.DataFrame(FullData, columns = df_columns)
print(df)
As a result I only get:
Empty DataFrame
Columns: [close, volume, date, high, low]
Index: []
The task requires me to scrape with BeautifulSoup and then export to CSV (which is then simply df.to_csv). Can somebody help me out? That would be highly appreciated.
Answer
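The data is loaded dynamically by JavaScript from an API call that returns JSON, so the HTML that requests downloads contains no table rows for BeautifulSoup to find. You can grab the data directly from the API as follows: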
Code:
import requests
import json
import pandas as pd

api_url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1632441600&timeEnd=1637712000'
r = requests.get(api_url)

data = []
for item in r.json()['data']['quotes']:
    close = item['quote']['close']
    volume = item['quote']['volume']
    date = item['quote']['timestamp']
    high = item['quote']['high']
    low = item['quote']['low']
    data.append([close, volume, date, high, low])

cols = ["close", "volume", "date", "high", "low"]
df = pd.DataFrame(data, columns=cols)
print(df)
#df.to_csv('info.csv',index = False)
Output:
           close        volume                      date          high           low
0   42839.751696  4.283935e+10  2021-09-24T23:59:59.999Z  45080.491063  40936.557169
1   42716.593147  3.160472e+10  2021-09-25T23:59:59.999Z  42996.259704  41759.920425
2   43208.539105  3.066122e+10  2021-09-26T23:59:59.999Z  43919.300970  40848.461660
3   42235.731847  3.098003e+10  2021-09-27T23:59:59.999Z  44313.245882  42190.632576
4   41034.544665  3.021494e+10  2021-09-28T23:59:59.999Z  42775.146142  40931.662500
..           ...           ...                       ...           ...           ...
56  58119.576194  3.870241e+10  2021-11-19T23:59:59.999Z  58351.113266  55705.180685
57  59697.197134  3.062426e+10  2021-11-20T23:59:59.999Z  59859.880442  57469.725661
58  58730.476639  2.612345e+10  2021-11-21T23:59:59.999Z  60004.426383  58618.931432
59  56289.287323  3.503612e+10  2021-11-22T23:59:59.999Z  59266.358468  55679.840404
60  57569.074876  3.748580e+10  2021-11-23T23:59:59.999Z  57875.516397  55632.759912

[61 rows x 5 columns]
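Note that the URL above covers Sep 24 to Nov 23, 2021, not the range asked for in the question. The timeStart and timeEnd values look like Unix timestamps in seconds at midnight UTC, so a rough sketch for building the URL for another range could look like the following. The helper names (to_epoch, build_url, quotes_to_rows) are made up for illustration, the endpoint is undocumented and may change, and treating timeEnd as "midnight after the last wanted day" is an assumption based on the output above. Whether the API returns such a long range in a single response is also untested here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the answer above; the
# endpoint is undocumented and may change or reject requests.
API_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"


def to_epoch(day):
    """Convert 'YYYY-MM-DD' to Unix seconds at 00:00 UTC."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_url(start_day, end_day, coin_id=1, convert_id=2781):
    """Build the request URL for a date range (hypothetical helper)."""
    params = {
        "id": coin_id,
        "convertId": convert_id,
        "timeStart": to_epoch(start_day),
        "timeEnd": to_epoch(end_day),
    }
    return f"{API_URL}?{urlencode(params)}"


def quotes_to_rows(payload):
    """Flatten the JSON layout used in the answer into plain dicts."""
    rows = []
    for item in payload["data"]["quotes"]:
        q = item["quote"]
        rows.append({
            "close": q["close"],
            "volume": q["volume"],
            "date": q["timestamp"][:10],  # keep only the YYYY-MM-DD part
            "high": q["high"],
            "low": q["low"],
        })
    return rows


# Reproduces the URL used in the answer:
print(build_url("2021-09-24", "2021-11-24"))
# Assumed equivalent for Jan 1 through Sep 30, 2021:
print(build_url("2021-01-01", "2021-10-01"))
```

The list returned by quotes_to_rows(r.json()) can be passed straight to pd.DataFrame(...), and df.to_csv('info.csv', index=False) then writes the file.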