
Scrape Historical Bitcoin Data from Coinmarketcap with BeautifulSoup

I’m trying to scrape historical Bitcoin data from coinmarketcap.com to get the close, volume, date, high and low values from the beginning of the year until Sep 30, 2021. I’m new to scraping with Python, and after going through threads and videos for hours I still don’t know what my mistake is (or is there something about the website that I’m not detecting?). The following is my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd


closeList = []
volumeList = []
dateList = []
highList = []
lowList = []

website = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'

r = requests.get(website)
soup = BeautifulSoup(r.text, 'lxml')

tr = soup.find_all('tr')
FullData = []
for item in tr:
    closeList.append(item.find_all('td')[4].text)
    volumeList.append(item.find_all('td')[5].text)
    dateList.append(item.find('td',{'style':'text-align: left;'}).text)
    highList.append(item.find_all('td')[2].text)
    lowList.append(item.find_all('td')[3].text)
    FullData.append([closeList,volumeList,dateList,highList,lowList])

df_columns = ["close", "volume", "date", "high", "low"]

df = pd.DataFrame(FullData, columns = df_columns)
print(df)

As a result I only get:

Empty DataFrame
Columns: [close, volume, date, high, low]
Index: []

The task requires me to scrape with BeautifulSoup and then export to CSV (which is then simply df.to_csv). Can somebody help me out? That would be highly appreciated.


Answer

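The data is loaded dynamically by JavaScript from an API call that returns JSON, so the table rows are not present in the HTML that requests receives. That is why find_all('tr') finds nothing and the DataFrame is empty. You can fetch the data directly from the API as follows: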

Code:

import requests
import pandas as pd

# JSON endpoint behind the page; timeStart/timeEnd are Unix timestamps
# (here 2021-09-24 to 2021-11-24 UTC)
api_url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1632441600&timeEnd=1637712000'
r = requests.get(api_url)
r.raise_for_status()  # stop early if the request failed

data = []
for item in r.json()['data']['quotes']:
    quote = item['quote']  # one entry per day
    data.append([quote['close'], quote['volume'], quote['timestamp'],
                 quote['high'], quote['low']])

cols = ["close", "volume", "date", "high", "low"]

df = pd.DataFrame(data, columns=cols)
print(df)
# Uncomment to export the result to CSV:
# df.to_csv('info.csv', index=False)

Output:

           close        volume                      date          high           low
0   42839.751696  4.283935e+10  2021-09-24T23:59:59.999Z  45080.491063  40936.557169
1   42716.593147  3.160472e+10  2021-09-25T23:59:59.999Z  42996.259704  41759.920425
2   43208.539105  3.066122e+10  2021-09-26T23:59:59.999Z  43919.300970  40848.461660
3   42235.731847  3.098003e+10  2021-09-27T23:59:59.999Z  44313.245882  42190.632576
4   41034.544665  3.021494e+10  2021-09-28T23:59:59.999Z  42775.146142  40931.662500
..           ...           ...                       ...           ...           ...
56  58119.576194  3.870241e+10  2021-11-19T23:59:59.999Z  58351.113266  55705.180685
57  59697.197134  3.062426e+10  2021-11-20T23:59:59.999Z  59859.880442  57469.725661
58  58730.476639  2.612345e+10  2021-11-21T23:59:59.999Z  60004.426383  58618.931432
59  56289.287323  3.503612e+10  2021-11-22T23:59:59.999Z  59266.358468  55679.840404
60  57569.074876  3.748580e+10  2021-11-23T23:59:59.999Z  57875.516397  55632.759912

[61 rows x 5 columns]
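
The URL above covers 2021-09-24 to 2021-11-24, not the range asked about in the question. Here is a minimal sketch of how the timestamps for another range could be built, and how the rows could be flattened with a plain YYYY-MM-DD date. The helper names (to_unix, build_url, quotes_to_rows) are hypothetical; the endpoint and parameter names are copied from the URL above. In the output above, a timeEnd of 2021-11-24 gave a last row of 2021-11-23, so the end date appears to be exclusive. The sketch therefore passes Oct 1 to include Sep 30. That is an inference from the output, not documented behavior, and it is also an assumption that the API accepts a range this long in one request.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint and parameter names are taken from the answer's URL.
BASE_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"


def to_unix(year, month, day):
    """Return the Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_url(start, end, coin_id=1, convert_id=2781):
    """Build the historical-data URL from (year, month, day) start and end tuples."""
    params = {
        "id": coin_id,
        "convertId": convert_id,
        "timeStart": to_unix(*start),
        "timeEnd": to_unix(*end),
    }
    return f"{BASE_URL}?{urlencode(params)}"


def quotes_to_rows(payload):
    """Flatten the JSON payload into [close, volume, date, high, low] rows,
    keeping only the YYYY-MM-DD part of each timestamp."""
    rows = []
    for item in payload["data"]["quotes"]:
        quote = item["quote"]
        date = quote["timestamp"][:10]  # "2021-09-24T23:59:59.999Z" -> "2021-09-24"
        rows.append([quote["close"], quote["volume"], date,
                     quote["high"], quote["low"]])
    return rows


# Assumed range for the question: Jan 1, 2021 through Sep 30, 2021 (end passed as Oct 1).
url = build_url((2021, 1, 1), (2021, 10, 1))
print(url)
```

The rows returned by quotes_to_rows(requests.get(url).json()) can be passed to pd.DataFrame(rows, columns=cols) and exported with df.to_csv, as in the code above.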