
Importing data from URL using Python (into pandas dataframe)?

I’ve gone around in circles on this one. A bit frustrating as the solution is probably close at hand.

Anyway, I found a URL that returns some data in CSV format. However, the URL itself does not contain the CSV file name. In a web browser, I can go to the link and then I’m asked whether I want to open or save the file, so I know I’m ultimately getting a CSV file with a name. I’m just not sure how to do this in Python, as there seems to be an intermediate data type being passed (bytes).

I’ve tried the following to no avail:

import io
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
import pandas as pd
link = r'http://www.cboe.com/products/vix-index-volatility/vix-options-and-futures/vix-index/vix-historical-data/'
f = urllib.request.urlopen(link)
myfile = f.read()
buf = io.BytesIO(myfile)  # originally tried io.StringIO(myfile) but then realized myfile is in bytes
df = pd.read_csv(buf)

Any suggestions?

The df should contain data that looks similar to:

1/5/2004,18.45,18.49,17.44,17.49
1/6/2004,17.66,17.67,16.19,16.73
1/7/2004,16.72,16.75,15.5,15.5
1/8/2004,15.42,15.68,15.32,15.61
1/9/2004,16.15,16.88,15.57,16.75
1/12/2004,17.32,17.46,16.79,16.82

Here is the last line of the error message:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 24, saw 4

Answer

@Fred – I think that you are simply using the wrong URL. When I replace the link with http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv, your script works.

I found this URL on the page your script originally pointed to.
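If it helps, here is a rough sketch of how you might catch this kind of mix-up early: check whether the downloaded bytes look like a web page before handing them to pandas. The helper names `looks_like_html` and `load_csv_from_url` are made up for illustration (they are not part of pandas or urllib), and the check is only a heuristic based on how the payload starts.

```python
import io
import urllib.request


def looks_like_html(raw: bytes) -> bool:
    """Heuristic: True if the payload starts like an HTML page rather than CSV."""
    # Skip a UTF-8 byte-order mark and leading whitespace, then compare case-insensitively.
    head = raw[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    return head.startswith((b"<!doctype html", b"<html"))


def load_csv_from_url(url: str, **read_csv_kwargs):
    """Download url and parse it with pandas, failing early if it is an HTML page."""
    import pandas as pd  # imported here so the HTML check above works without pandas

    with urllib.request.urlopen(url) as response:
        raw = response.read()
    if looks_like_html(raw):
        raise ValueError(f"{url} returned an HTML page, not a CSV file")
    # Extra keyword arguments (e.g. skiprows) are passed straight to pandas.read_csv.
    return pd.read_csv(io.BytesIO(raw), **read_csv_kwargs)


# Example call (assumes the CSV URL from this answer is still being served):
# df = load_csv_from_url(
#     "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/vixcurrent.csv"
# )
```

With the original link, a check like this would presumably raise the ValueError instead of the less obvious ParserError, since that link points to a web page rather than to the file itself.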

User contributions licensed under: CC BY-SA