
How to read all CSV files from a web page into a pandas DataFrame?

I’m trying to read all .csv files from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports into a single data frame.

My code so far:

import re
import requests
import pandas as pd

url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
filenames = re.findall(r'\d{1,2}-\d{1,2}-\d{4}\.csv', x)
frame = pd.concat(pd.read_csv(url + y) for y in filenames)

Maybe somebody can help :D


Answer

Change the URL to

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'

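and it should work. This URL serves the raw CSV files themselves rather than the GitHub HTML pages that display them. Note the trailing slash: each file's URL is built by appending the filename directly to this string.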
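The mapping from a github.com "tree" or "blob" URL to the raw host is mechanical, so it can be done in code instead of by hand. The helper below is a hypothetical sketch, not part of requests or pandas, and it assumes URLs of the form https://github.com/<owner>/<repo>/<tree|blob>/<branch>/<path> with a branch name that contains no slash. Keep in mind that the raw host serves individual files, not directory listings, so the filenames still have to be obtained some other way.

```python
from urllib.parse import urlparse


def to_raw_url(github_url: str) -> str:
    """Map a github.com tree/blob URL to the matching raw.githubusercontent.com URL.

    Assumption: the URL looks like
    https://github.com/<owner>/<repo>/<tree|blob>/<branch>/<path>
    and the branch name contains no slash.
    """
    parsed = urlparse(github_url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 4 or parts[2] not in ("tree", "blob"):
        raise ValueError(f"not a GitHub tree/blob URL: {github_url}")
    owner, repo, kind, branch, *path = parts
    raw = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}"
    if path:
        raw += "/" + "/".join(path)
    # A directory ("tree") URL gets a trailing slash so a filename can be appended with +
    return raw + "/" if kind == "tree" else raw
```

Applied to the question's URL, this yields exactly the raw URL given above, trailing slash included.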

Edit: I just noticed that you still need your old URL to scrape the filenames, so keep both:

import re
import requests
import pandas as pd

url_raw = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'
url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
# A filename can appear several times in the page HTML, so de-duplicate the matches
filenames = sorted(set(re.findall(r'\d{1,2}-\d{1,2}-\d{4}\.csv', x)))
frame = pd.concat((pd.read_csv(url_raw + y) for y in filenames), ignore_index=True)
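The names scraped from the HTML are not guaranteed to be unique (the same filename can show up in a link's href, its title and its text), and names in MM-DD-YYYY form do not sort by date: an alphabetical sort puts 01-01-2021.csv before 12-31-2020.csv. A minimal sketch of a filename step that handles both, assuming every report is named MM-DD-YYYY.csv; the function name report_filenames is hypothetical.

```python
import re
from datetime import datetime

# Assumed naming scheme for the daily reports: MM-DD-YYYY.csv, e.g. 01-22-2020.csv
REPORT_RE = re.compile(r"\d{2}-\d{2}-\d{4}\.csv")


def report_filenames(html: str) -> list[str]:
    """Return each report filename found in the page text once, oldest first."""
    unique = set(REPORT_RE.findall(html))
    # Parse the date out of the name so the order is chronological, not alphabetical
    return sorted(unique, key=lambda name: datetime.strptime(name, "%m-%d-%Y.csv"))
```

Its result can replace filenames in the pd.concat call above. Be aware that the daily reports changed their column layout during 2020 (for example, Province/State vs. Province_State), so the concatenated frame may hold both spellings side by side with NaN gaps; renaming the older columns before concatenating is one way to merge them.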
User contributions licensed under: CC BY-SA