I’m trying to read all the .csv files from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports into a single data frame.
My code so far:
url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
filenames = re.findall('[\d]{1,2}-[\d]{1,2}-[\d]{4}.csv', x)
frame = pd.concat(pd.read_csv(url + y) for y in filenames)
Maybe somebody can help :D
Answer
Change the URL to
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'
and it should work. This URL serves the raw CSV files themselves, not the HTML pages that GitHub wraps around them.
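To make the relationship between the two URLs concrete, here is a minimal sketch that derives the raw base URL from the folder URL in the question. The helper name `to_raw_base` is made up for illustration, and it assumes the folder URL follows the usual `github.com/<owner>/<repo>/tree/<branch>/<path>` layout.

```python
def to_raw_base(tree_url):
    """Turn a github.com '/tree/<branch>/...' folder URL into its raw-content base URL."""
    # Swap the host, then drop the '/tree' path segment (first occurrence only).
    raw = tree_url.replace("https://github.com/", "https://raw.githubusercontent.com/", 1)
    raw = raw.replace("/tree/", "/", 1)
    # Ensure a trailing slash so a filename can be appended directly.
    return raw.rstrip("/") + "/"


tree_url = ("https://github.com/CSSEGISandData/COVID-19/tree/master/"
            "csse_covid_19_data/csse_covid_19_daily_reports")
print(to_raw_base(tree_url))
```

The trailing slash matters: without it, appending a filename would glue it onto the folder name.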
Edit: I just noticed that you still need your old URL to get the filenames:
import re
import pandas as pd
import requests

url_raw = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'
url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
# A filename can appear several times in the page HTML, so de-duplicate before downloading
filenames = sorted(set(re.findall(r'\d{1,2}-\d{1,2}-\d{4}\.csv', x)))
frame = pd.concat(pd.read_csv(url_raw + y) for y in filenames)
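If you want to check the filename matching without hitting the network, something like the sketch below should do. The HTML snippet is a made-up sample, not the real GitHub markup, and `unique_report_names` is a hypothetical helper name. It only shows that a name repeated in a link's attributes and text is returned once.

```python
import re

# Matches names such as 01-22-2020.csv; the dot is escaped so it only matches a literal "."
REPORT_NAME = re.compile(r"\d{1,2}-\d{1,2}-\d{4}\.csv")


def unique_report_names(html):
    """Return each report filename once, in order of first appearance."""
    # dict.fromkeys keeps insertion order while dropping repeats.
    return list(dict.fromkeys(REPORT_NAME.findall(html)))


# Made-up sample: the same name shows up in title, href and link text.
sample = (
    '<a title="01-22-2020.csv" href="/x/01-22-2020.csv">01-22-2020.csv</a> '
    '<a href="/x/01-23-2020.csv">01-23-2020.csv</a> <a href="/x/README.md">README.md</a>'
)
names = unique_report_names(sample)
print(names)
```

Each name can then be appended to the raw URL and passed to pd.read_csv as above.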