I’m trying to read all .csv files from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports into a single data frame.
My code so far:
url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
filenames = re.findall(r'[\d]{1,2}-[\d]{1,2}-[\d]{4}\.csv', x)
frame = pd.concat(pd.read_csv(url + y) for y in filenames)
Maybe somebody can help :D
Answer
Change the URL to
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'
and it should work. This URL serves the raw CSV files themselves, not the HTML pages that display them.
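To illustrate how the two URLs relate, here is a minimal sketch that derives the raw base URL from a GitHub directory URL. The helper name `to_raw_base` is made up for this example, and it assumes the path has the form `/<owner>/<repo>/tree/<branch>/<directory>` with a branch name that contains no slashes.

```python
from urllib.parse import urlsplit


def to_raw_base(tree_url: str) -> str:
    """Turn a github.com '/tree/' directory URL into a raw.githubusercontent.com base URL.

    Hypothetical helper: assumes /<owner>/<repo>/tree/<branch>/<directory...>
    and a branch name without slashes.
    """
    parts = urlsplit(tree_url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 4 or segments[2] != "tree":
        raise ValueError(f"not a GitHub tree URL: {tree_url}")
    # Drop the literal 'tree' segment; the raw host does not use it.
    owner, repo, _, branch, *directory = segments
    path = "/".join([owner, repo, branch, *directory])
    return f"https://raw.githubusercontent.com/{path}/"
```

Appending a filename such as `01-22-2020.csv` to the returned string gives an address that `pd.read_csv` can read directly.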
Edit: Just noticed that you need your old url to get the filenames:
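import re
import pandas as pd
import requests

# Raw base URL for downloading the files; the regular page URL is only used to list them.
url_raw = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'
url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
# Filenames look like MM-DD-YYYY.csv; the dot is escaped so it only matches a literal '.'.
filenames = re.findall(r'[\d]{1,2}-[\d]{1,2}-[\d]{4}\.csv', x)
frame = pd.concat(pd.read_csv(url_raw + y) for y in filenames)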
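One thing to watch for: a page's HTML can mention the same filename more than once (for example in a link target and again in the link text), in which case `re.findall` returns duplicates and the same file is read repeatedly. Below is a rough sketch of one way to guard against that. The names `extract_report_names`, `load_reports` and the `source_file` column are made up for this example, and it assumes the filenames follow the MM-DD-YYYY.csv pattern and still appear in the page source.

```python
import re
from datetime import datetime

RAW_BASE = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
            "csse_covid_19_data/csse_covid_19_daily_reports/")
NAME_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}\.csv")


def extract_report_names(html: str) -> list[str]:
    """Return each MM-DD-YYYY.csv name found in the text once, in date order."""
    unique = set(NAME_PATTERN.findall(html))
    # Sort by the date encoded in the name; a plain string sort would mix years.
    return sorted(unique, key=lambda name: datetime.strptime(name[:-4], "%m-%d-%Y"))


def load_reports(listing_url: str):
    """Download every report listed on the page and concatenate them."""
    # Imported here so the helper above works without third-party packages.
    import pandas as pd
    import requests

    html = requests.get(listing_url, timeout=30).text
    names = extract_report_names(html)
    if not names:
        raise RuntimeError("no report filenames found in the page source")
    # Keep the originating filename so rows can be traced back to their day.
    frames = [pd.read_csv(RAW_BASE + name).assign(source_file=name) for name in names]
    return pd.concat(frames, ignore_index=True)
```

Calling `load_reports(url)` with the regular GitHub page URL would then return one combined data frame. Note that this downloads every daily file in turn, so it can take a while.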