API web data capture

Question

I am attempting to pull golf stats for an analysis project. TL;DR summary: Should I scrape, or use a loop with the API I found in the network console? I want to pull data for 6 or 7 stat categories, by year (2015-present), and preferably by tournament to better categorize player tournament performance. Base URL is: htt…

Accepted Answer

I'd go for scraping, as the URL itself gives you more control over what you're after. Also, you can easily get the tabular data with pandas.

For example:

    from io import StringIO

    import pandas as pd
    import requests

    # Browser-like request headers sent along with the page request
    headers = {
        "accept": "application/json, text/javascript, */*; q=0.01",
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.99 Safari/537.36",
        "x-requested-with": "XMLHttpRequest",
    }

    url = "https://www.pgatour.com/content/pgatour/stats/stat.02674.y2017.eon.t030.html"

    # Pass the headers explicitly; defining them alone has no effect
    html = requests.get(url, headers=headers).text

    # read_html returns a list of DataFrames, one per <table> in the page;
    # wrapping the string in StringIO avoids the literal-HTML deprecation in newer pandas
    df = pd.read_html(StringIO(html), flavor="html5lib")

    # Combine the tables and drop the columns labeled 0, 1 and 2
    df = pd.concat(df).drop([0, 1, 2], axis=1)
    df.to_csv("golf.csv", index=False)

This saves the combined table to golf.csv.

You can then keep swapping the URLs, or modify the stat., y, and eon parts of the URL, to get different stats. For example, this is 2018 U.S. Open: https://www.pgatour.com/content/pgatour/stats/stat.02674.y2017.eon.t030.html
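Swapping the URL parts by hand gets tedious across several stats and years, so here is a minimal sketch of generating the URLs in a loop. It assumes the pattern seen in the example URL (stat.<id>.y<year>.eon.<code>.html) holds for other combinations, and that the trailing t030 segment identifies the tournament; neither is confirmed here. The helper names build_stat_url and iter_stat_urls are hypothetical.

```python
from itertools import product

# Base path taken from the example URL in the answer
BASE = "https://www.pgatour.com/content/pgatour/stats"


def build_stat_url(stat_id, year, tournament_id):
    """Build one stats URL. The pattern is an assumption inferred from the example URL."""
    return f"{BASE}/stat.{stat_id}.y{year}.eon.{tournament_id}.html"


def iter_stat_urls(stat_ids, years, tournament_ids):
    """Yield ((stat_id, year, tournament_id), url) for every combination."""
    for stat_id, year, tournament_id in product(stat_ids, years, tournament_ids):
        yield (stat_id, year, tournament_id), build_stat_url(stat_id, year, tournament_id)


# Example: one stat and one tournament code across three years.
# "02674" and "t030" come from the example URL; other IDs must be looked up on the site.
urls = dict(iter_stat_urls(["02674"], range(2015, 2018), ["t030"]))

for key, url in urls.items():
    print(key, url)
```

Each generated URL can then go through the same requests and pd.read_html steps as the single-page example. Some combinations may not exist on the site (the same tournament code is not guaranteed to be valid every year), so it is worth checking the response status before parsing.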

Answer