I’m currently looking to scrape https://www.bestfightodds.com/ for an MMA machine learning project. I’m specifically looking for the DraftKings opening odds for each fighter which is found by clicking on the odds for a given fighter under the DraftKings column. You are then presented with a popup table that shows how the betting odds have changed over time. The table presents you with the openings odds and the latest (current) odds.
I have no issue scraping the fighter names, but I can’t figure out how to scrape the opening odds in the popup table. The HTML code from the popup table only appears in the inspect function when you click on it which is why I get a ‘None’ when I try to find it in the site’s HTML.
This is my code so far:
# Importing packages from bs4 import BeautifulSoup import requests # Specifying website URL html_text = requests.get('https://www.bestfightodds.com/events/ufc-273-2411').text soup = BeautifulSoup(html_text, 'lxml') # Finding values fighter_names = soup.find_all('span', class_ = 't-b-fcc') opening_odds = soup.find_all('span, style_ = 'margin-left: 4px; margin-right: 4px;') for fighter_names in soup.find_all('span', class_ = 't-b-fcc'): print (fighter_names.get_text())
Here is a photo of where and how to locate the opening odds. The blue box is where you click to find the red one, which is the one I need to scrape for all fighters.
Advertisement
Answer
Fun little project. The data the server sends are encoded by custom JavaScript function, so you need to use selenium
or rewrite the decoding function to Python.
I used js2py
to execute the javascript function directly in python (and not use selenium
– it rewrites the javascript function to python automatically), but you can rewrite it to Python if you wish:
import json import js2py import requests from bs4 import BeautifulSoup js_decode_func = r"""function $(e) { var t, a, r, s, o, i, l = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=', n = '', d = 0; for (e = e.replace(/[^A-Za-z0-9+/=]/g, ''); d < e.length;) t = l.indexOf(e.charAt(d++)) << 2 | (s = l.indexOf(e.charAt(d++))) >> 4, a = (15 & s) << 4 | (o = l.indexOf(e.charAt(d++))) >> 2, r = (3 & o) << 6 | (i = l.indexOf(e.charAt(d++))), n += String.fromCharCode(t), 64 != o && (n += String.fromCharCode(a)), 64 != i && (n += String.fromCharCode(r)); for (var c = '', h = 0, p = c1 = c2 = 0; h < n.length;)(p = n.charCodeAt(h)) < 128 ? (c += String.fromCharCode(p), h++) : 191 < p && p < 224 ? (c2 = n.charCodeAt(h + 1), c += String.fromCharCode((31 & p) << 6 | 63 & c2), h += 2) : (c2 = n.charCodeAt(h + 1), c3 = n.charCodeAt(h + 2), c += String.fromCharCode((15 & p) << 12 | (63 & c2) << 6 | 63 & c3), h += 3); var u, f, m, g = '!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~', y = new String, $ = g.length; for (u = 0; u < c.length; u++) m = c.charAt(u), 0 <= (f = g.indexOf(m)) && (m = g.charAt((f + $ / 2) % $)), y += m; return y }""" js_get_value_func = r"""function $(e) { return 2 <= e ? '+' + Math.round(100 * (e - 1)) : e < 2 ? '' + Math.round( - 100 / (e - 1)) : 'error' }""" decode = js2py.eval_js(js_decode_func) get_value = js2py.eval_js(js_get_value_func) url = "https://www.bestfightodds.com/" api_url = "https://www.bestfightodds.com/api/ggd" params = {"b": "22", "m": "25728", "p": "1"} soup = BeautifulSoup(requests.get(url).content, "html.parser") for td in soup.select("td[data-li]"): vals = json.loads(td["data-li"]) if len(vals) != 3 or vals[0] != 22: # 22 - DraftKings continue params["b"], params["p"], params["m"] = vals name = td.find_previous(class_="t-b-fcc").text encoded_text = requests.get(api_url, params=params).text data = json.loads(decode(encoded_text)) first_value = get_value(data[0]["data"][0]["y"]) print(name, first_value)
Prints:
Alexander Volkanovski -450 Chan Sung Jung +340 Aljamain Sterling +320 Petr Yan -425 Mackenzie Dern +120 Tecia Torres -140 Mark Madsen +130 Vinc Pichel -150 Darian Weeks +190 Ian Garry -235 Mickey Gall +145 Mike Malott. -165 Aspen Ladd +155 Raquel Pennington -180 Anthony Hernandez -180 Josh Fremd +155 Aleksei Oleinik -105 Jared Vanderaa -115 Kay Hansen -150 Piera Rodriguez +130 Daniel Santos +175 Julio Arce -210 Belal Muhammad +150 Vicente Luque -170 Devin Clark -160 William Knight +140 Jordan Leavitt +110 Trey Ogden. -130 Elizeu Zaleski Dos Santos -195 Mounir Lazzez +165 Pat Sabatini -305 T.J. Laramie +240 Mayra Bueno Silva -365 Yanan Wu +280 Lina Akhtar Lansberg +245 Pannie Kianzad -310 Chris Barnett +165 Martin Buday -195 Andre Fialho +150 Miguel Baeza -170 Brandon Jenkins +320 Drakkar Klose -425 Jesse Ronson +110 Rafa Garcia -130 Caio Borralho +115 Gadzhi Omargadzhiev -135 Istela Nunes -190 Sam Hughes +160 Heili Alateng -180 Kevin Croom +155 Carla Esparza +150 Rose Namajunas -170 Glover Teixeira +155 Jiri Prochazka -180 Dustin Poirier -435 Nate Diaz +330 Charles Oliveira -160 Justin Gaethje +140 Gilbert Burns +280 Khamzat Chimaev -365 Arman Tsarukyan -335 Joel Alvarez +260 Calvin Cattar +170 Giga Chikadze -200