Recreate POST request with WebKitFormBoundary using Python’s requests

I am attempting to scrape some data from a website using a POST request with the Python requests library. Unfortunately, I can't post a link to the page, because you must be signed in to the website to use it.

The request I am trying to replicate goes to a page with the .ehtml extension, and this is part of the request payload I want to recreate:

------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="session_id"

W0pNKn8AAQEAACD-XkYAAAAJ
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="p_session_id"

W0pMOH8AAQEAABZSUVkAAAAD
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="attach_key"


------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="chosen"

0
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="debug"


------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="language"

en
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="game_system_id"

NULL
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="collection_detail_id"

NULL
------WebKitFormBoundary8rntuVzldIBHkILv
Content-Disposition: form-data; name="competition_id"

NULL

With help from some questions on Stack Overflow, I have managed to recreate it this far:

--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="session_id"


--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="p_session_id"


--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="attach_key"


--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="chosen"

0
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="debug"


--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="language"

en
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="game_system_id"

NULL
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="collection_detail_id"

NULL
--30b11983bde849109a3dc93e139e16d4
Content-Disposition: form-data; name="competition_id"

NULL

That was done using this code:

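import requests
from requests import Request

# A (None, value) tuple tells requests to send the value as a plain
# multipart form field with no filename, which forces multipart/form-data.
Q = {
    "session_id": (None, ""),
    "p_session_id": (None, ""),
    "attach_key": (None, ""),
    "chosen": (None, "0"),
    "debug": (None, ""),
    "language": (None, "en"),
    "game_system_id": (None, "NULL"),
    "collection_detail_id": (None, "NULL"),
    "competition_id": (None, "NULL"),
}

# login_URL2, payload and req_url are defined elsewhere (not shown).
with requests.Session() as s:
    p = s.post(login_URL2, data=payload)
    # print(p.text)

    # d = s.post(req_url, files=Q)
    d2 = Request("POST", req_url, files=Q)
    # Prepare through the session so the login cookies are attached too.
    d3 = s.prepare_request(d2)

print(d3.body.decode("utf-8"))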

I believe the last thing I am missing is the WebKitFormBoundary part, but I can't find anywhere how to insert it. This is my first time scraping a page that uses an .ehtml file, so if I have missed anything else obvious, all help is much appreciated.


Answer

The exact name of the boundary does not matter, as long as it is declared in the Content-Type header. For a form post, that header looks like this:

Content-Type: multipart/form-data; boundary=gc0p4Jq0M2Yt08jU534c0p

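With this header, every part of the body begins with a delimiter line made of two hyphens followed by the boundary, and the body ends with the same line plus two trailing hyphens:

--gc0p4Jq0M2Yt08jU534c0p
--gc0p4Jq0M2Yt08jU534c0p--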
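A minimal sketch of that format, using only the standard library: the hypothetical helper build_form_body below encodes a few of the fields from the question, once with the browser's boundary and once with a random one. It is for illustration only; requests already builds the body this way (with its own random boundary) when you pass files=.

```python
import uuid


def build_form_body(fields, boundary=None):
    """Encode simple text fields as multipart/form-data (hypothetical helper).

    Returns (body_bytes, content_type_header_value).
    """
    if boundary is None:
        # Any string works as long as it never appears inside the data;
        # a random hex token is a common choice.
        boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")  # delimiter line before every part
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates the part headers from the value
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter ends the body
    body = "\r\n".join(lines).encode("utf-8") + b"\r\n"
    content_type = f"multipart/form-data; boundary={boundary}"
    return body, content_type


fields = {"chosen": "0", "language": "en", "competition_id": "NULL"}

# Same fields, two different boundaries: only the delimiter lines differ.
# The browser's delimiter "------WebKit..." is "--" plus the boundary
# "----WebKitFormBoundary8rntuVzldIBHkILv".
webkit_body, webkit_ct = build_form_body(fields, "----WebKitFormBoundary8rntuVzldIBHkILv")
random_body, random_ct = build_form_body(fields)

print(webkit_ct)
print(webkit_body.decode("utf-8"))
```

Whichever boundary is used, the header and the delimiter lines must agree; nothing else in the body changes.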

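The server reads the Content-Type header, takes the boundary from it, and uses it to split the body into parts. So the random boundary that requests generated in your output (30b11983bde849109a3dc93e139e16d4) is just as valid as the browser's ----WebKitFormBoundary8rntuVzldIBHkILv, and requests already sends the matching Content-Type header for you.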
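To illustrate the server side, here is a small sketch that splits a shortened copy of the browser payload with Python's standard email parser, which, like a web server, finds the boundary in the Content-Type header. parse_form_body is a hypothetical helper, not part of any web framework; real servers use their own multipart parsers.

```python
from email.parser import BytesParser
from email.policy import HTTP

# A shortened copy of the browser payload from the question, with lines
# joined by CRLF as they are on the wire.
BOUNDARY = "----WebKitFormBoundary8rntuVzldIBHkILv"
BODY = "\r\n".join([
    f"--{BOUNDARY}",
    'Content-Disposition: form-data; name="chosen"',
    "",
    "0",
    f"--{BOUNDARY}",
    'Content-Disposition: form-data; name="language"',
    "",
    "en",
    f"--{BOUNDARY}--",
    "",
]).encode("utf-8")


def parse_form_body(content_type, body):
    """Split a multipart/form-data body into {field name: value} (hypothetical helper)."""
    # Prepend the Content-Type header so the parser sees a complete MIME
    # message; it then takes the boundary from that header.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=HTTP).parsebytes(raw)
    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields[name] = part.get_content()
    return fields


fields = parse_form_body(f"multipart/form-data; boundary={BOUNDARY}", BODY)
print(fields)
```

Replacing the boundary with any other string, in both the header and the body, yields the same fields.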

User contributions licensed under: CC BY-SA