How to parse raw HTTP request in Python 3?

I am looking for a native way to parse an HTTP request in Python 3.

This question shows a way to do it in Python 2, but it relies on modules that are now deprecated, and I am looking for a way to do it in Python 3.

I would mainly like to figure out what resource is requested and parse the headers from a simple request, e.g.:

GET /index.html HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

Can someone show me a basic way to parse this request?

Answer

You could use the email.message.Message class from the email module in the standard library.

Below is a Python 3 example of parsing HTTP headers, adapted from the answer to the question you linked.

Suppose you wanted to create a dictionary containing all of your header fields:

import email
import pprint
from io import StringIO

request_string = 'GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, sdch\r\nAccept-Language: en-US,en;q=0.8'

# pop the first line (the request line) so we only process headers
_, headers = request_string.split('\r\n', 1)

# construct a message from the request string
message = email.message_from_file(StringIO(headers))

# construct a dictionary containing the headers
headers = dict(message.items())

# pretty-print the dictionary of headers
pprint.pprint(headers, width=160)

If you ran this at a Python prompt, the result would look like:

{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'Accept-Encoding': 'gzip, deflate, sdch',
 'Accept-Language': 'en-US,en;q=0.8',
 'Cache-Control': 'max-age=0',
 'Connection': 'keep-alive',
 'Host': 'localhost',
 'Upgrade-Insecure-Requests': '1',
 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
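
The example above discards the request line, but the question also asks which resource was requested. A minimal sketch of one way to recover it is shown here. It assumes the request is well formed, has already been decoded to a str, and uses CRLF line endings. The function name parse_request is hypothetical and chosen only for illustration.

```python
from email import message_from_string


def parse_request(raw):
    """Split a raw HTTP request into request-line parts, headers, and body."""
    # The head of the request ends at the first blank line; any body follows it.
    head, _, body = raw.partition('\r\n\r\n')
    # The first line is the request line; the remainder is the header block.
    request_line, _, header_block = head.partition('\r\n')
    # A request line has the form: METHOD SP request-target SP HTTP-version
    method, target, version = request_line.split(' ', 2)
    # Keep the Message object rather than a dict: its lookups are
    # case-insensitive, which matches how HTTP treats header names.
    headers = message_from_string(header_block)
    return method, target, version, headers, body


raw = ('GET /index.html HTTP/1.1\r\n'
       'Host: localhost\r\n'
       'Connection: keep-alive\r\n'
       '\r\n')
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers['host'])
```

Returning the Message object instead of dict(message.items()) is a design choice here: a plain dict only matches the exact spelling of each header name, and it keeps only one value when a header is repeated.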