Python's requests library removes the trailing question mark from a URL

Goal

Make a request to http://example.com/page? using requests.get()

Problem

The question mark ("?") is automatically stripped from the request URL when it is the last character (e.g. http://example.com/page?1 and http://example.com/page?! work, but http://example.com/page? does not).

Sample code

import requests

endpoint = "http://example.com/page?"
r = requests.get(endpoint)

print(r.url) # -> "http://example.com/page"
assert r.url == endpoint # Raises AssertionError

Question

Without modifying the library, is it possible to reach the intended endpoint? Both intended solutions (if such exist) and workarounds are welcome.

Thanks!

Answer

This is not possible with the requests library. URLs passed into requests are parsed by urllib3.util.url.parse_url() into separate parts:

scheme
auth
host
port
path
query
fragment

The query is taken to be whatever follows the "?". Because nothing follows the question mark here, the parsed query is empty. When the URL is reassembled from its parts, an empty query is left out together with its "?", which is why r.url does not have the trailing question mark.
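The same parse-then-rebuild effect can be reproduced offline with the standard library's urllib.parse. This is only a stand-in for illustration: it is an assumption that it mirrors what happens inside requests, not the code path requests actually takes, and round_trip is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def round_trip(url):
    """Split a URL into parts, then rebuild it from those parts.

    Hypothetical helper for illustration; returns (query, rebuilt_url).
    """
    parts = urlsplit(url)
    return parts.query, urlunsplit(parts)


for url in (
    "http://example.com/page?1",
    "http://example.com/page?!",
    "http://example.com/page?",
):
    query, rebuilt = round_trip(url)
    # An empty query string means the "?" is not written back out.
    print(f"{url!r}: query={query!r}, rebuilt={rebuilt!r}")
```

Only the last URL comes back changed: its query is the empty string, so the rebuilt URL has no "?" at all.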

I found that the behavior you are looking for is possible with urllib.request, though. Here’s an example:

import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen("http://example.com/page?")
    print(response.url)  # -> http://example.com/page?
except urllib.error.HTTPError as e:
    print(e.url)  # -> http://example.com/page?
    print(e.code)  # -> 404

I wrapped the request in a try/except because urllib raises urllib.error.HTTPError when the server answers with an error status such as 404, whereas requests returns the response without raising.
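To check what urllib.request would put on the wire without making a network call, a Request object can be built and inspected. This is a minimal sketch: describe is a hypothetical helper, and full_url and selector are attributes of urllib.request.Request, where selector is the path part handed to the HTTP connection.

```python
import urllib.request


def describe(url):
    """Build a Request without sending it and return (full_url, selector).

    Hypothetical helper for illustration; no network traffic happens here.
    """
    req = urllib.request.Request(url)
    return req.full_url, req.selector


full_url, selector = describe("http://example.com/page?")
print(full_url)  # the URL as urllib.request stores it
print(selector)  # the path that would appear in the request line
```

Both values still end in "?", which is consistent with the urlopen() result above: urllib.request splits off the host but does not rebuild the rest of the URL from parsed components.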