Error 'Unexpected HTTP code on the target page', 'status_code': 403 when I try to request a JSON URL with a proxy API

I'm trying to scrape the website https://triller.co/ and want to get information from profile pages like https://triller.co/@warnermusicarg. To do that, I request the JSON URL that contains the information, in this case https://social.triller.co/v1.5/api/users/by_username/warnermusicarg. When I use requests.get() it works normally and I can retrieve all the information.

import requests
import urllib.parse
from urllib.parse import urlencode

url = 'https://social.triller.co/v1.5/api/users/by_username/warnermusicarg'
headers = {'authority':'social.triller.co',
            'method':'GET',
            'path':'/v1.5/api/users/by_username/warnermusicarg',
            'scheme':'https',
            'accept':'*/*',
            'accept-encoding':'gzip, deflate, br',
            'accept-language':'ar,en-US;q=0.9,en;q=0.8',
            'authorization': 'Bearer eyJhbGciOiJIUzI1NiIsImlhdCI6MTY0MDc4MDc5NSwiZXhwIjoxNjkyNjIwNzk1fQ.eyJpZCI6IjUyNjQ3ODY5OCJ9.Ds-acbfcGSeUrGDSs47pBiT3b13Eb9SMcB8BF8OylqQ',
            'origin':'https://triller.co',
            'sec-ch-ua':'" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
            'sec-ch-ua-mobile':'?0',
            'sec-ch-ua-platform':'"Windows"',
            'sec-fetch-dest':'empty',
            'sec-fetch-mode':'cors',
            'sec-fetch-site':'same-site',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
response = requests.get(url, headers=headers)

The problem arises when I try to use an API proxy provider such as Webscraping.ai, ScrapingBee, etc.

api_key='my_api_key'
api_url='https://api.webscraping.ai/html?'
params = {'api_key': api_key, 'timeout': '20000', 'url':url}
proxy_url = api_url + urlencode(params)
response2 = requests.get(proxy_url, headers=headers)

This gives me the following error:

2022-01-08 22:30:59 [urllib3.connectionpool] DEBUG: https://api.webscraping.ai:443 "GET /html?api_key=my_api_key&timeout=20000&url=https%3A%2F%2Fsocial.triller.co%2Fv1.5%2Fapi%2Fusers%2Fby_username%2Fwarnermusicarg&render_js=false HTTP/1.1" 502 91
{'status_code': 403, 'status_message': '', 'message': 'Unexpected HTTP code on the target page'}
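Note that the log line and the body report two different codes: the proxy API itself answered with HTTP 502, while the JSON body carries 403 together with the message about the target page. A minimal sketch for pulling the two apart, assuming the body is JSON shaped like the one printed above (describe_proxy_error is a hypothetical helper name, not part of any library):

```python
import json


def describe_proxy_error(outer_status, body_text):
    """Separate the proxy's own HTTP status from the status reported in its JSON body."""
    try:
        body = json.loads(body_text)
    except ValueError:
        body = {}  # body was not JSON; nothing more to extract
    if not isinstance(body, dict):
        body = {}
    return {
        "proxy_status": outer_status,              # status of the call to the proxy API
        "target_status": body.get("status_code"),  # status the proxy reports in the body
        "message": body.get("message", ""),
    }


# Values copied from the log output above
info = describe_proxy_error(
    502,
    '{"status_code": 403, "status_message": "", '
    '"message": "Unexpected HTTP code on the target page"}',
)
print(info)
```

With a live response this would be called as describe_proxy_error(response2.status_code, response2.text).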

What I tried: I looked up the 403 code in my API proxy provider's documentation, which says it means the api_key is wrong, but I'm 100% sure it's correct. I also switched to another API proxy provider and got the same issue, and I had the same issue with twitter.com. I don't know what to do.

Answer

I don't know exactly what caused this error, but I tried using their webscraping_ai.ApiClient() instance as in here, and it worked:

import webscraping_ai

configuration = webscraping_ai.Configuration(
    host="https://api.webscraping.ai",
    api_key={
        'api_key': 'my_api_key'
    }
)

with webscraping_ai.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = webscraping_ai.HTMLApi(api_client)
    url_j = url  # str | URL of the target page (url from the question)
    timeout = 20000
    js = False
    proxy = 'datacenter'

    # headers is the dict defined in the question
    api_response = api_instance.get_html(url_j, headers=headers, timeout=timeout, js=js, proxy=proxy)
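One possible explanation, which I have not confirmed: in the question the headers are given to requests.get(), so they are sent to api.webscraping.ai itself, whereas the client call above passes them as a parameter of the API. Below is a sketch of doing the same with plain requests, assuming the API accepts a JSON-encoded `headers` query parameter and forwards it to the target page (check the provider's documentation before relying on this). build_proxy_url and PSEUDO_HEADERS are hypothetical names used only for illustration.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Chrome DevTools lists HTTP/2 pseudo-headers (:authority, :method, :path, :scheme);
# they are not regular request headers, so this sketch leaves them out.
PSEUDO_HEADERS = {"authority", "method", "path", "scheme"}


def build_proxy_url(api_key, target_url, target_headers, timeout_ms=20000):
    """Build the proxy API URL with the target headers passed as a query parameter."""
    forwarded = {
        name: value
        for name, value in target_headers.items()
        if name.lower() not in PSEUDO_HEADERS
    }
    params = {
        "api_key": api_key,
        "timeout": str(timeout_ms),
        "url": target_url,
        # Assumption: the API reads a JSON-encoded `headers` parameter
        "headers": json.dumps(forwarded),
    }
    return "https://api.webscraping.ai/html?" + urlencode(params)


example_headers = {
    "authority": "social.triller.co",
    "accept": "*/*",
    "origin": "https://triller.co",
}
proxy_url = build_proxy_url(
    "my_api_key",
    "https://social.triller.co/v1.5/api/users/by_username/warnermusicarg",
    example_headers,
)
print(proxy_url)
# Then: response2 = requests.get(proxy_url)  # note: no headers= argument here
```

The point of the design is that nothing meant for the target site travels in the headers of the request to the proxy API; everything goes through the query string.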
