What is the best way to combine multiple ndjson urls from a REST API?

My goal is to extract all of the URLs and send a GET request for each ndjson file; however, this gets complicated when there are more than 10 URLs. Is there a better way to do this, or do I need to write out multiple GET requests, join the ndjson files, and then parse the data?

print(response.text)

Output:

{"transactionTime":"2022-03-27T08:51:32.174-04:00","request":"https://api.site/data/5555/$export","requiresAccessToken":true,"output": [
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838916.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838917.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838918.ndjson"
        }
    ],
"error":[],"JobID":12443}

list(response.json().values())

Output:

 [
    "1990-01-28T08:51:32.174-04:00",
    "https://api.site/data/5555/$export",
    true,
    [
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838916.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838917.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838918.ndjson"
        }
    ]

I currently add multiple GET requests here:

response1 = requests.get("https://api.site/data/5555/838916.ndjson",headers=headers)
response2 = requests.get("https://api.site/data/5555/838917.ndjson",headers=headers)
response3 = requests.get("https://api.site/data/5555/838918.ndjson",headers=headers)

Answer

If I understand your question correctly, you send a request that returns the JSON object you posted, and you need to send a request to every URL in that object and merge the data into a single container (e.g. a dict).

from requests import Session

headers = { ... }  # some headers

sess = Session()
sess.headers.update(headers)

# Fetch the export manifest, then request every URL listed under "output".
resp = sess.get("https://api.site/data/5555/$export")
for item in resp.json()["output"]:
    ndjson = sess.get(item["url"])
    # process ndjson.text here

NDJSON is normally a sequence of JSON objects separated by newline characters, so without the actual data it’s not possible to suggest code that stores it in a suitable format for later parsing.
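
That said, here is a minimal sketch of how the loop could be completed, assuming each file is plain NDJSON (one JSON object per line) and that a single flat list is an acceptable container. The helper names `parse_ndjson` and `fetch_all_records` are made up for illustration and are not part of requests; the second one only expects an object with a `get` method, such as the `Session` above.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of objects, skipping blank lines."""
    records = []
    for line in text.split("\n"):
        line = line.strip()  # also drops a trailing "\r" from Windows line endings
        if line:
            records.append(json.loads(line))
    return records


def fetch_all_records(session, export_url):
    """Download every file listed under "output" in the manifest and merge the rows."""
    manifest = session.get(export_url)
    manifest.raise_for_status()  # fail early on an HTTP error

    records = []
    for item in manifest.json()["output"]:
        resp = session.get(item["url"])
        resp.raise_for_status()
        records.extend(parse_ndjson(resp.text))
    return records
```

With the session from the snippet above, `records = fetch_all_records(sess, "https://api.site/data/5555/$export")` should give one combined list no matter how many URLs the manifest contains, so nothing has to be hard-coded per file.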
