
What is the best way to combine multiple ndjson urls from a REST API?

My goal is to extract all of the URLs and send a GET request for each ndjson file; however, this gets complicated when there are more than 10 URLs. Is there a better way to do this, or do I need to write out multiple GET requests, join the ndjson files, and then parse the data?

print(response.text)

Output:

{"transactionTime":"2022-03-27T08:51:32.174-04:00","request":"https://api.site/data/5555/$export","requiresAccessToken":true,"output": [
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838916.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838917.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838918.ndjson"
        }
    ],
"error":[],"JobID":12443}

list(response.json().values())

Output:

 [
    "2022-03-27T08:51:32.174-04:00",
    "https://api.site/data/5555/$export",
    true,
    [
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838916.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838917.ndjson"
        },
        {
            "type":"robot",
            "url":"https://api.site/data/5555/838918.ndjson"
        }
    ],
    [],
    12443
 ]

I currently add multiple GET requests here:

response1 = requests.get("https://api.site/data/5555/838916.ndjson", headers=headers)
response2 = requests.get("https://api.site/data/5555/838917.ndjson", headers=headers)
response3 = requests.get("https://api.site/data/5555/838918.ndjson", headers=headers)


Answer

If I understood your question correctly, you send a request that returns the JSON object you provided. You then need to send a request to every URL in that object and merge the data into a single container (e.g. a dict).

from requests import Session

headers = { ... }  # some headers

sess = Session()
sess.headers.update(headers)

resp = sess.get("https://api.site/data/5555/$export")
for item in resp.json()["output"]:
    ndjson = sess.get(item["url"])
    # process ndjson.text here (normally one JSON object per line)

Normally ndjson is a sequence of JSON objects separated by newline characters, so without the actual data it's not possible to suggest code that will store this data in a format suited to later parsing.
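That said, here is a minimal sketch of how the merge could look, assuming each non-empty line of every ndjson file is a standalone JSON value and that a flat list of records is an acceptable container. The helper names parse_ndjson and fetch_all_records are hypothetical, not part of requests or of the API in the question.

```python
import json


def parse_ndjson(text):
    """Parse NDJSON text into a list of Python objects, skipping blank lines.

    Assumption: every non-empty line is one complete JSON value.
    Splitting on "\n" (rather than str.splitlines) avoids breaking on
    characters such as U+2028 that may legally appear inside JSON strings.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


def fetch_all_records(session, export_url):
    """Fetch every file listed under "output" and merge the records into one list.

    `session` is assumed to behave like requests.Session: get() returns a
    response exposing .json(), .text and .raise_for_status().
    """
    manifest = session.get(export_url)
    manifest.raise_for_status()

    records = []
    for item in manifest.json()["output"]:
        resp = session.get(item["url"])
        resp.raise_for_status()  # stop early instead of parsing an error page
        records.extend(parse_ndjson(resp.text))
    return records
```

With the sess object from the snippet above, fetch_all_records(sess, "https://api.site/data/5555/$export") would return a single list regardless of how many URLs the export lists, so nothing needs to be hard-coded per file.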