I have a list of HTTPS URLs that point to gzip-compressed JSON (json.gz) files.
FYI: these are not the real files, due to privacy laws, but they mimic the exact structure.
list_of_files = ['https://premera.saph.com/202011/json.gz', 'https://premera.saph.com/202011/json.gz']
My goal is to combine all these json gz files into one large json gz file.
I’ve tried numerous ways to do this by referencing other Stack Overflow questions; however, I am unable to find exactly what I am looking for.
This comment helped me somewhat, but in my situation, I believe that I need to add requests to get the file since it is an http.
Python 3, read/write compressed json objects from/to gzip file
import requests
import gzip

one_file = list_of_files[0]
with open(one_file, 'rb') as f:
    serial = gzip.decompress(f.read())
Error:
OSError: [Errno 22] Invalid argument: 'https://premera.saph.com/202011/json.gz'
I got this error with the real HTTPS URL; the URL shown here was changed for privacy.
Answer
This comment helped me somewhat, but in my situation, I believe that I need to add requests to get the file since it is an http.
Indeed, the built-in open function does not support HTTP access. In this case I would use urllib.request.urlopen instead. Consider the following example, which uses an example file provided by Mozilla:
import json
import gzip
import urllib.request

url = "https://wiki.mozilla.org/images/f/ff/Example.json.gz"
with urllib.request.urlopen(url) as gzf:
    with gzip.open(gzf) as jsonf:
        data = json.load(jsonf)
print(data)
This gives the output:
{'InstallTime': '1295768962', 'Comments': 'Will test without extension.', 'Theme': 'classic/1.0', 'Version': '4.0b10pre', 'id': 'ec8030f7-c20a-464f-9b0e-13a3a9e97384', 'Vendor': 'Mozilla', 'EMCheckCompatibility': 'false', 'Throttleable': '1', 'Email': 'deinspanjer@mozilla.com', 'URL': 'http://nighthacks.com/roller/jag/entry/the_shit_finally_hits_the', 'version': '4.0b10pre', 'CrashTime': '1295903735', 'ReleaseChannel': 'nightly', 'submitted_timestamp': '2011-01-24T13:15:48.550858', 'buildid': '20110121153230', 'timestamp': 1295903748.551002, 'Notes': 'Renderers: 0x22600,0x22600,0x20400', 'StartupTime': '1295768964', 'FramePoisonSize': '4096', 'FramePoisonBase': '7ffffffff0dea000', 'AdapterRendererIDs': '0x22600,0x22600,0x20400', 'Add-ons': 'compatibility@addons.mozilla.org:0.7,enter.selects@agadak.net:6,{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}:1.3.3,sts-ui@sidstamm.com:0.1,masspasswordreset@johnathan.nightingale:1.04,support@lastpass.com:1.72.0,{972ce4c6-7e08-4474-a285-3208198ce6fd}:4.0b10pre', 'BuildID': '20110121153230', 'SecondsSinceLastCrash': '810473', 'ProductName': 'Firefox', 'legacy_processing': 0}
Explanation: the first with statement opens the file at the specified URL, then gzip.open is used to decompress it, so json.load can parse the JSON and return the data (here data is a dict). Note that all the imports used come from the standard library, so you do not need to install any external package.
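To get from reading one file to the original goal of a single combined json.gz, the same approach can be applied in a loop. The following is only a minimal sketch under some assumptions: each URL holds exactly one JSON document, the combined file should be a JSON list with one entry per source file, and everything fits in memory. The function names load_gzipped_json and combine_gzipped_json are hypothetical helpers, not part of any library.

```python
import gzip
import json
import urllib.request


def load_gzipped_json(url):
    """Fetch one gzip-compressed JSON document from a URL and parse it."""
    with urllib.request.urlopen(url) as response:
        with gzip.open(response) as jsonf:
            return json.load(jsonf)


def combine_gzipped_json(urls, out_path):
    """Parse every URL and write the results as one gzipped JSON list.

    Assumption: the desired combined layout is a list with one element
    per source file; adapt this if the files should be merged differently.
    """
    combined = [load_gzipped_json(url) for url in urls]
    # "wt" opens the gzip file in text mode so json.dump can write str.
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        json.dump(combined, out)
    return combined
```

It could then be called as combine_gzipped_json(list_of_files, "combined.json.gz"). If the files are too large to hold in memory at once, writing one JSON document per line as each file is read may be a better fit than building a single list.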