How can I add data to BigQuery without problems with rate limits? [closed]

I currently have a system from which I want to send data to BigQuery via a Google Cloud Function (using Python). My idea was to do this in two simple steps:

  1. The system calls a Google Cloud Function and sends JSON data to that function
  2. The Google Cloud Function unpacks the data and stores it in a Google BigQuery table using the pandas-gbq package (sketched below).
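
For reference, a minimal sketch of what I have in mind for the Cloud Function, assuming an HTTP trigger; the function name, project, dataset, and table are placeholders:

    import pandas as pd
    import pandas_gbq

    def ingest(request):
        # HTTP-triggered Cloud Function: the system POSTs one JSON record here.
        payload = request.get_json(silent=True)

        # Wrap the record in a one-row DataFrame and append it to BigQuery.
        df = pd.DataFrame([payload])
        pandas_gbq.to_gbq(
            df,
            "my_dataset.my_table",    # placeholder dataset.table
            project_id="my-project",  # placeholder project ID
            if_exists="append",
        )
        return "ok", 200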

This seemed like a good idea until I realized that Google BigQuery has rate limits on adding data (see https://community.microstrategy.com/s/article/Known-Limitations-of-Adding-Data-into-Google-BigQuery?language=en_US for example).

Because this function will be triggered more than a thousand times a day, I think I need a better setup.

What would you recommend I do?

I already thought of two things:

  1. Using Google Datastore for adding data, but I am not sure whether it has similar rate limits or whether I can easily transfer that data to BigQuery.
  2. Instead of sending data on a case-by-case basis, collecting (for example) 500 calls to a Google Cloud Function and, once the data for all 500 cases is received, sending them to BigQuery in one go. I am not sure whether this will work, or whether it is even possible with Google Cloud Functions.

Answer

The limits you mention apply only to load jobs.

You can use streaming inserts instead, which do not have those limits.

I am not sure whether pandas-gbq uses load jobs or streaming inserts under the hood.
Either way, you can use the BigQuery Python client library and its insert_rows_from_dataframe method, which uses the streaming API.
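
For example, a minimal sketch using the google-cloud-bigquery client; the project, dataset, and table names are placeholders, and the DataFrame columns must match the table schema:

    from google.cloud import bigquery
    import pandas as pd

    client = bigquery.Client()

    # Placeholder table ID: replace with your own project, dataset, and table.
    table = client.get_table("my-project.my_dataset.my_table")

    # Example row; column names must match the table schema.
    df = pd.DataFrame([{"name": "alice", "value": 42}])

    # Streams the rows through the insertAll API instead of a load job;
    # the return value is a list of per-chunk error lists (empty means success).
    errors = client.insert_rows_from_dataframe(table, df)
    if any(errors):
        raise RuntimeError(f"Streaming insert failed: {errors}")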
