How to improve the performance of querying data from Salesforce with Python?

Is there a way to retrieve Salesforce data from Python in chunks of more than 2000 records? I have been using the REST API, checking nextRecordsUrl for each next chunk, but with millions of records this approach takes a long time. I tried to find a Salesforce parameter to increase the number of records per chunk (beyond 2000) but haven’t found one. The pattern looks roughly like the sketch below.
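For context, a minimal sketch of that pattern; the instance URL, token, and API version are placeholders for whatever your own auth flow provides:

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
HEADERS = {"Authorization": "Bearer <ACCESS_TOKEN>"}     # placeholder

def query_all(soql):
    """Run a SOQL query and follow nextRecordsUrl until done."""
    records = []
    resp = requests.get(
        f"{INSTANCE_URL}/services/data/v52.0/query",
        headers=HEADERS,
        params={"q": soql},
    ).json()
    records.extend(resp["records"])
    # Each response only reveals the *next* locator, so the loop is
    # inherently sequential.
    while not resp["done"]:
        resp = requests.get(
            INSTANCE_URL + resp["nextRecordsUrl"], headers=HEADERS
        ).json()
        records.extend(resp["records"])
    return records
```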

Another idea was that, if we knew all the nextRecordsUrl values up front, we could use multi-threading in Python to retrieve the chunks in parallel. But each nextRecordsUrl is only returned by the previous response, so the requests have to run one after another.

If you have other ideas, please suggest them. Currently, I can’t use filter conditions in SOQL to limit the data.

Answer

You could look into using a Bulk API query; it lets you return data in 10K-record chunks. But it comes with a bit of a shift in thinking. The normal API is synchronous (give me the next chunk, wait, give me the next chunk, wait). With the Bulk API you submit the job and from time to time you ask “is it done yet?”. A rough sketch of that model follows.
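Here is roughly what that looks like against the Bulk API 2.0 query endpoints with plain requests; the instance URL, token, API version, chunk size, and the process_csv_chunk handler are placeholders, not something from the original answer:

```python
import time
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
JOBS_URL = f"{INSTANCE_URL}/services/data/v52.0/jobs/query"
HEADERS = {
    "Authorization": "Bearer <ACCESS_TOKEN>",  # placeholder
    "Content-Type": "application/json",
}

def process_csv_chunk(csv_text):
    # Hypothetical handler -- replace with your own parsing/loading logic.
    print(f"received {len(csv_text.splitlines()) - 1} records")

# 1. Submit the query job; the call returns immediately with a job id.
job = requests.post(JOBS_URL, headers=HEADERS, json={
    "operation": "query",
    "query": "SELECT Id, Name FROM Account",
}).json()

# 2. Poll until Salesforce has finished preparing the result set.
while True:
    state = requests.get(f"{JOBS_URL}/{job['id']}", headers=HEADERS).json()["state"]
    if state == "JobComplete":
        break
    if state in ("Failed", "Aborted"):
        raise RuntimeError(f"Bulk query ended in state {state}")
    time.sleep(5)  # the "is it done yet?" part

# 3. Download the results page by page as CSV; the Sforce-Locator
#    response header points at the next page until it reads "null".
locator = None
while locator != "null":
    params = {"maxRecords": 50000}
    if locator:
        params["locator"] = locator
    resp = requests.get(f"{JOBS_URL}/{job['id']}/results",
                        headers=HEADERS, params=params)
    locator = resp.headers.get("Sforce-Locator", "null")
    process_csv_chunk(resp.text)
```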

There’s even a feature called “PK chunking”: Salesforce splits the result set into ranges by primary key (record Id), so the batches can be fetched independently, and in parallel. A sketch of enabling it follows.
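With the original (1.0) Bulk API, PK chunking is requested via the Sforce-Enable-PKChunking header when the job is created. A hedged sketch; the session id, instance URL, API version, and chunk size are placeholders, and the JSON job body assumes an API version that accepts JSON:

```python
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder

# Create a Bulk API (1.0) query job with PK chunking enabled.
job = requests.post(
    f"{INSTANCE_URL}/services/async/52.0/job",
    headers={
        "X-SFDC-Session": "<SESSION_ID>",              # placeholder
        "Sforce-Enable-PKChunking": "chunkSize=100000",
        "Content-Type": "application/json",
    },
    json={"operation": "query", "object": "Account", "contentType": "JSON"},
).json()
# Salesforce now splits the query into batches by Id ranges; each batch
# can be polled and downloaded independently, e.g. from a thread pool.
```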

Consider going through the Trailhead module on large data volumes: https://trailhead.salesforce.com/content/learn/modules/large-data-volumes

And maybe play with Salesforce’s Data Loader. Query your data the normal way and measure the time, then again with the Bulk API option selected. That should give you an idea of where the bottleneck is and whether the big rewrite will gain you anything.

https://developer.salesforce.com/docs/atlas.en-us.230.0.api_asynch.meta/api_asynch/asynch_api_bulk_query_intro.htm

User contributions licensed under: CC BY-SA