Is there a way to use tqdm (progress bar) with ElasticSearch bulk upload?

As the heading states, I’m looking for a nice visual way to track the progress of my ES client upload.

I can use a plain loop:

for i in tqdm(<my_docs>):
    es_client.create(...)

but I want to use the recommended (by ES) way:

helpers.bulk(...) <- how to add tqdm here?

Answer

Yes, but instead of using bulk, you need to use streaming_bulk. Unlike bulk, which only returns the final result at the end, streaming_bulk is a generator that yields an (ok, result) tuple for each action. That lets us update tqdm after every action.

The code looks more or less like this:

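from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk
from tqdm import tqdm

# Set up the client (the URL is a placeholder; point it at your cluster)
client = Elasticsearch("http://localhost:9200")

# Total number of documents, so tqdm can show a percentage and an ETA
number_of_docs = 100

progress = tqdm(unit="docs", total=number_of_docs)
successes = 0

# streaming_bulk yields one (ok, result) tuple per action
for ok, action in streaming_bulk(
    client=client, index="my-index", actions=<your_generator_here>
):
    progress.update(1)
    successes += ok

progress.close()
print(f"Indexed {successes}/{number_of_docs} documents")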
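
If you also want to see which documents failed instead of only counting successes, the loop above could be wrapped in a small helper. This is only a rough sketch: track_bulk is a hypothetical name of mine, not part of elasticsearch-py or tqdm, and it assumes you pass raise_on_error=False to streaming_bulk so that failures are yielded as (False, result) rather than raised. The localhost URL and the docs list in the usage comments are placeholders too.

```python
from typing import Any, Iterable, List, Tuple


def track_bulk(results: Iterable[Tuple[bool, Any]], progress) -> Tuple[int, List[Any]]:
    """Consume (ok, result) pairs, advancing `progress` once per pair.

    `results` is anything shaped like the output of streaming_bulk;
    `progress` is any object with an update(n) method, such as a tqdm bar.
    (Hypothetical helper, not a library function.)
    """
    successes = 0
    failures: List[Any] = []
    for ok, result in results:
        progress.update(1)
        if ok:
            successes += 1
        else:
            failures.append(result)
    return successes, failures


# Usage against a real cluster (needs the elasticsearch and tqdm packages):
#
# from elasticsearch import Elasticsearch
# from elasticsearch.helpers import streaming_bulk
# from tqdm import tqdm
#
# client = Elasticsearch("http://localhost:9200")  # placeholder URL
# docs = [{"_index": "my-index", "_source": {"n": i}} for i in range(100)]
# with tqdm(total=len(docs), unit="docs") as progress:
#     successes, failures = track_bulk(
#         streaming_bulk(client, docs, raise_on_error=False), progress
#     )
# print(f"Indexed {successes}/{len(docs)} documents, {len(failures)} failed")
```

Passing the result stream and the bar in as arguments keeps the counting logic independent of the cluster, so it can be tried out with a plain list of tuples before pointing it at real data.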
User contributions licensed under: CC BY-SA