How to update the Elasticsearch document with Python?

I am using the code below to add data to Elasticsearch:

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
records = [
    {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
]
es.indices.create(index='my-index_1', ignore=400)

for record in records:
    # es.indices.update(index="my-index_1", body=record)
    es.index(index="my-index_1", body=record)

# Retrieve the data
es.search(index='my-index_1')['hits']['hits']

But how do I update the document?

records = [
    {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
]

Here, Dr. Messi and Dr. Christiano should be added to the index, while Dr. Bernard M. Aaron should not be indexed again because it is already present in the index.

Answer

In Elasticsearch, when you index a document without providing a custom ID, Elasticsearch generates a new ID for it. Since your code does not pass an ID, every call to es.index creates a new document, even when a record with the same Name has already been indexed.

You also want to check whether a Name already exists before indexing it. There are two approaches:

  1. Index the data without passing an _id. Before indexing each new record, you then have to search on the Name field to see whether the document already exists.
  2. Index the data with your own _id for each document. Then look the document up by its _id.
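
The first approach can be sketched roughly as follows. This is only an illustration and not part of the original answer: the helper names build_name_query and name_exists are hypothetical, and the Name.keyword sub-field assumes the index relies on Elasticsearch's default dynamic mapping for strings (a text field with a keyword sub-field).

```python
def build_name_query(name):
    """Build a search body that matches documents whose Name is exactly `name`.

    Assumption: default dynamic mapping, so an exact-match `Name.keyword`
    sub-field exists alongside the analyzed `Name` text field.
    """
    return {"query": {"term": {"Name.keyword": name}}, "size": 1}


def name_exists(search_response):
    """Return True if a search response contains at least one hit."""
    return len(search_response["hits"]["hits"]) > 0


body = build_name_query("Dr. Bernard M. Aaron")
# With a live client this would be used roughly as:
#   response = es.search(index="my-index_1", body=body)
#   if not name_exists(response):
#       es.index(index="my-index_1", body=record)
```

Keep in mind that search is near real-time: a document indexed a moment ago may not be visible to this query until the index has been refreshed. A lookup by _id does not have this limitation, which is one reason the second approach is simpler.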

I’m going to demonstrate the second approach of creating our own IDs. Since Name is the field that identifies a record, I’ll hash it using MD5 to generate the _id. (Any hash function would work, as long as the same Name always produces the same _id.)
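
To make the ID scheme concrete, here is a small sketch of the hashing step on its own. The function name make_doc_id is hypothetical; it simply wraps the hashlib.md5 call used in the code below.

```python
import hashlib


def make_doc_id(name):
    """Return a deterministic _id: the hex MD5 digest of the UTF-8 encoded name."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


first = make_doc_id("Dr. Bernard M. Aaron")
second = make_doc_id("Dr. Bernard M. Aaron")
other = make_doc_id("Dr. Messi")

print(first == second)  # same Name, same _id
print(first == other)   # different Name, different _id
print(len(first))       # MD5 hex digests are 32 characters long
```

The hash is sensitive to every character, so names that differ only in case or surrounding whitespace get different IDs. If that matters for your data, normalize the Name before hashing.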

First step: indexing the initial data

import hashlib
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
records = [
    {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
]

index_name = "my-index_1"
# ignore=400: do not fail if the index already exists
es.indices.create(index=index_name, ignore=400)

for record in records:
    # Use the MD5 hash of the Name as the document _id
    es.index(index=index_name, body=record, id=hashlib.md5(record['Name'].encode()).hexdigest())

Output of es.search(index=index_name)['hits']['hits']:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}}]

Next step: indexing only the new data

records = [
    {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

from elasticsearch import NotFoundError

for record in records:
    doc_id = hashlib.md5(record['Name'].encode()).hexdigest()
    try:
        # Raises NotFoundError if no document with this _id exists
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        print("Record Not found")
        es.index(index=index_name, body=record, id=doc_id)

Output of es.search(index=index_name)['hits']['hits']:

[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': 'e2e0f463145568471097ff027b18b40d',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '23bb4f1a3a41efe7f4cab8a80d766708',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]

As you can see, the Dr. Bernard M. Aaron record is not indexed again because it is already present.
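
The get-then-index loop above sends two requests for every new record. An alternative, not used in the original answer, is to let Elasticsearch reject duplicates itself by using the create operation, which fails with a version conflict (HTTP 409) when a document with that _id already exists. The sketch below only builds the action dictionaries in the format accepted by elasticsearch.helpers.bulk; the function name build_create_actions is hypothetical.

```python
import hashlib


def build_create_actions(records, index_name):
    """Yield one bulk action per record, using op type `create`.

    `create` only succeeds if no document with the same _id exists,
    so already-indexed names are rejected instead of overwritten.
    """
    for record in records:
        yield {
            "_op_type": "create",
            "_index": index_name,
            "_id": hashlib.md5(record["Name"].encode()).hexdigest(),
            "_source": record,
        }


records = [
    {"Name": "Dr. Messi", "Specialised and Location": "Health"},
    {"Name": "Dr. Bernard M. Aaron", "Specialised and Location": "Health"},
]
actions = list(build_create_actions(records, "my-index_1"))

# With a live client, the actions could be sent roughly like this:
#   from elasticsearch import helpers
#   helpers.bulk(es, actions, raise_on_error=False)
```

With raise_on_error=False, helpers.bulk returns the number of successful actions together with a list of errors, so conflicts for documents that already exist show up in that list rather than as an exception.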
