I am using the code below to add data to Elasticsearch:
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.cluster.health()

    records = [
        {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
    ]

    es.indices.create(index='my-index_1', ignore=400)

    for record in records:
        # es.indices.update(index="my-index_1", body=record)
        es.index(index="my-index_1", body=record)

    # Retrieve the data
    es.search(index='my-index_1')['hits']['hits']
But how do I update the document?
    records = [
        {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
    ]
Here, Dr. Messi and Dr. Christiano should be added to the index, while Dr. Bernard M. Aaron should not be indexed again because it is already present in the index.
Answer
In Elasticsearch, when you index a document without providing a custom ID, Elasticsearch generates a new ID for it.
Since you are not providing an ID, every document you index gets a fresh, automatically generated ID, so indexing the same record twice creates a duplicate.
But you also want to check whether a Name already exists. There are two approaches:
- Index the data without passing an _id for each document. You then have to search on the Name field to see whether the document exists.
- Index the data with your own _id for each document, then look the document up by _id.
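For comparison, the first approach could be sketched roughly as follows. This is not part of the original answer: the helper names build_name_query and name_exists are hypothetical, and the sketch assumes the index uses the default dynamic mapping, which adds a Name.keyword sub-field for exact matching. It expects an Elasticsearch client to be passed in as es.

```python
def build_name_query(name):
    """Build an exact-match query on the Name field.

    Assumes the default dynamic mapping, where text fields get a
    `.keyword` sub-field that stores the unanalyzed value.
    """
    return {"query": {"term": {"Name.keyword": name}}}


def name_exists(es, index_name, name):
    """Return True if at least one document has exactly this Name.

    `es` is expected to be an elasticsearch.Elasticsearch client.
    """
    response = es.count(index=index_name, body=build_name_query(name))
    return response["count"] > 0
```

With this approach you would call name_exists(es, "my-index_1", record["Name"]) before each es.index(...) call. It costs one query per record, and newly indexed documents only become visible to queries after a refresh, which is one reason explicit IDs tend to be the simpler choice.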
I’m going to demonstrate the second approach of creating our own IDs. Since you are searching on the Name field, I’ll hash it using MD5 to generate the _id. (Any hash function would work.)
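The ID scheme can be isolated in a small helper to show why it works: the same Name always hashes to the same _id. This is a minimal sketch, and the function name make_doc_id is hypothetical.

```python
import hashlib


def make_doc_id(name):
    """Derive a deterministic document ID from a Name string."""
    return hashlib.md5(name.encode()).hexdigest()


# The same Name always maps to the same 32-character hex ID ...
assert make_doc_id("Dr. Bernard M. Aaron") == make_doc_id("Dr. Bernard M. Aaron")
# ... while any difference in the string, even letter case, gives a different ID.
assert make_doc_id("Dr. Messi") != make_doc_id("dr. messi")
```

Because the hash is sensitive to case and whitespace, you may want to normalize names before hashing if "Dr. Messi" and "dr. messi" should count as the same person.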
First, index the data:
    import hashlib
    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.cluster.health()

    records = [
        {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
    ]

    index_name = "my-index_1"
    es.indices.create(index=index_name, ignore=400)

    for record in records:
        # es.indices.update(index="my-index_1", body=record)
        es.index(index=index_name, body=record,
                 id=hashlib.md5(record['Name'].encode()).hexdigest())
Output (from es.search(index=index_name)['hits']['hits']):
    [{'_index': 'my-index_1', '_type': '_doc', '_id': '1164c423bc4e2fcb75697c3031af9ef1', '_score': 1.0, '_source': {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'}},
     {'_index': 'my-index_1', '_type': '_doc', '_id': '672ae14197a135c39eab759be8b0597f', '_score': 1.0, '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'}},
     {'_index': 'my-index_1', '_type': '_doc', '_id': '85702447f9e9ea010054eaf0555ce79c', '_score': 1.0, '_source': {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}}]
Next, index the new data:
    from elasticsearch import NotFoundError

    records = [
        {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
        {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
    ]

    for record in records:
        # Same Name -> same MD5 hash -> same _id
        doc_id = hashlib.md5(record['Name'].encode()).hexdigest()
        try:
            # Raises NotFoundError if no document with this _id exists
            es.get(index=index_name, id=doc_id)
        except NotFoundError:
            print("Record Not found")
            es.index(index=index_name, body=record, id=doc_id)
Output (from es.search(index=index_name)['hits']['hits']):
    [{'_index': 'my-index_1', '_type': '_doc', '_id': '1164c423bc4e2fcb75697c3031af9ef1', '_score': 1.0, '_source': {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'}},
     {'_index': 'my-index_1', '_type': '_doc', '_id': '672ae14197a135c39eab759be8b0597f', '_score': 1.0, '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'}},
     {'_index': 'my-index_1', '_type': '_doc', '_id': '85702447f9e9ea010054eaf0555ce79c', '_score': 1.0, '_source': {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}},
     {'_index': 'my-index_1', '_type': '_doc', '_id': 'e2e0f463145568471097ff027b18b40d', '_score': 1.0, '_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
     {'_index': 'my-index_1', '_type': '_doc', '_id': '23bb4f1a3a41efe7f4cab8a80d766708', '_score': 1.0, '_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]
As you can see, the Dr. Bernard M. Aaron record is not indexed again because it is already present.
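The get-then-index loop makes two round trips for every new record. As a possible alternative (not part of the original answer), the same "insert only if absent" behaviour can be expressed with the create operation, which Elasticsearch rejects with a version conflict when the _id already exists. The sketch below only builds the bulk actions. The helper name build_create_actions is hypothetical, and the commented-out lines assume the es client from the code above.

```python
import hashlib


def build_create_actions(records, index_name):
    """Yield bulk actions that create each record only if its _id is new."""
    for record in records:
        yield {
            "_op_type": "create",  # fails for an existing _id instead of overwriting
            "_index": index_name,
            "_id": hashlib.md5(record["Name"].encode()).hexdigest(),
            "_source": record,
        }


records = [
    {"Name": "Dr. Messi", "Specialised and Location": "Health"},
    {"Name": "Dr. Bernard M. Aaron", "Specialised and Location": "Health"},
]
actions = list(build_create_actions(records, "my-index_1"))

# To send them (requires a running cluster):
# from elasticsearch import helpers
# helpers.bulk(es, actions, raise_on_error=False)
```

With raise_on_error=False, helpers.bulk returns the number of successful actions together with a list of errors, so records that already exist would show up as conflicts in that list rather than overwriting the stored documents.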