Friday, March 14, 2025

Elastic Cross-Cluster Ops - Replication + Search

Summary

This is the final post in a three-part series on configuring Elasticsearch (ES) cross-cluster replication and search. The first two posts set up two distinct ES clusters: one running on Kubernetes with Elastic Cloud on Kubernetes (ECK), and the other running in Docker.

Cross-cluster Replication

Architecture


Configuration

Networking

The two clusters (West and East) both run on Docker. Although West is a Kubernetes deployment, Kind (Kubernetes in Docker) runs its nodes as Docker containers. Each cluster sits in its own Docker network, so for cross-cluster operations to work, the two networks need to be linked. The Docker commands below attach East's ES node (east-es01-1) to Kind's kind network, and attach the Kind control-plane node to East's east_net network.

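echo -e "\n*** Connect Networks ***"
# Attach East's ES node to Kind's Docker network
docker network connect kind east-es01-1
# Attach the Kind control-plane node to East's Docker network
docker network connect east_net kind-control-plane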
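The cross-connection can also be kept as data rather than as hard-coded commands. The following is a minimal Python sketch, not part of the original repository: the mapping reuses the container and network names from the commands above, while connect_commands, connect_networks, and the dry_run flag are hypothetical helpers introduced for illustration. Executing for real assumes the docker CLI is on the PATH.

```python
import subprocess

# Container -> the other cluster's network it must join. Names are taken from
# the commands above; this mapping is a hypothetical, adjustable example.
CROSS_LINKS = {
    "east-es01-1": "kind",             # East's ES node joins Kind's network
    "kind-control-plane": "east_net",  # Kind's node joins East's network
}


def connect_commands(links):
    """Build one `docker network connect NETWORK CONTAINER` argv per link."""
    return [["docker", "network", "connect", network, container]
            for container, network in links.items()]


def connect_networks(links, dry_run=True):
    """Print the commands (dry run) or execute them; return the argv lists."""
    cmds = connect_commands(links)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if docker exits non-zero,
            # e.g. when the container is already attached to that network.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    connect_networks(CROSS_LINKS)
```

With the default dry_run=True, the sketch only prints the same two commands shown above; passing dry_run=False runs them.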

Remote Cluster Configuration

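The West cluster needs to be registered on East as a remote cluster. The commands below look up the IP of West's transport service, register it on East in proxy mode under the alias west_remote, and then poll East until the remote connection is established.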

echo -e "\n*** Activate West as a Remote Cluster on East ***"
# Load-balancer IP of West's transport (9300) service
WEST_TRANS_IP=$(kubectl get service westcluster-es-transport -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Register West under the alias west_remote, connecting in proxy mode
json=$(jq -nc \
  --arg proxy_address "$WEST_TRANS_IP":9300 \
  '{
    persistent: {
      cluster: {
        remote: {
          west_remote: {
            mode: "proxy",
            proxy_address: $proxy_address
          }
        }
      }
    }
  }')
curl -s -k -u "elastic:elastic" -X PUT "https://$EAST_ELASTIC_IP:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d "$json" > /dev/null
# Poll East until it reports west_remote as connected
REMOTE_STATUS=$(curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/_resolve/cluster/west_remote:*" | jq '.west_remote.connected')
while [[ $REMOTE_STATUS != "true" ]]
do
  sleep 5
  REMOTE_STATUS=$(curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/_resolve/cluster/west_remote:*" | jq '.west_remote.connected')
done
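The register-then-poll pattern above can also be expressed in Python. This minimal sketch uses only the standard library; remote_cluster_settings, is_connected, wait_until, and get_json are hypothetical helper names, and the bounded timeout is an addition that the original loop (which waits indefinitely) does not have. Like the curl -k calls, get_json skips TLS verification, which is only appropriate for a local demo.

```python
import base64
import json
import ssl
import time
import urllib.request


def remote_cluster_settings(alias, host, port=9300):
    """Body for PUT /_cluster/settings that registers `alias` in proxy mode."""
    remote = {alias: {"mode": "proxy", "proxy_address": f"{host}:{port}"}}
    return {"persistent": {"cluster": {"remote": remote}}}


def is_connected(resolve_response, alias):
    """True if a GET /_resolve/cluster/<alias>:* response marks `alias` connected."""
    return resolve_response.get(alias, {}).get("connected") is True


def wait_until(check, timeout=120.0, interval=5.0):
    """Call check() every `interval` seconds; return False once `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def get_json(url, user="elastic", password="elastic"):
    """GET a JSON document with basic auth, skipping TLS checks like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # demo only: accepts the self-signed cert
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)
```

For example, after sending remote_cluster_settings("west_remote", west_trans_ip) to East, calling wait_until(lambda: is_connected(get_json(f"https://{east_ip}:9200/_resolve/cluster/west_remote:*"), "west_remote")) mirrors the shell loop with a two-minute cap (east_ip and west_trans_ip are placeholders). A failed HTTP request raises rather than being retried.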



East Follower Index

In this scenario, the leader index on the West cluster is west_ccr, and its corresponding follower index on the East cluster is east_ccr. This provides one-way replication from West to East.

echo -e "\n*** Create a Follower Index, east_ccr on East Cluster ***"
# Follow west_ccr through the west_remote alias, with explicit read/write tuning values
json=$(jq -nc \
  '{
    remote_cluster: "west_remote",
    leader_index: "west_ccr",
    max_read_request_operation_count: 5120,
    max_outstanding_read_requests: 12,
    max_read_request_size: "32mb",
    max_write_request_operation_count: 5120,
    max_write_request_size: "9223372036854775807b",
    max_outstanding_write_requests: 9,
    max_write_buffer_count: 2147483647,
    max_write_buffer_size: "512mb",
    max_retry_delay: "500ms",
    read_poll_timeout: "1m"
  }')
curl -s -k -u "elastic:elastic" -X PUT "https://$EAST_ELASTIC_IP:9200/east_ccr/_ccr/follow" \
  -H "Content-Type: application/json" \
  -d "$json" > /dev/null
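Only remote_cluster and leader_index are required in the follow request; the other fields tune how the follower reads from the leader and writes locally. The minimal Python sketch below, with hypothetical helper names, builds that body and reads the follower's state from a follow info response (GET /east_ccr/_ccr/info), whose entries report a status of active or paused. Fetching the response is left out.

```python
import json


def follow_body(remote_cluster, leader_index, **tuning):
    """Body for PUT /<follower>/_ccr/follow; optional tuning keys pass through as-is."""
    body = {"remote_cluster": remote_cluster, "leader_index": leader_index}
    body.update(tuning)
    return body


def follower_status(info_response, follower_index):
    """Status ('active' or 'paused') of `follower_index` in a GET /<index>/_ccr/info
    response, or None if the index is not listed as a follower."""
    for entry in info_response.get("follower_indices", []):
        if entry.get("follower_index") == follower_index:
            return entry.get("status")
    return None


if __name__ == "__main__":
    # Same remote and leader as the script above, plus two of its tuning values.
    print(json.dumps(follow_body("west_remote", "west_ccr",
                                 max_read_request_size="32mb",
                                 read_poll_timeout="1m")))
```

Passing tuning values as keyword arguments keeps the required fields explicit while leaving everything else optional.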




Demo

At this point, the West index (west_ccr), including its mappings (schema), is replicated to the East follower index (east_ccr). This can be verified with a simple Node.js Elasticsearch client application (ccr_test.js) in the src/javascript directory, which prints the documents from both indices.

$ node ccr_test.js
*** West CCR ***
{
name: 'Snow Crash',
author: 'Neal Stephenson',
release_date: '1992-06-01',
page_count: 470
}
{
name: 'Revelation Space',
author: 'Alastair Reynolds',
release_date: '2000-03-15',
page_count: 585
}
*** East CCR ***
{
name: 'Snow Crash',
author: 'Neal Stephenson',
release_date: '1992-06-01',
page_count: 470
}
{
name: 'Revelation Space',
author: 'Alastair Reynolds',
release_date: '2000-03-15',
page_count: 585
}
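Comparing two printouts by eye works for two documents, but not for a larger index. Because CCR replays the leader's operations on the follower, follower documents keep the leader's _id, so a comparison keyed on _id can report exactly what is missing or different. This minimal sketch assumes the hits from west_ccr and east_ccr have already been fetched (for example, with a match_all search on each cluster); diff_replica and in_sync are hypothetical names.

```python
def diff_replica(leader_hits, follower_hits):
    """Compare leader and follower search hits by _id.

    Returns ids missing on the follower, ids present only on the follower,
    and ids whose _source differs between the two sides.
    """
    leader = {h["_id"]: h["_source"] for h in leader_hits}
    follower = {h["_id"]: h["_source"] for h in follower_hits}
    shared = leader.keys() & follower.keys()
    return {
        "missing": sorted(leader.keys() - follower.keys()),
        "extra": sorted(follower.keys() - leader.keys()),
        "changed": sorted(i for i in shared if leader[i] != follower[i]),
    }


def in_sync(diff):
    """True when no ids are missing, extra, or changed."""
    return not any(diff.values())
```

A follower that is still catching up can legitimately report missing ids, so a single mismatch right after indexing is not necessarily an error.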

Cross-cluster Search

Architecture



Configuration

Remote Cluster Configuration

Mirroring the earlier step that registered West as a remote cluster on East, East now needs to be registered as a remote cluster (alias east_remote) on West, because the cross-cluster search is sent to West.

echo -e "\n*** Activate East as a Remote Cluster on West ***"
# Register East under the alias east_remote, connecting in proxy mode to its transport port
json=$(jq -nc \
  --arg proxy_address "$EAST_ELASTIC_IP":9300 \
  '{
    persistent: {
      cluster: {
        remote: {
          east_remote: {
            mode: "proxy",
            proxy_address: $proxy_address
          }
        }
      }
    }
  }')
# Load-balancer IP of West's HTTP service
WEST_ELASTIC_IP=$(kubectl get service westcluster-es-http -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -s -k -u "elastic:elastic" -X PUT "https://$WEST_ELASTIC_IP:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d "$json"
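Unlike the West-on-East step, this script does not wait for the new remote to connect. One way to confirm the connection is GET /_remote/info on West, which returns an object keyed by remote alias whose entries include mode and connected fields. The sketch below, with hypothetical helper names, summarises such a response once it has been fetched (for example, with curl as above).

```python
def remote_summary(remote_info):
    """Map each alias in a GET /_remote/info response to (mode, connected)."""
    return {alias: (details.get("mode"), details.get("connected") is True)
            for alias, details in remote_info.items()}


def disconnected(remote_info):
    """Aliases that are configured but not currently connected, sorted."""
    return sorted(alias for alias, (_, ok) in remote_summary(remote_info).items()
                  if not ok)
```

An empty result from disconnected(...) means every configured remote, including east_remote, is reachable.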






Demo

At this point, everything is in place to run queries that span the two clusters. A Python Elasticsearch client script (ccs_test.py) in the src/python directory sends a single search to the West ES endpoint that covers both the local west_ccs index and the east_ccs index on East, which it addresses as east_remote:east_ccs.

from elasticsearch import Elasticsearch

# IP of West's HTTP endpoint; yours will likely differ
WEST_IP = "172.18.0.4"
client = Elasticsearch(f"https://{WEST_IP}:9200", ca_certs="../../west/west-http-ca.crt", basic_auth=("elastic", "elastic"))
# Local index plus a remote index addressed as <remote alias>:<index>
resp = client.search(index=["west_ccs", "east_remote:east_ccs"], query={"range": {"release_date": {"gte": 1985}}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
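In the index list, east_remote:east_ccs uses the <remote alias>:<index> syntax, while unprefixed names such as west_ccs are local to West. Hits returned by a remote cluster carry the same prefix in their _index field, so results can be attributed to the cluster that produced them. A minimal sketch; split_index and hits_by_cluster are hypothetical helpers, and the "(local)" label is an arbitrary choice.

```python
def split_index(name, local="(local)"):
    """Split 'alias:index' into (alias, index); unprefixed names are local.

    Index names cannot contain ':', so the first colon separates the alias.
    """
    alias, sep, index = name.partition(":")
    return (alias, index) if sep else (local, name)


def hits_by_cluster(resp):
    """Group hit sources from a search response by the cluster that returned them."""
    groups = {}
    for hit in resp["hits"]["hits"]:
        cluster, _ = split_index(hit["_index"])
        groups.setdefault(cluster, []).append(hit["_source"])
    return groups
```

Applied to the resp from the script above, this should place West's hits under "(local)" and East's under "east_remote".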
The test documents for east_ccs and west_ccs, in _bulk (NDJSON) format:
{ "index" : { "_index" : "east_ccs" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "east_ccs" } }
{"name": "The Handmaid's Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
{ "index" : { "_index" : "west_ccs" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "west_ccs" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
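Bulk bodies like these are easy to get wrong by hand, particularly when they are embedded in shell strings that need quote escaping. A minimal sketch that generates them with json.dumps instead; bulk_index_body is a hypothetical helper, and sending the result to POST /_bulk with Content-Type: application/x-ndjson is left to the caller.

```python
import json


def bulk_index_body(index, docs):
    """Serialise docs as a _bulk body: an index action line, then each source line.

    json.dumps handles quoting (apostrophes, embedded double quotes), and the
    body ends with the trailing newline that the _bulk API requires.
    """
    if not docs:
        raise ValueError("a _bulk body needs at least one document")
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    east_docs = [
        {"name": "Brave New World", "author": "Aldous Huxley",
         "release_date": "1932-06-01", "page_count": 268},
        {"name": "The Handmaid's Tale", "author": "Margaret Atwood",
         "release_date": "1985-06-01", "page_count": 311},
    ]
    print(bulk_index_body("east_ccs", east_docs), end="")
```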
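Running the script returns the two books with a release_date in 1985 or later, one from each cluster:

$ python3 ccs_test.py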
{'name': "The Handmaid's Tale", 'author': 'Margaret Atwood', 'release_date': '1985-06-01', 'page_count': 311}
{'name': '1984', 'author': 'George Orwell', 'release_date': '1985-06-01', 'page_count': 328}
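Only two of the four documents come back because the range query keeps documents whose release_date is on or after the bound. The sketch below reproduces that filter locally over the four test documents, assuming the query's bound of 1985 is read as the start of that year (1985-01-01); writing the bound as a full date string such as "1985-01-01" would avoid relying on that interpretation. released_on_or_after is a hypothetical helper.

```python
from datetime import date

# The four test documents from east_ccs and west_ccs.
DOCS = [
    {"name": "Brave New World", "author": "Aldous Huxley",
     "release_date": "1932-06-01", "page_count": 268},
    {"name": "The Handmaid's Tale", "author": "Margaret Atwood",
     "release_date": "1985-06-01", "page_count": 311},
    {"name": "1984", "author": "George Orwell",
     "release_date": "1985-06-01", "page_count": 328},
    {"name": "Fahrenheit 451", "author": "Ray Bradbury",
     "release_date": "1953-10-15", "page_count": 227},
]


def released_on_or_after(docs, cutoff):
    """Client-side equivalent of {"range": {"release_date": {"gte": cutoff}}}."""
    return [d for d in docs if date.fromisoformat(d["release_date"]) >= cutoff]


if __name__ == "__main__":
    # Assumption: the query's bound of 1985 means 1985-01-01.
    for doc in released_on_or_after(DOCS, date(1985, 1, 1)):
        print(doc["name"])
```

Run as a script, it prints The Handmaid's Tale and 1984, the same two titles as the cross-cluster search.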

Source