Friday, March 14, 2025

Elastic Cross-Cluster Ops - Replication + Search

Summary

This is the final post in a three-part series on configuring Elasticsearch (ES) cross-cluster replication and search.  The first two posts set up two distinct ES clusters: one implemented in Kubernetes (ECK), and the other in Docker.

Cross-cluster Replication

Architecture


Configuration

Networking

The two clusters (West and East) are both implemented in Docker.  Although West is a K8s implementation, the underlying architecture of Kind is, in fact, Docker.  Each cluster is in its own Docker network, so for cross-cluster operations to function, we need to link those two networks.  The Docker commands below do just that.

echo -e "\n*** Connect Networks ***"
docker network connect kind east-es01-1
docker network connect east_net kind-control-plane
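After connecting, a quick inspection confirms that each network now carries the peer's node (the container names match those used in the connect commands above):

# Each network's container list should now include the peer's node
docker network inspect kind --format '{{range .Containers}}{{.Name}} {{end}}'
docker network inspect east_net --format '{{range .Containers}}{{.Name}} {{end}}'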

Remote Cluster Configuration

The West Cluster needs to be added to the East as a remote cluster.  The commands below do that and then wait for the remote configuration to complete.

echo -e "\n*** Activate West as a Remote Cluster on East ***"
WEST_TRANS_IP=$(kubectl get service westcluster-es-transport -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
json=$(jq -nc \
--arg proxy_address "$WEST_TRANS_IP":9300 \
'{
persistent: {
cluster: {
remote: {
west_remote: {
mode: "proxy",
proxy_address: $proxy_address
}
}
}
}
}')
curl -s -k -u "elastic:elastic" -X PUT "https://$EAST_ELASTIC_IP:9200/_cluster/settings" \
-H "Content-Type: application/json" \
-d "$json" > /dev/null
REMOTE_STATUS=$(curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/_resolve/cluster/west_remote:*" | jq '.west_remote.connected')
while [[ $REMOTE_STATUS != "true" ]]
do
sleep 5
REMOTE_STATUS=$(curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/_resolve/cluster/west_remote:*" | jq '.west_remote.connected')
done
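Once the loop exits, the remote's connection details can also be inspected directly.  A sanity check, assuming the same endpoint and credentials as above:

# _remote/info reports the connection mode, proxy address, and connected state
curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/_remote/info" | jq '.west_remote'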

East Follower Index

In this scenario, we are setting up a leader index on the West cluster (named west_ccr) and its corresponding follower index on the East (named east_ccr).  This will allow one-way replication from West to East.

echo -e "\n*** Create a Follower Index, east_ccr on East Cluster ***"
json=$(jq -nc \
'{
remote_cluster: "west_remote",
leader_index: "west_ccr",
max_read_request_operation_count: 5120,
max_outstanding_read_requests: 12,
max_read_request_size: "32mb",
max_write_request_operation_count: 5120,
max_write_request_size: "9223372036854775807b",
max_outstanding_write_requests: 9,
max_write_buffer_count: 2147483647,
max_write_buffer_size: "512mb",
max_retry_delay: "500ms",
read_poll_timeout: "1m"
}')
curl -s -k -u "elastic:elastic" -X PUT "https://$EAST_ELASTIC_IP:9200/east_ccr/_ccr/follow" \
-H "Content-Type: application/json" \
-d "$json" > /dev/null

Demo

At this point, true replication of the West index (west_ccr), including mappings (schema), has been accomplished.  This can be verified with a simple Node.js Elasticsearch client application located in the src/javascript directory.

$ node ccr_test.js
*** West CCR ***
{
  name: 'Snow Crash',
  author: 'Neal Stephenson',
  release_date: '1992-06-01',
  page_count: 470
}
{
  name: 'Revelation Space',
  author: 'Alastair Reynolds',
  release_date: '2000-03-15',
  page_count: 585
}
*** East CCR ***
{
  name: 'Snow Crash',
  author: 'Neal Stephenson',
  release_date: '1992-06-01',
  page_count: 470
}
{
  name: 'Revelation Space',
  author: 'Alastair Reynolds',
  release_date: '2000-03-15',
  page_count: 585
}
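If you'd rather not run the Node.js app, a rough curl equivalent produces the same comparison.  This sketch assumes the WEST_ELASTIC_IP and EAST_ELASTIC_IP variables from the cluster build scripts are still set:

# Pull the documents from the leader and the follower and compare
curl -s -k -u "elastic:elastic" "https://$WEST_ELASTIC_IP:9200/west_ccr/_search" | jq '.hits.hits[]._source'
curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/east_ccr/_search" | jq '.hits.hits[]._source'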

Cross-cluster Search

Architecture

Configuration

Remote Cluster Configuration

Similar to the prior exercise of establishing the West cluster as a remote cluster on the East, the East cluster now needs to be configured as a remote cluster on the West.

echo -e "\n*** Activate East as a Remote Cluster on West ***"
json=$(jq -nc \
  --arg proxy_address "$EAST_ELASTIC_IP":9300 \
  '{
    persistent: {
      cluster: {
        remote: {
          east_remote: {
            mode: "proxy",
            proxy_address: $proxy_address
          }
        }
      }
    }
  }')
WEST_ELASTIC_IP=$(kubectl get service westcluster-es-http -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -s -k -u "elastic:elastic" -X PUT "https://$WEST_ELASTIC_IP:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d "$json"

Demo

At this point, everything is in place to execute queries that span the two clusters.  A Python Elasticsearch client script is included in the src/python directory.  That script executes a search against the West ES endpoint that spans both the west_ccs index on West and the east_ccs index on East.

from elasticsearch import Elasticsearch

WEST_IP = "172.18.0.4"
client = Elasticsearch(f"https://{WEST_IP}:9200", ca_certs="../../west/west-http-ca.crt", basic_auth=("elastic", "elastic"))
resp = client.search(index=["west_ccs", "east_remote:east_ccs"], query={"range": {"release_date": {"gte": 1985}}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"])
For reference, the two indices contain the following documents:
{ "index" : { "_index" : "east_ccs" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "east_ccs" } }
{"name": "The Handmaid'"'"'s Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
{ "index" : { "_index" : "west_ccs" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "west_ccs" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
Running the script returns only the documents with a release_date in or after 1985, drawn from both clusters:
$ python3 ccs_test.py
{'name': "The Handmaid's Tale", 'author': 'Margaret Atwood', 'release_date': '1985-06-01', 'page_count': 311}
{'name': '1984', 'author': 'George Orwell', 'release_date': '1985-06-01', 'page_count': 328}
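The same cross-cluster search can be issued with curl by listing the local and remote indices together.  A sketch, assuming the West HTTP endpoint and credentials from the build scripts:

# A comma-separated index list mixes the local index and the remote-prefixed one
curl -s -k -u "elastic:elastic" "https://$WEST_ELASTIC_IP:9200/west_ccs,east_remote:east_ccs/_search" \
  -H "Content-Type: application/json" \
  -d '{"query": {"range": {"release_date": {"gte": 1985}}}}' | jq '.hits.hits[]._source'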

Source

Elastic Cross-Cluster Ops - East Cluster Build

Summary

This is Part 2 in the three-part series on Elasticsearch (ES) cross-cluster operations.  This post covers the East cluster, which is implemented in Docker.  Like the West cluster, the configuration yields a single ES node and one Kibana node.

Architecture



Configuration

Docker-compose

For the most part, I used the reference docker-compose file.  I added a bind mount so that I could add the West cluster's transport CA to the trusted CAs for the East cluster.


volumes:
  - certs:/usr/share/elasticsearch/config/certs
  - ${PWD}/../west/west-ca.crt:/usr/share/elasticsearch/config/remote/west-ca.crt
environment:
  - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt,remote/west-ca.crt
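A quick check that the mount worked, assuming the compose project yields a container named east-es01-1 (the name used in the cross-cluster networking step):

# The West CA should be visible inside the East node's config directory
docker exec east-es01-1 ls -l /usr/share/elasticsearch/config/remote/west-ca.crt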

Index

Below, a minimal index is built via the REST API.  This index is used to demonstrate cross-cluster search in the next post.

echo -e "\n*** Create east_ccs index ***"
curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/_bulk?pretty" \
-H "Content-Type: application/json" \
-d'
{ "index" : { "_index" : "east_ccs" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "east_ccs" } }
{"name": "The Handmaid'"'"'s Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
' > /dev/null
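A count query verifies that both documents landed:

# Expect a count of 2
curl -s -k -u "elastic:elastic" "https://$EAST_ELASTIC_IP:9200/east_ccs/_count" | jq '.count'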

Source

Elastic Cross-Cluster Ops - West Cluster Build

Summary

This is Part 1 in a three-part series on a multi-cluster build of Elasticsearch (ES) with cross-cluster replication and search enablement.  This post covers the build of the West cluster which is implemented in Kubernetes.

Architecture

The West cluster is implemented with Elastic Cloud on Kubernetes (ECK).  I'm using Kind for the Kubernetes environment, which allows for a self-contained setup that runs on a capable laptop.  Additionally, I use cloud-provider-kind to provide native load-balancer functionality.

Configuration

Kind/Cloud-Provider-Kind


kind create cluster
docker run -d --rm --name cloud-provider-kind --network kind \
  -v /var/run/docker.sock:/var/run/docker.sock registry.k8s.io/cloud-provider-kind/cloud-controller-manager:v0.6.0

ECK Operator


helm repo add elastic https://helm.elastic.co
helm repo update elastic
helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
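Before applying any Elastic manifests, it's worth confirming the operator pod is up:

# The operator pod in elastic-system should reach Running status
kubectl get pods -n elastic-system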

Elasticsearch + Kibana


apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: westcluster-es-elastic-user
data:
  elastic: ZWxhc3RpYw==
---
apiVersion: v1
kind: Secret
metadata:
  name: eck-trial-license
  namespace: elastic-system
  labels:
    license.k8s.elastic.co/type: enterprise_trial
  annotations:
    elastic.co/eula: accepted
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: westcluster
spec:
  version: 8.17.2
  nodeSets:
    - name: default
      count: 1
      config:
        node.store.allow_mmap: false
  http:
    service:
      spec:
        type: LoadBalancer
  transport:
    service:
      spec:
        type: LoadBalancer
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 8.17.2
  count: 1
  elasticsearchRef:
    name: westcluster
  http:
    service:
      spec:
        type: LoadBalancer
With both resources applied, the transport and HTTP CAs are exported and the load-balancer IPs captured for use in later steps:
kubectl get secret westcluster-es-transport-certs-public -o jsonpath='{.data.ca\.crt}' | base64 --decode > west-ca.crt
kubectl get secret westcluster-es-http-certs-public -o jsonpath='{.data.ca\.crt}' | base64 --decode > west-http-ca.crt
WEST_ELASTIC_IP=$(kubectl get service westcluster-es-http -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
WEST_KIBANA_IP=$(kubectl get service kibana-kb-http -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
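With the load-balancer IPs in hand, a quick smoke test against the HTTP endpoint confirms the cluster is reachable (self-signed cert, hence -k):

# A "green" or "yellow" status means the single-node cluster is up
curl -s -k -u "elastic:elastic" "https://$WEST_ELASTIC_IP:9200/_cluster/health" | jq '.status'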

Indices

Two minimal indices are created with the REST API.  These indices will be used in a later post on cross-cluster replication and search.

echo -e "\n*** Create west_ccr index ***"
curl -s -k -u "elastic:elastic" "https://$WEST_ELASTIC_IP:9200/_bulk?pretty" \
-H "Content-Type: application/json" \
-d'
{ "index" : { "_index" : "west_ccr" } }
{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470}
{ "index" : { "_index" : "west_ccr" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
' > /dev/null
echo -e "\n*** Create west_ccs index ***"
curl -s -k -u "elastic:elastic" "https://$WEST_ELASTIC_IP:9200/_bulk?pretty" \
-H "Content-Type: application/json" \
-d'
{ "index" : { "_index" : "west_ccs" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "west_ccs" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
' > /dev/null
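Since the bulk requests rely on dynamic mapping, the generated schema can be checked afterward.  This is the mapping that cross-cluster replication will later copy to the follower:

# Expect author, name, page_count, and release_date fields
curl -s -k -u "elastic:elastic" "https://$WEST_ELASTIC_IP:9200/west_ccr/_mapping" | jq '.west_ccr.mappings.properties | keys'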

Source

Sunday, March 2, 2025

Geospatial Search with Redis and Apache Pinot

Summary

In this post, I'll walk through a Redis Enterprise and Apache Pinot setup in Docker:
  • 3-node Redis environment
  • 4-node Pinot environment
  • 1M synthetic JSON records representing a user's geographic location
  • Equivalent geospatial search queries for both environments

Architecture


Data Generation

Synthetic records are created with a Python function that utilizes the Faker library. They are saved to a file called locations.json.  A snippet of the data generator is below.

import random
from uuid import uuid4

from faker import Faker
from shapely.geometry import Point

fake = Faker()

def worker(ndocs):
    global fake
    result = []
    for _ in range(ndocs):
        uuid = str(uuid4())
        coords = fake.local_latlng(country_code="US", coords_only=True)
        # faker's local_latlng draws from a fixed list and generates duplicates;
        # jittering the coordinates is a workaround
        lat = round(float(coords[0]) + random.uniform(-1, 1), 5)
        lng = round(float(coords[1]) + random.uniform(-1, 1), 5)
        point_lnglat = f'{lng} {lat}'
        point_wkt = Point(lng, lat).wkt
        dob = fake.date_of_birth(minimum_age=10, maximum_age=90).year
        result.append({"uuid": uuid, "point_lnglat": point_lnglat, "point_wkt": point_wkt, "dob": dob})
    return result
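A quick spot-check of the output, assuming the generator writes one JSON document per line to locations.json (the exact file layout isn't shown here):

# The first record should contain uuid, point_lnglat, point_wkt, and dob
head -n 1 locations.json | jq
wc -l < locations.json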

Redis Deployment

Docker

Three Redis nodes are created.

re1:
  image: redislabs/redis:latest
  container_name: re1
  restart: unless-stopped
  tty: true
  cap_add:
    - sys_resource
  ports:
    - 12000:12000
    - 8443:8443
    - 9443:9443
  networks:
    redis-net:
      ipv4_address: 192.168.20.2
re2:
  image: redislabs/redis:latest
  container_name: re2
  restart: unless-stopped
  tty: true
  cap_add:
    - sys_resource
  networks:
    redis-net:
      ipv4_address: 192.168.20.3
re3:
  image: redislabs/redis:latest
  container_name: re3
  restart: unless-stopped
  tty: true
  cap_add:
    - sys_resource
  networks:
    redis-net:
      ipv4_address: 192.168.20.4

Cluster Build, DB creation, Index Build, Data Load

Nodes are joined into a Redis cluster. A single-shard database is then created, along with an index. Finally, the database is populated from a JSON file with Riot.

echo -e "\n*** Build Redis Cluster ***"
docker exec -it re1 /opt/redislabs/bin/rladmin cluster create name cluster.local username redis@redis.com password redis
docker exec -it re2 /opt/redislabs/bin/rladmin cluster join nodes 192.168.20.2 username redis@redis.com password redis
docker exec -it re3 /opt/redislabs/bin/rladmin cluster join nodes 192.168.20.2 username redis@redis.com password redis
sleep 1
echo -e "\n*** Build Target Redis DB ***"
curl -s -o /dev/null -k -u "redis@redis.com:redis" https://localhost:9443/v1/bdbs -H "Content-Type:application/json" -d @$PWD/redis/redb.json
echo -e "\n*** Create Redis Index ***"
redis-cli -h localhost -p 12000 FT.CREATE locationidx ON JSON PREFIX 1 location: SCHEMA $.point_lnglat AS point_lnglat GEO SORTABLE $.point_wkt AS point_wkt GEOSHAPE SPHERICAL SORTABLE $.dob AS dob NUMERIC SORTABLE $.uuid AS uuid TAG SORTABLE
echo -e "\n*** Ingest Data to Redis ***"
riot file-import -h localhost -p 12000 --threads 20 $DATA json.set --keyspace location --key uuid
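Two quick checks confirm the load, assuming the defaults above:

# num_docs in FT.INFO should approach 1M as indexing completes
redis-cli -h localhost -p 12000 FT.INFO locationidx | grep -A1 num_docs
redis-cli -h localhost -p 12000 DBSIZE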

Redis Insight

Pinot Deployment

Docker

A 4-node Pinot cluster is created with docker-compose.  The Pinot Controller definition is below.

pinot-controller:
  image: apachepinot/pinot:latest
  command: "StartController -zkAddress pinot-zookeeper:2181"
  container_name: "pinot-controller"
  restart: unless-stopped
  ports:
    - "9000:9000"
  environment:
    JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
  depends_on:
    pinot-zookeeper:
      condition: service_healthy
  networks:
    - pinot-net
  volumes:
    - $PWD/data:/tmp/pinot/data
    - $PWD/pinot:/tmp/pinot/config
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:9000/health || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 10s

Schema/Table Build, Data Load


echo -e "\n*** Build Pinot Schema and Table ***"
docker exec -it pinot-controller /opt/pinot/bin/pinot-admin.sh AddTable -tableConfigFile /tmp/pinot/config/table.json -schemaFile /tmp/pinot/config/schema.json -controllerHost localhost -controllerPort 9000 -exec > /dev/null 2>&1
sleep 1
echo -e "\n*** Ingest Data to Pinot ***"
docker exec -it pinot-controller /opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /tmp/pinot/config/job-spec.yaml > /dev/null 2>&1
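Once the ingestion job finishes, a count query against the broker confirms all 1M records arrived.  The table name "locations" is an assumption here; substitute whatever table.json defines:

# Broker SQL endpoint; expect 1000000 rows (table name is an assumption)
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"sql": "SELECT COUNT(*) FROM locations"}' http://localhost:8099/query/sql | jq '.resultTable.rows'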

Pinot Console

Queries

The query below finds the count of users within a polygon defined as the boundaries of the State of Colorado.

Redis


$ redis-cli -p 12000 -3 FT.SEARCH locationidx '@point_wkt:[WITHIN $Colorado]' PARAMS 2 Colorado 'POLYGON((-109.0448 37.0004,-102.0424 36.9949,-102.0534 41.0006,-109.0489 40.9996,-109.0448 37.0004,-109.0448 37.0004))' LIMIT 0 0 DIALECT 2
1# attributes => (empty array)
2# warning => (empty array)
3# total_results => (integer) 12500
4# format => STRING
5# results => (empty array)

Pinot


$ curl -s -X POST -H "Content-Type: application/json" -d @pinot/s4.json http://localhost:8099/query/sql \
  | jq '.resultTable'
{
  "dataSchema": {
    "columnNames": [
      "CO_Total"
    ],
    "columnDataTypes": [
      "LONG"
    ]
  },
  "rows": [
    [
      12500
    ]
  ]
}
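The payload in pinot/s4.json isn't reproduced above.  A plausible reconstruction, assuming a table named locations and the WKT point stored in a point_wkt string column (both assumptions), pairs Pinot's ST_Contains and ST_GeomFromText functions with the same Colorado polygon used in the Redis query:

# Hedged sketch of pinot/s4.json; table and column names are assumptions
cat <<'EOF' > pinot/s4.json
{
  "sql": "SELECT COUNT(*) AS CO_Total FROM locations WHERE ST_Contains(ST_GeomFromText('POLYGON((-109.0448 37.0004,-102.0424 36.9949,-102.0534 41.0006,-109.0489 40.9996,-109.0448 37.0004))'), ST_GeomFromText(point_wkt)) = 1"
}
EOF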

Source