Sunday, March 2, 2025

Geospatial Search with Redis and Apache Pinot

Summary

In this post, I'll walk through a Docker setup of Redis Enterprise and Apache Pinot and run the same geospatial search against both:
  • 3-node Redis environment
  • 4-node Pinot environment
  • 1M synthetic JSON records representing a user's geographic location
  • Equivalent geospatial search queries for both environments

Architecture


Data Generation

Synthetic records are created with a Python function that uses the Faker library and are saved to a file called locations.json. A snippet of the data generator is below.

def worker(ndocs):
    global fake
    result = []
    for _ in range(ndocs):
        uuid = str(uuid4())
        coords = fake.local_latlng(country_code="US", coords_only=True)
        # Faker's local_latlng generates duplicates; the random offset is a workaround
        lat = round(float(coords[0]) + random.uniform(-1, 1), 5)
        lng = round(float(coords[1]) + random.uniform(-1, 1), 5)
        point_lnglat = f'{lng} {lat}'
        point_wkt = Point(lng, lat).wkt
        dob = fake.date_of_birth(minimum_age=10, maximum_age=90).year
        result.append({"uuid": uuid, "point_lnglat": point_lnglat, "point_wkt": point_wkt, "dob": dob})
    return result
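
To make the shape of a single record concrete without needing Faker or a geometry library, here is a minimal stdlib-only sketch. The helper name make_record and the sample coordinates are illustrative assumptions, not part of the original generator, and the WKT string is formatted by hand to mimic what the Point(lng, lat).wkt call is expected to return.

```python
import json
from uuid import uuid4


def make_record(lat, lng, dob):
    """Build one location record with the same four fields as the generator.

    Hypothetical helper for illustration only: the original generator takes
    coordinates and birth year from Faker and the WKT string from a Point object.
    """
    lat, lng = round(lat, 5), round(lng, 5)
    return {
        "uuid": str(uuid4()),
        "point_lnglat": f"{lng} {lat}",       # longitude first, as the generator writes it
        "point_wkt": f"POINT ({lng} {lat})",  # WKT also puts x (longitude) first
        "dob": dob,
    }


# Sample values (made up): a point in Denver and a birth year.
record = make_record(39.7392, -104.9903, 1985)
print(json.dumps(record, indent=2))
```

Both location fields carry the same point: one as a plain "lng lat" string for the GEO field and one as WKT for the GEOSHAPE field.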

Redis Deployment

Docker

Three Redis nodes are created.

re1:
  image: redislabs/redis:latest
  container_name: re1
  restart: unless-stopped
  tty: true
  cap_add:
    - sys_resource
  ports:
    - 12000:12000
    - 8443:8443
    - 9443:9443
  networks:
    redis-net:
      ipv4_address: 192.168.20.2
re2:
  image: redislabs/redis:latest
  container_name: re2
  restart: unless-stopped
  tty: true
  cap_add:
    - sys_resource
  networks:
    redis-net:
      ipv4_address: 192.168.20.3
re3:
  image: redislabs/redis:latest
  container_name: re3
  restart: unless-stopped
  tty: true
  cap_add:
    - sys_resource
  networks:
    redis-net:
      ipv4_address: 192.168.20.4

Cluster Build, DB creation, Index Build, Data Load

The nodes are joined into a Redis cluster. A single-shard database is then created, along with an index. Finally, the database is populated from the JSON file with RIOT.

echo -e "\n*** Build Redis Cluster ***"
docker exec -it re1 /opt/redislabs/bin/rladmin cluster create name cluster.local username redis@redis.com password redis
docker exec -it re2 /opt/redislabs/bin/rladmin cluster join nodes 192.168.20.2 username redis@redis.com password redis
docker exec -it re3 /opt/redislabs/bin/rladmin cluster join nodes 192.168.20.2 username redis@redis.com password redis
sleep 1
echo -e "\n*** Build Target Redis DB ***"
curl -s -o /dev/null -k -u "redis@redis.com:redis" https://localhost:9443/v1/bdbs -H "Content-Type:application/json" -d @$PWD/redis/redb.json
echo -e "\n*** Create Redis Index ***"
redis-cli -h localhost -p 12000 FT.CREATE locationidx ON JSON PREFIX 1 location: SCHEMA $.point_lnglat AS point_lnglat GEO SORTABLE $.point_wkt AS point_wkt GEOSHAPE SPHERICAL SORTABLE $.dob AS dob NUMERIC SORTABLE $.uuid AS uuid TAG SORTABLE
echo -e "\n*** Ingest Data to Redis ***"
riot file-import -h localhost -p 12000 --threads 20 $DATA json.set --keyspace location --key uuid
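
The index only covers keys that start with location:, so the keys written during ingestion have to carry that prefix. The sketch below shows how the --keyspace location --key uuid options are expected to map a record to a key, assuming RIOT joins the keyspace and the key field with a ":" separator. The helper redis_key and the sample record are hypothetical and only illustrate the naming.

```python
import json

INDEX_PREFIX = "location:"  # the PREFIX passed to FT.CREATE in the script


def redis_key(record, keyspace="location", key_field="uuid", separator=":"):
    """Return the key a record would be stored under (assumed RIOT behaviour)."""
    return f"{keyspace}{separator}{record[key_field]}"


# Sample record with made-up values, shaped like the generator's output.
record = {
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
    "point_lnglat": "-104.9903 39.7392",
    "point_wkt": "POINT (-104.9903 39.7392)",
    "dob": 1985,
}

key = redis_key(record)
# Arguments of a JSON.SET command that stores the whole document at the root path.
json_set_args = ["JSON.SET", key, "$", json.dumps(record)]
print(key, key.startswith(INDEX_PREFIX))
```

If the keyspace and the index prefix drift apart, the documents still load but the index stays empty, so this is worth checking first when a search returns no results.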

Redis Insight




Pinot Deployment

Docker

A 4-node Pinot cluster is created with docker-compose. The Pinot Controller definition is below.

pinot-controller:
  image: apachepinot/pinot:latest
  command: "StartController -zkAddress pinot-zookeeper:2181"
  container_name: "pinot-controller"
  restart: unless-stopped
  ports:
    - "9000:9000"
  environment:
    JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
  depends_on:
    pinot-zookeeper:
      condition: service_healthy
  networks:
    - pinot-net
  volumes:
    - $PWD/data:/tmp/pinot/data
    - $PWD/pinot:/tmp/pinot/config
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:9000/health || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 10s

Schema/Table Build, Data Load


echo -e "\n*** Build Pinot Schema and Table ***"
docker exec -it pinot-controller /opt/pinot/bin/pinot-admin.sh AddTable -tableConfigFile /tmp/pinot/config/table.json -schemaFile /tmp/pinot/config/schema.json -controllerHost localhost -controllerPort 9000 -exec > /dev/null 2>&1
sleep 1
echo -e "\n*** Ingest Data to Pinot ***"
docker exec -it pinot-controller /opt/pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /tmp/pinot/config/job-spec.yaml > /dev/null 2>&1

Pinot Console







Queries

The queries below count the users located within a polygon defining the boundaries of the State of Colorado.

Redis


$ redis-cli -p 12000 -3 FT.SEARCH locationidx '@point_wkt:[WITHIN $Colorado]' PARAMS 2 Colorado 'POLYGON((-109.0448 37.0004,-102.0424 36.9949,-102.0534 41.0006,-109.0489 40.9996,-109.0448 37.0004,-109.0448 37.0004))' LIMIT 0 0 DIALECT 2
1# attributes => (empty array)
2# warning => (empty array)
3# total_results => (integer) 12500
4# format => STRING
5# results => (empty array)
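
For intuition about what a WITHIN polygon filter decides for each record, here is a minimal planar ray-casting sketch in Python. It is not how Redis evaluates the query: the index is declared SPHERICAL, so Redis works on the sphere, while this treats longitude and latitude as flat x and y. The function name and the two test cities are assumptions for illustration, and the duplicated closing vertex from the query is listed only once.

```python
# Colorado boundary from the query, as (longitude, latitude) pairs.
COLORADO = [
    (-109.0448, 37.0004),
    (-102.0424, 36.9949),
    (-102.0534, 41.0006),
    (-109.0489, 40.9996),
    (-109.0448, 37.0004),  # ring closes on its first vertex
]


def point_in_polygon(lng, lat, ring):
    """Planar ray-casting test: count edge crossings of a ray cast to the east."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


denver = point_in_polygon(-104.9903, 39.7392, COLORADO)
salt_lake_city = point_in_polygon(-111.8910, 40.7608, COLORADO)
print(denver, salt_lake_city)
```

An odd number of crossings means the point is inside the ring, which is why Denver passes and Salt Lake City does not.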

Pinot


$ curl -s -X POST -H "Content-Type: application/json" -d @pinot/s4.json http://localhost:8099/query/sql | jq '.resultTable'
{
  "dataSchema": {
    "columnNames": [
      "CO_Total"
    ],
    "columnDataTypes": [
      "LONG"
    ]
  },
  "rows": [
    [
      12500
    ]
  ]
}
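
The contents of pinot/s4.json are not shown here. As a rough sketch of what a request body for the broker's /query/sql endpoint could look like, the Python below builds a {"sql": ...} payload for a count inside the same polygon. Everything specific is an assumption: the table name locations, the use of the point_wkt column, and the choice of ST_Contains with ST_GeomFromText are guesses at one plausible formulation, not the actual query file. Only the CO_Total alias comes from the output above.

```python
import json

# Polygon text copied from the Redis query.
COLORADO_WKT = (
    "POLYGON((-109.0448 37.0004,-102.0424 36.9949,-102.0534 41.0006,"
    "-109.0489 40.9996,-109.0448 37.0004,-109.0448 37.0004))"
)


def build_count_query(table, wkt_column, polygon_wkt, alias="CO_Total"):
    """Build a broker request body that counts rows whose point lies in the polygon.

    Hypothetical helper: the table and column names are placeholders, and the
    SQL is one possible formulation rather than the query used in this post.
    """
    sql = (
        f"SELECT COUNT(*) AS {alias} FROM {table} "
        f"WHERE ST_Contains(ST_GeomFromText('{polygon_wkt}'), "
        f"ST_GeomFromText({wkt_column})) = 1"
    )
    return {"sql": sql}


payload = build_count_query("locations", "point_wkt", COLORADO_WKT)
print(json.dumps(payload, indent=2))
```

Saved to a file, a payload of this shape is what the -d @file argument of the curl command would send to the broker.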

Source