Sunday, November 5, 2023

Redis Vector Database Sizing Tool

Summary

In this post, I cover a utility I wrote for observing how Redis vector data and index sizes change with different data types and index parameters. The tool creates a single-node, single-shard Redis Enterprise database with the Search and JSON modules enabled.

Code Snippets

Constants and Enums



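from enum import Enum
from redis import Connection  # import assumed from the type hint below

REDIS_URL: str = 'redis://default:redis@localhost:12000'
NUM_KEYS: int = 100000  # default number of keys to load
connection: Connection = None  # module-level handle, assigned before the test runs

class OBJECT_TYPE(Enum):
    HASH = 'hash'
    JSON = 'json'

class INDEX_TYPE(Enum):
    FLAT = 'flat'
    HNSW = 'hnsw'

class METRIC_TYPE(Enum):
    L2 = 'l2'
    IP = 'ip'
    COSINE = 'cosine'

class FLOAT_TYPE(Enum):
    F32 = 'float32'
    F64 = 'float64'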
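The sample run later in this post passes short names such as f32 and flat on the command line and reports values such as float32. A minimal sketch of how flags like those could be mapped onto the enums with argparse is below. The helper names enum_by_name and build_parser are hypothetical and only two of the flags are shown; the real tool's argument handling may differ.

```python
import argparse
from enum import Enum


class INDEX_TYPE(Enum):
    FLAT = 'flat'
    HNSW = 'hnsw'


class FLOAT_TYPE(Enum):
    F32 = 'float32'
    F64 = 'float64'


def enum_by_name(enum_cls):
    """Hypothetical helper: build an argparse `type` callable that
    looks up an enum member by name, ignoring case."""
    def parse(text: str):
        try:
            return enum_cls[text.upper()]
        except KeyError:
            choices = ', '.join(member.name.lower() for member in enum_cls)
            raise argparse.ArgumentTypeError(f'expected one of: {choices}') from None
    return parse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering two of the flags from the sample run."""
    parser = argparse.ArgumentParser(description='Vector Index Test')
    parser.add_argument('--indextype', type=enum_by_name(INDEX_TYPE), default=INDEX_TYPE.FLAT)
    parser.add_argument('--floattype', type=enum_by_name(FLOAT_TYPE), default=FLOAT_TYPE.F32)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(['--indextype', 'hnsw', '--floattype', 'f32'])
print(args.indextype.value, args.floattype.value)
```

Looking members up by name keeps the command line short (f32) while the enum value (float32) stays in the form Redis expects for the vector field TYPE parameter.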

Redis Index Build and Data Load


global connection
num_workers: int = max(1, cpu_count() - 1)  # worker processes for data loading; never fewer than one
keys: list[int] = [self.num_keys // num_workers for _ in range(num_workers)]  # number of keys each worker will generate
keys[0] += self.num_keys % num_workers  # first worker also takes the remainder

connection.flushdb()
sleep(5)  # wait for counters to reset
base_ram: float = round(connection.info('memory')['used_memory'] / 1048576, 2)  # 'empty' Redis db memory usage, in MB

vec_params: dict = {
    "TYPE": self.float_type.value,
    "DIM": self.vec_dim,
    "DISTANCE_METRIC": self.metric_type.value,
}
if self.index_type is INDEX_TYPE.HNSW:
    vec_params['M'] = self.vec_m  # M is only set for HNSW indexes
match self.object_type:
    case OBJECT_TYPE.JSON:
        schema = [VectorField('$.vector', self.index_type.value, vec_params, as_name='vector')]
        idx_def: IndexDefinition = IndexDefinition(index_type=IndexType.JSON, prefix=['key:'])
    case OBJECT_TYPE.HASH:
        schema = [VectorField('vector', self.index_type.value, vec_params)]
        idx_def: IndexDefinition = IndexDefinition(index_type=IndexType.HASH, prefix=['key:'])
connection.ft('idx').create_index(schema, definition=idx_def)

pool_params = zip(keys, repeat(self.object_type), repeat(self.vec_dim), repeat(self.float_type))
t1_start: float = perf_counter()
with Pool(num_workers) as pool:  # one process per entry in keys
    pool.starmap(load_db, pool_params)  # load a Redis instance via a pool of worker processes
t1_stop: float = perf_counter()
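The load_db worker is not shown in this post. As a rough illustration of what each worker has to produce, here is a sketch of the key split used above together with a hypothetical document builder. The names split_keys and make_document are assumptions for illustration only, and the actual writes to Redis are left out. The sketch relies on two things: hash vector fields are stored as a packed binary blob, and JSON vector fields are stored as an array of numbers. The byte order is assumed to be little-endian, which matches typical x86 and ARM hosts.

```python
import random
import struct
from enum import Enum


class OBJECT_TYPE(Enum):
    HASH = 'hash'
    JSON = 'json'


class FLOAT_TYPE(Enum):
    F32 = 'float32'
    F64 = 'float64'


def split_keys(num_keys: int, num_workers: int) -> list[int]:
    """Divide num_keys across workers; the first worker takes the remainder."""
    counts = [num_keys // num_workers] * num_workers
    counts[0] += num_keys % num_workers
    return counts


def make_document(object_type: OBJECT_TYPE, vec_dim: int, float_type: FLOAT_TYPE) -> dict:
    """Hypothetical builder for one random document (not the author's load_db)."""
    vector = [random.random() for _ in range(vec_dim)]
    if object_type is OBJECT_TYPE.JSON:
        # JSON documents carry the vector as a plain array of numbers.
        return {'vector': vector}
    # Hash documents carry the vector as packed bytes: 4 bytes per float32, 8 per float64.
    fmt = 'f' if float_type is FLOAT_TYPE.F32 else 'd'
    return {'vector': struct.pack(f'<{vec_dim}{fmt}', *vector)}


counts = split_keys(100000, 7)
doc = make_document(OBJECT_TYPE.HASH, 1536, FLOAT_TYPE.F32)
print(sum(counts), len(doc['vector']))
```

A 1536-dimension float32 vector packs to 6144 bytes, so the raw vector accounts for most of each hash document's size.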

Sample Results


python3 vss-sizer.py --nkeys 100000 --objecttype hash --indextype flat --metrictype cosine --floattype f32 --vecdim 1536
Vector Index Test
*** Parameters ***
nkeys: 100000
objecttype: hash
indextype: flat
metrictype: cosine
floattype: float32
vecdim: 1536
*** Results ***
index ram used: 606.75 MB
data ram used: 808.52 MB
index to data ratio: 75.04%
document size: 7376 B
execution time: 2.3 sec
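As a sanity check on these numbers, the raw vector payload can be estimated from the parameters alone and compared with the reported index size. The sketch below uses the same bytes-to-MB divisor (1048576) as the memory measurement in the build code. The function names are hypothetical, and the ratio formula is inferred from the reported figures rather than taken from the tool's source.

```python
MIB: int = 1048576  # same divisor the tool applies to used_memory


def raw_vector_mb(num_keys: int, vec_dim: int, bytes_per_float: int) -> float:
    """Size of the vector payload alone, ignoring all Redis overhead."""
    return round(num_keys * vec_dim * bytes_per_float / MIB, 2)


def index_to_data_ratio(index_mb: float, data_mb: float) -> float:
    """Index size as a percentage of data size (formula inferred from the sample output)."""
    return round(index_mb / data_mb * 100, 2)


raw = raw_vector_mb(100000, 1536, 4)         # float32 is 4 bytes per dimension
ratio = index_to_data_ratio(606.75, 808.52)  # values from the sample run
print(f'raw vectors: {raw} MB, index to data ratio: {ratio}%')
```

The raw float32 vectors come to about 585.94 MB, so the reported 606.75 MB flat index is only slightly larger than the vectors it holds, and the computed ratio reproduces the 75.04% in the output.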

Source


Copyright ©1993-2024 Joey E Whelan, All rights reserved.