Sunday, November 19, 2023

DICOM Image Caching with Redis

Summary

This post demonstrates the use of Redis for caching DICOM imagery.  I use a Jupyter notebook to step through loading and searching DICOM images in a Redis Enterprise environment.

Architecture




Redis Enterprise Environment

Screenshot below of the resulting environment in Docker.



Sample DICOM Image

I use a subset of the sample images included with the pydicom library.  Below is an example:


Code Snippets

Data Load

The code below loops through the DICOM files bundled with pydicom.  Files that contain the metadata used in the subsequent search scenarios are broken into 5 KB chunks, and each chunk is stored as a Redis string.  The metadata is then saved to a Redis JSON object, with the chunks' Redis key names stored as an array in that object.

import os
import re

import pydicom
import pydicom.data

def load_chunks(key, file, chunk_size):
    """Split a file into chunk_size pieces and store each as a Redis string."""
    i = 0
    chunk_keys = []
    with open(file, 'rb') as infile:
        while chunk := infile.read(chunk_size):
            chunk_key = f'chunk:{key}:{i}'
            client.set(chunk_key, chunk)
            chunk_keys.append(chunk_key)
            i += 1
    return chunk_keys

count = 0
pydicom.config.settings.reading_validation_mode = pydicom.config.RAISE
for file in pydicom.data.get_testdata_files():
    try:
        ds = pydicom.dcmread(file)
        key = f'file:{os.path.basename(file)}'
        image_name = os.path.basename(file)
        protocol_name = re.sub(r'\s+', ' ', ds.ProtocolName)
        patient_sex = ds.PatientSex
        study_date = ds.StudyDate
        manufacturer = ds.Manufacturer.upper()
        chunk_keys = load_chunks(key, file, CHUNK_SIZE)
        client.json().set(key, '$', {
            'imageName': image_name,
            'protocolName': protocol_name,
            'patientSex': patient_sex,
            'studyDate': study_date,
            'manufacturer': manufacturer,
            'chunks': chunk_keys
        })
        count += 1
    except Exception:
        # skip files that lack any of the metadata attributes above
        pass
print(f'Files loaded: {count}')
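As a quick sanity check on the chunking scheme: with a 5 KB chunk size, the number of chunks per file is the ceiling of the file size over the chunk size.  A minimal illustration (the helper name is mine, not part of the loader):

```python
CHUNK_SIZE = 5 * 1024  # 5 KB, matching the loader above

def expected_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    # Ceiling division without math.ceil: e.g., a 12 KB file yields 3 chunks
    return -(-file_size // chunk_size)

print(expected_chunks(12 * 1024))  # 3
```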

Search Scenario 1

This code retrieves all the byte chunks for a DICOM image whose Redis key is known.  Strictly speaking, this isn't a 'search'; I'm simply performing a JSON GET on a key name.
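Both scenarios call a get_bytes helper that isn't shown in the snippets.  It presumably fetches each chunk value and concatenates them in order.  A minimal sketch of that logic, with a dict-backed stand-in for the Redis client so the example is self-contained (the notebook's version would use the real module-level client):

```python
class FakeRedis:
    """Dict-backed stand-in for the Redis client (illustration only)."""
    def __init__(self, store):
        self.store = store

    def mget(self, keys):
        return [self.store[k] for k in keys]

client = FakeRedis({'chunk:file:demo.dcm:0': b'DICM',
                    'chunk:file:demo.dcm:1': b'data'})

def get_bytes(chunk_keys):
    # Fetch all chunks in a single MGET round trip, then
    # stitch them back together in array order
    total = bytearray()
    for chunk in client.mget(chunk_keys):
        total.extend(chunk)
    return bytes(total)

print(get_bytes(['chunk:file:demo.dcm:0', 'chunk:file:demo.dcm:1']))  # b'DICMdata'
```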

file_name = 'JPGExtended.dcm'
t1 = perf_counter()
results = client.json().get(f'file:{file_name}', '$.chunks')
total_bytes = get_bytes(results[0])
t2 = perf_counter()
print(f'Exec time: {round((t2-t1)*1000,2)} ms')
print(f'Bytes Retrieved: {len(total_bytes)}')

Search Scenario 2

The code below demonstrates a Redis Search query on the image metadata.  In this case, we're looking for a DICOM image with a protocolName containing 194 and a studyDate in 2019.
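This query assumes a search index named dicom_idx over the JSON documents; its creation isn't shown in the snippets.  A plausible definition via redis-cli, with field types inferred from the query syntax (text for protocolName, tag for studyDate) — an assumption, not the notebook's actual schema:

```shell
FT.CREATE dicom_idx ON JSON PREFIX 1 file: SCHEMA $.protocolName AS protocolName TEXT $.studyDate AS studyDate TAG $.patientSex AS patientSex TAG $.manufacturer AS manufacturer TAG
```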

query = Query('@protocolName:194 @studyDate:{2019*}')\
    .return_field('$.chunks', as_field='chunks')\
    .return_field('$.imageName', as_field='imageName')
t1 = perf_counter()
result = client.ft('dicom_idx').search(query)
total_bytes = bytearray()
if len(result.docs) > 0:
    total_bytes = get_bytes(json.loads(result.docs[0].chunks))
t2 = perf_counter()
print(f'Exec time: {round((t2-t1)*1000,2)} ms')
if result.docs:
    print(f'Image name: {result.docs[0].imageName}')
print(f'Bytes Retrieved: {len(total_bytes)}')

Source


Copyright ©1993-2024 Joey E Whelan, All rights reserved.

Sunday, November 5, 2023

Redis Vector Database Sizing Tool

Summary

In this post, I cover a utility I wrote for observing Redis vector data and index sizes with varying data types and index parameters.  The tool creates a single-node, single-shard Redis Enterprise database with the Search and JSON modules enabled.

Code Snippets

Constants and Enums



from enum import Enum

REDIS_URL: str = 'redis://default:redis@localhost:12000'
NUM_KEYS: int = 100000
connection: Connection = None

class OBJECT_TYPE(Enum):
    HASH = 'hash'
    JSON = 'json'

class INDEX_TYPE(Enum):
    FLAT = 'flat'
    HNSW = 'hnsw'

class METRIC_TYPE(Enum):
    L2 = 'l2'
    IP = 'ip'
    COSINE = 'cosine'

class FLOAT_TYPE(Enum):
    F32 = 'float32'
    F64 = 'float64'

Redis Index Build and Data Load


global connection
num_workers: int = cpu_count() - 1  # number of worker processes for data loading
keys: list[int] = [self.num_keys // num_workers for i in range(num_workers)]  # keys per worker
keys[0] += self.num_keys % num_workers  # first worker takes the remainder
connection.flushdb()
sleep(5)  # wait for counters to reset
base_ram: float = round(connection.info('memory')['used_memory']/1048576, 2)  # 'empty' Redis db memory usage, MB
vec_params: dict = {
    "TYPE": self.float_type.value,
    "DIM": self.vec_dim,
    "DISTANCE_METRIC": self.metric_type.value,
}
if self.index_type is INDEX_TYPE.HNSW:
    vec_params['M'] = self.vec_m
match self.object_type:
    case OBJECT_TYPE.JSON:
        schema = [VectorField('$.vector', self.index_type.value, vec_params, as_name='vector')]
        idx_def: IndexDefinition = IndexDefinition(index_type=IndexType.JSON, prefix=['key:'])
    case OBJECT_TYPE.HASH:
        schema = [VectorField('vector', self.index_type.value, vec_params)]
        idx_def: IndexDefinition = IndexDefinition(index_type=IndexType.HASH, prefix=['key:'])
connection.ft('idx').create_index(schema, definition=idx_def)
pool_params = zip(keys, repeat(self.object_type), repeat(self.vec_dim), repeat(self.float_type))
t1_start: float = perf_counter()
with Pool(cpu_count()) as pool:
    pool.starmap(load_db, pool_params)  # load the Redis instance via a pool of worker processes
t1_stop: float = perf_counter()
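The load_db function referenced in the starmap call isn't shown above.  Each worker presumably generates random vectors and writes them under the key: prefix.  The byte packing for the hash case can be sketched with the standard library (the function name and packing details are my assumptions, not the tool's code):

```python
import random
import struct

def random_vector_bytes(dim: int, float_type: str = 'float32') -> bytes:
    # Hash vector fields hold a flat little-endian array of
    # 4-byte (float32) or 8-byte (float64) floats
    fmt = '<%d%s' % (dim, 'f' if float_type == 'float32' else 'd')
    return struct.pack(fmt, *(random.random() for _ in range(dim)))

print(len(random_vector_bytes(1536)))             # 6144 bytes for float32
print(len(random_vector_bytes(1536, 'float64')))  # 12288 bytes for float64
```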

Sample Results


python3 vss-sizer.py --nkeys 100000 --objecttype hash --indextype flat --metrictype cosine --floattype f32 --vecdim 1536
Vector Index Test
*** Parameters ***
nkeys: 100000
objecttype: hash
indextype: flat
metrictype: cosine
floattype: float32
vecdim: 1536
*** Results ***
index ram used: 606.75 MB
data ram used: 808.52 MB
index to data ratio: 75.04%
document size: 7376 B
execution time: 2.3 sec
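The reported document size squares with the vector arithmetic: a 1536-dimension float32 vector is 6144 bytes of raw data, leaving roughly 1.2 KB per document for the key name, field name, and allocator overhead:

```python
dim = 1536
raw_vector = dim * 4      # float32 = 4 bytes per dimension
print(raw_vector)         # 6144
print(7376 - raw_vector)  # 1232 bytes of per-document overhead
```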

Source



Redis Search - Rental Availability

Summary

This post covers a specific use case of Redis in the short-term rental domain: finding property availability in a given geographic area and date/time slot.

Architecture


Code Snippets

Data Load

The code below loads rental properties as Redis JSON objects, and US Postal ZIP codes with their associated latitudes/longitudes as Redis strings.

async #insertProperties() {
    const csvStream = fs.createReadStream("./data/co.csv")
        .pipe(parse({ delimiter: ",", from_line: 2 }));
    let id = 1;
    for await (const row of csvStream) {
        const doc = {
            "id": id,
            "address": {
                "coords": `${row[0]} ${row[1]}`,
                "number": row[2],
                "street": row[3],
                "unit": row[4],
                "city": row[5],
                "state": "CO",
                "postcode": row[8]
            },
            "owner": {
                "fname": uniqueNamesGenerator({dictionaries: [names], style: 'capital', length: 1, separator: ' '}),
                "lname": uniqueNamesGenerator({dictionaries: [names], style: 'capital', length: 1, separator: ' '}),
            },
            "type": `${TYPES[Math.floor(Math.random() * TYPES.length)]}`,
            "availability": this.#getAvailability(),
            "rate": Math.round((Math.random() * 250 + 125) * 100) / 100
        };
        await this.client.json.set(`property:${id}`, '.', doc);
        id++;
        if (id > MAX_PROPERTIES) {
            break;
        }
    }
}

async #insertZips() {
    const csvStream = fs.createReadStream("./data/zip_lat_long.csv")
        .pipe(parse({ delimiter: ",", from_line: 2 }));
    for await (const row of csvStream) {
        const zip = row[0];
        const lat = row[1];
        const lon = row[2];
        await this.client.set(`zip:${zip}`, `${lon} ${lat}`);
    }
}

Property Search

The code below implements an Express.js route for searching the rental properties in Redis.  The search filters on property type and geographic distance from a given location, then on date availability.

app.post('/property/search', async (req, res) => {
    const { type, zip, radius, begin, end } = req.body;
    console.log(`app - POST /property/search ${JSON.stringify(req.body)}`);
    try {
        const loc = await client.get(`zip:${zip}`);
        if (!loc) {
            throw new Error('Zip code not found');
        }
        const query = `@type:{${type}} @coords:[${loc} ${radius} mi]`;
        const docs = await client.ft.aggregate('propIdx', query, {
            DIALECT: 3,
            LOAD: [
                '@__key',
                {
                    identifier: `$.availability[?(@.begin<=${begin} && @.end>=${end})]`,
                    AS: 'match'
                }
            ],
            STEPS: [
                {
                    type: AggregateSteps.FILTER,
                    expression: 'exists(@match)'
                },
                {
                    type: AggregateSteps.SORTBY,
                    BY: {
                        BY: '@rate',
                        DIRECTION: 'ASC'
                    }
                },
                {
                    type: AggregateSteps.LIMIT,
                    from: 0,
                    size: 3
                }
            ]
        });
        if (docs && docs.results) {
            const properties = [];
            for (const result of docs.results) {
                const rental_date = JSON.parse(result.match);
                properties.push({
                    "key": result.__key,
                    "rate": result.rate,
                    "begin": rental_date[0].begin,
                    "end": rental_date[0].end
                });
            }
            console.log(`app - POST /property/search - properties found: ${properties.length}`);
            res.status(200).json(properties);
        }
        else {
            console.log('app - POST /property/search - no properties found');
            res.status(404).send('No properties found');
        }
    }
    catch (err) {
        console.log(`app - POST /property/search - error: ${err.message}`);
        res.status(400).json({ 'error': err.message });
    }
});
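The JSONPath predicate in the LOAD step keeps only availability slots that fully cover the requested window.  The same test written out in Python for clarity (a sketch of the logic, not code from the app):

```python
def matching_slots(availability, begin, end):
    # A slot matches when it starts on or before the requested begin
    # and ends on or after the requested end, i.e. the slot covers the window
    return [slot for slot in availability
            if slot['begin'] <= begin and slot['end'] >= end]

slots = [{'begin': 100, 'end': 200}, {'begin': 150, 'end': 300}]
print(matching_slots(slots, 150, 250))  # [{'begin': 150, 'end': 300}]
```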

Source

