Monday, May 29, 2023

OpenAI Q&A using Redis VSS for Context

Summary

I'll be covering the use case of providing supplemental context to OpenAI in a question/answer scenario (ChatGPT).  Various news articles will be vectorized and stored in Redis.  For a given question that lies outside of ChatGPT's knowledge, additional context will be fetched from Redis via Vector Similarity Search (VSS).   That context will aid ChatGPT in providing a more accurate answer.

Architecture


Code Snippets

OpenAI Prompt/Collect Helper Function


The code below is a simple function for sending a prompt into ChatGPT and then extracting the resulting response.
import openai
import os
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]

OpenAI QnA Example 1


The prompt below is on a topic (FTX meltdown) that is outside of ChatGPT's training cut-off date. As a result, the response is of poor quality (wrong).
prompt = "Is Sam Bankman-Fried's company, FTX, considered a well-managed company?"
response = get_completion(prompt)
print(response)
As an AI language model, I cannot provide a personal opinion. However, FTX has been recognized as one of the fastest-growing cryptocurrency exchanges and has received positive reviews for its user-friendly interface, low fees, and innovative products. Additionally, Sam Bankman-Fried has been praised for his leadership and strategic decision-making, including FTX's recent acquisition of Blockfolio. Overall, FTX appears to be a well-managed company.

Redis Context Index Build


The code below uses the redis-py client library to build an index for business article content in Redis. The index has two fields in its schema: the text content itself and a vector representing the embedding of that text content.
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

# `client` is an existing redis-py connection; its creation is not shown here.
schema = [
    VectorField('$.vector',
        "FLAT",
        {   "TYPE": 'FLOAT32',
            "DIM": 1536,  # dimension of text-embedding-ada-002 vectors
            "DISTANCE_METRIC": "COSINE"
        }, as_name='vector'),
    TextField('$.content', as_name='content')
]
idx_def = IndexDefinition(index_type=IndexType.JSON, prefix=['doc:'])
try:
    client.ft('idx').dropindex()
except Exception:
    pass  # the index did not exist yet
client.ft('idx').create_index(schema, definition=idx_def)
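
To make the DISTANCE_METRIC setting concrete, here is a minimal sketch of cosine distance, defined as 1 minus cosine similarity, which is the score a COSINE index reports. The function name and the toy 2-D vectors are assumptions for illustration only; Redis computes this internally.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2-D vectors (real ada-002 embeddings have 1536 dimensions)
same = cosine_distance([1.0, 0.0], [2.0, 0.0])        # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # perpendicular direction
print(same, orthogonal)
```

A distance near 0 means the two texts point in nearly the same direction in embedding space, so smaller scores indicate closer matches.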

Context Storage as Redis JSON


The code below loads up a dozen different business articles into Redis as JSON objects.
import os
import openai
directory = './assets/'
model='text-embedding-ada-002'
i = 1
for file in os.listdir(directory):
    with open(os.path.join(directory, file)) as f:
        content = f.read()
    vector = openai.Embedding.create(input = [content], model = model)['data'][0]['embedding']
    client.json().set(f'doc:{i}', '$', {'content': content, 'vector': vector})
    i += 1

RedisInsight



Redis Vector Search (KNN)


A vector search in Redis is depicted below. This particular query returns the single article whose vector is closest to that of a given question (prompt).
from redis.commands.search.query import Query
import numpy as np
vec = np.array(openai.Embedding.create(input = [prompt], model = model)['data'][0]['embedding'], dtype=np.float32).tobytes()
q = Query('*=>[KNN 1 @vector $query_vec AS vector_score]')\
    .sort_by('vector_score')\
    .return_fields('content')\
    .dialect(2)
params = {"query_vec": vec}
context = client.ft('idx').search(q, query_params=params).docs[0].content
print(context)
True
b'OK'
Embattled Crypto Exchange FTX Files for Bankruptcy
Nov. 11, 2022
On Monday, Sam Bankman-Fried, the chief executive of the cryptocurrency exchange FTX, took to Twitter to reassure his customers: “FTX is fine,” he wrote. “Assets are fine.”
On Friday, FTX announced that it was filing for bankruptcy, capping an extraordinary week of corporate drama that has upended crypto markets, sent shock waves through an industry struggling to gain mainstream credibility and sparked government investigations that could lead to more damaging revelations or even criminal charges.
In a statement on Twitter, the company said that Mr. Bankman-Fried had resigned, with John J. Ray III, a corporate turnaround specialist, taking over as chief executive.
The speed of FTX’s downfall has left crypto insiders stunned. Just days ago, Mr. Bankman-Fried was considered one of the smartest leaders in the crypto industry, an influential figure in Washington who was lobbying to shape regulations. And FTX was widely viewed as one of the most stable and responsible companies in the freewheeling, loosely regulated crypto industry.
“Here we are, with one of the richest people in the world, his net worth dropping to zero, his business dropping to zero,” said Jared Ellias, a bankruptcy professor at Harvard Law School. “The velocity of this failure is just unbelievable.”
Now, the bankruptcy has set up a rush among investors and customers to salvage funds from what remains of FTX. A surge of customers tried to withdraw funds from the platform this week, and the company couldn’t meet the demand. The exchange owes as much as $8 billion, according to people familiar with its finances.
FTX’s collapse has destabilized the crypto industry, which was already reeling from a crash in the spring that drained $1 trillion from the market. The prices of the leading cryptocurrencies, Bitcoin and Ether, have plummeted. The crypto lender BlockFi, which was closely entangled with FTX, announced on Thursday that it was suspending operations as a result of FTX’s collapse.
Mr. Bankman-Fried was backed by some of the highest-profile venture capital investors in Silicon Valley, including Sequoia Capital and Lightspeed Venture Partners. Some of those investors, facing questions about how closely they scrutinized FTX before they put money into it, have said that their nine-figure investments in the crypto exchange are now essentially worthless.
The company’s demise has also set off a reckoning over risky practices that have become pervasive in crypto, an industry that was founded partly as a corrective to the type of dangerous financial engineering that caused the 2008 economic crisis.
“I’m really sorry, again, that we ended up here,” Mr. Bankman-Fried said on Twitter on Friday. “Hopefully this can bring some amount of transparency, trust, and governance.”
The bankruptcy filing marks the start of what will probably be months or even years of legal fallout, as lawyers try to work out whether the exchange can ever continue to operate in some form and customers demand compensation. FTX is already the target of investigations by the Securities and Exchange Commission and the Justice Department, with investigators focused on whether the company improperly used customer funds to prop up Alameda Research, a trading firm that Mr. Bankman-Fried also founded.
...
Not long ago, Mr. Bankman-Fried was performing a comedy routine onstage at a conference with Anthony Scaramucci, the former White House communications director and a business partner of FTX.
“I’m disappointed,” Mr. Scaramucci said in an interview on CNBC on Friday. “Duped, I guess, is the right word.”
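
A FLAT index performs an exhaustive (brute-force) comparison of the query vector against every stored vector. The sketch below mimics a KNN 1 query in plain Python over a few made-up 3-dimensional vectors. The document keys, vectors, and helper names are hypothetical; real embeddings come from the OpenAI API.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def knn(query_vec, docs, k=1):
    """Return the k (distance, key) pairs closest to query_vec, nearest first."""
    scored = [(cosine_distance(query_vec, vec), key) for key, vec in docs.items()]
    return sorted(scored)[:k]

# Hypothetical stored vectors keyed the same way as the JSON documents
docs = {
    "doc:1": [0.9, 0.1, 0.0],
    "doc:2": [0.0, 1.0, 0.0],
    "doc:3": [0.1, 0.0, 0.9],
}
nearest = knn([1.0, 0.0, 0.0], docs, k=1)
print(nearest[0][1])  # key of the closest document
```

Brute force is exact and perfectly adequate for a dozen articles; approximate index types only become interesting for much larger collections.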

Reprompt ChatGPT with Redis-fetched Context


The context fetched in the previous step is now added as supplemental info to ChatGPT for the same FTX-related question. The response is now in line with expectations.
prompt = f"""
Using the information delimited by triple backticks, answer this question: Is Sam Bankman-Fried's company, FTX, considered a well-managed company?
Context: ```{context}```
"""
response = get_completion(prompt)
print(response)
No, Sam Bankman-Fried's company FTX is not considered a well-managed company as it has filed for bankruptcy and owes as much as $8 billion to its creditors. The collapse of FTX has destabilized the crypto industry, and the company is already the target of investigations by the Securities and Exchange Commission and the Justice Department. FTX was widely viewed as one of the most stable and responsible companies in the freewheeling, loosely regulated crypto industry, but its risky practices have become pervasive in crypto, leading to a reckoning.
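
The same prompt pattern can be wrapped in a small helper so that any question and any retrieved context can be combined. This is only a sketch: the `build_prompt` name and the shortened sample strings are assumptions, not part of the original code.

```python
def build_prompt(question: str, context: str) -> str:
    """Wrap retrieved context in triple backticks after the question."""
    fence = "`" * 3  # built at runtime to keep this listing readable
    return (
        "Using the information delimited by triple backticks, "
        f"answer this question: {question}\n"
        f"Context: {fence}{context}{fence}\n"
    )

prompt = build_prompt(
    "Is FTX considered a well-managed company?",
    "FTX announced that it was filing for bankruptcy.",
)
print(prompt)
```

The resulting string can be passed to get_completion() exactly as in the example above.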

Source


Copyright ©1993-2024 Joey E Whelan, All rights reserved.

OpenAI + Redis VSS w/JSON

Summary

This post covers an example of using Redis Vector Similarity Search (VSS) with OpenAI as the embedding engine.  Documents are stored as JSON objects within Redis and then searched with VSS using KNN and hybrid queries.

Architecture

Code Snippets

OpenAI Embedding


def get_vector(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    return openai.Embedding.create(input = [text], model = model)['data'][0]['embedding']
text_1 = """Japan narrowly escapes recession
Japan's economy teetered on the brink of a technical recession in the three months to September, figures show.
Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous quarter. On an annual basis, the data suggests annual growth of just 0.2%, suggesting a much more hesitant recovery than had previously been thought. A common technical definition of a recession is two successive quarters of negative growth.
The government was keen to play down the worrying implications of the data. "I maintain the view that Japan's economy remains in a minor adjustment phase in an upward climb, and we will monitor developments carefully," said economy minister Heizo Takenaka. But in the face of the strengthening yen making exports less competitive and indications of weakening economic conditions ahead, observers were less sanguine. "It's painting a picture of a recovery... much patchier than previously thought," said Paul Sheard, economist at Lehman Brothers in Tokyo. Improvements in the job market apparently have yet to feed through to domestic demand, with private consumption up just 0.2% in the third quarter.
"""
doc_1 = {"content": text_1, "vector": get_vector(text_1)}

Redis Index Creation


from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
schema = [
    VectorField('$.vector',
        "FLAT",
        {   "TYPE": 'FLOAT32',
            "DIM": len(doc_1['vector']),
            "DISTANCE_METRIC": "COSINE"
        }, as_name='vector'),
    TextField('$.content', as_name='content')
]
idx_def = IndexDefinition(index_type=IndexType.JSON, prefix=['doc:'])
try:
    client.ft('idx').dropindex()
except Exception:
    pass  # the index did not exist yet
client.ft('idx').create_index(schema, definition=idx_def)

Redis JSON Document Insertion


client.json().set('doc:1', '$', doc_1)
client.json().set('doc:2', '$', doc_2)
client.json().set('doc:3', '$', doc_3)

RedisInsight



Redis Semantic Search (KNN)


text_4 = """Radcliffe yet to answer GB call
Paula Radcliffe has been granted extra time to decide whether to compete in the World Cross-Country Championships.
The 31-year-old is concerned the event, which starts on 19 March in France, could upset her preparations for the London Marathon on 17 April. "There is no question that Paula would be a huge asset to the GB team," said Zara Hyde Peters of UK Athletics. "But she is working out whether she can accommodate the worlds without too much compromise in her marathon training." Radcliffe must make a decision by Tuesday - the deadline for team nominations. British team member Hayley Yelling said the team would understand if Radcliffe opted out of the event. "It would be fantastic to have Paula in the team," said the European cross-country champion. "But you have to remember that athletics is basically an individual sport and anything achieved for the team is a bonus. "She is not messing us around. We all understand the problem." Radcliffe was world cross-country champion in 2001 and 2002 but missed last year's event because of injury. In her absence, the GB team won bronze in Brussels.
"""
# Imports needed by the query code in this section
from redis.commands.search.query import Query
import numpy as np

vec = np.array(get_vector(text_4), dtype=np.float32).tobytes()
q = Query('*=>[KNN 3 @vector $query_vec AS vector_score]')\
    .sort_by('vector_score')\
    .return_fields('vector_score', 'content')\
    .dialect(2)
params = {"query_vec": vec}
results = client.ft('idx').search(q, query_params=params)
for doc in results.docs:
    print(f"distance:{round(float(doc['vector_score']),3)} content:{doc['content']}\n")
distance:0.188 content:Dibaba breaks 5,000m world record
Ethiopia's Tirunesh Dibaba set a new world record in winning the women's 5,000m at the Boston Indoor Games.
Dibaba won in 14 minutes 32.93 seconds to erase the previous world indoor mark of 14:39.29 set by another Ethiopian, Berhane Adera, in Stuttgart last year. But compatriot Kenenisa Bekele's record hopes were dashed when he miscounted his laps in the men's 3,000m and staged his sprint finish a lap too soon. Ireland's Alistair Cragg won in 7:39.89 as Bekele battled to second in 7:41.42. "I didn't want to sit back and get out-kicked," said Cragg. "So I kept on the pace. The plan was to go with 500m to go no matter what, but when Bekele made the mistake that was it. The race was mine." Sweden's Carolina Kluft, the Olympic heptathlon champion, and Slovenia's Jolanda Ceplak had winning performances, too. Kluft took the long jump at 6.63m, while Ceplak easily won the women's 800m in 2:01.52.
distance:0.268 content:Japan narrowly escapes recession
Japan's economy teetered on the brink of a technical recession in the three months to September, figures show.
Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous quarter. On an annual basis, the data suggests annual growth of just 0.2%, suggesting a much more hesitant recovery than had previously been thought. A common technical definition of a recession is two successive quarters of negative growth.
The government was keen to play down the worrying implications of the data. "I maintain the view that Japan's economy remains in a minor adjustment phase in an upward climb, and we will monitor developments carefully," said economy minister Heizo Takenaka. But in the face of the strengthening yen making exports less competitive and indications of weakening economic conditions ahead, observers were less sanguine. "It's painting a picture of a recovery... much patchier than previously thought," said Paul Sheard, economist at Lehman Brothers in Tokyo. Improvements in the job market apparently have yet to feed through to domestic demand, with private consumption up just 0.2% in the third quarter.
distance:0.287 content:Google's toolbar sparks concern
Search engine firm Google has released a trial tool which is concerning some net users because it directs people to pre-selected commercial websites.
The AutoLink feature comes with Google's latest toolbar and provides links in a webpage to Amazon.com if it finds a book's ISBN number on the site. It also links to Google's map service, if there is an address, or to car firm Carfax, if there is a licence plate. Google said the feature, available only in the US, "adds useful links". But some users are concerned that Google's dominant position in the search engine market place could mean it would be giving a competitive edge to firms like Amazon.
AutoLink works by creating a link to a website based on information contained in a webpage - even if there is no link specified and whether or not the publisher of the page has given permission.
If a user clicks the AutoLink feature in the Google toolbar then a webpage with a book's unique ISBN number would link directly to Amazon's website. It could mean online libraries that list ISBN book numbers find they are directing users to Amazon.com whether they like it or not. Websites which have paid for advertising on their pages may also be directing people to rival services. Dan Gillmor, founder of Grassroots Media, which supports citizen-based media, said the tool was a "bad idea, and an unfortunate move by a company that is looking to continue its hypergrowth". In a statement Google said the feature was still only in beta, ie trial, stage and that the company welcomed feedback from users. It said: "The user can choose never to click on the AutoLink button, and web pages she views will never be modified. "In addition, the user can choose to disable the AutoLink feature entirely at any time."
The new tool has been compared to the Smart Tags feature from Microsoft by some users. It was widely criticised by net users and later dropped by Microsoft after concerns over trademark use were raised. Smart Tags allowed Microsoft to link any word on a web page to another site chosen by the company. Google said none of the companies which received AutoLinks had paid for the service. Some users said AutoLink would only be fair if websites had to sign up to allow the feature to work on their pages or if they received revenue for any "click through" to a commercial site. Cory Doctorow, European outreach coordinator for digital civil liberties group Electronic Fronter Foundation, said that Google should not be penalised for its market dominance. "Of course Google should be allowed to direct people to whatever proxies it chooses. "But as an end user I would want to know - 'Can I choose to use this service?, 'How much is Google being paid?', 'Can I substitute my own companies for the ones chosen by Google?'." Mr Doctorow said the only objection would be if users were forced into using AutoLink or "tricked into using the service".
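
The query vector is passed to Redis as raw bytes that must match the index's FLOAT32 type, which is what the np.float32 conversion followed by tobytes() accomplishes. As a rough illustration of that byte layout, the sketch below packs a short made-up vector with the standard library's struct module. It assumes little-endian byte order (NumPy uses the machine's native order, which is little-endian on common hardware), and the helper name is hypothetical.

```python
import struct

def to_float32_bytes(vec):
    """Pack a list of floats as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(vec)}f", *vec)

blob = to_float32_bytes([0.5, -1.0, 2.0])
print(len(blob))  # 4 bytes per dimension

# A 1536-dimension ada-002 embedding therefore occupies 1536 * 4 bytes
print(len(to_float32_bytes([0.0] * 1536)))
```

If the byte length of the query blob does not equal 4 times the index DIM, the search will not work, so this is a useful thing to check when debugging.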

Redis Hybrid Search (Full-text + KNN)


text_5 = """Ethiopia's crop production up 24%
Ethiopia produced 14.27 million tonnes of crops in 2004, 24% higher than in 2003 and 21% more than the average of the past five years, a report says.
In 2003, crop production totalled 11.49 million tonnes, the joint report from the Food and Agriculture Organisation and the World Food Programme said. Good rains, increased use of fertilizers and improved seeds contributed to the rise in production. Nevertheless, 2.2 million Ethiopians will still need emergency assistance.
The report calculated emergency food requirements for 2005 to be 387,500 tonnes. On top of that, 89,000 tonnes of fortified blended food and vegetable oil for "targeted supplementary food distributions for a survival programme for children under five and pregnant and lactating women" will be needed.
In eastern and southern Ethiopia, a prolonged drought has killed crops and drained wells. Last year, a total of 965,000 tonnes of food assistance was needed to help seven million Ethiopians. The Food and Agriculture Organisation (FAO) recommend that the food assistance is bought locally. "Local purchase of cereals for food assistance programmes is recommended as far as possible, so as to assist domestic markets and farmers," said Henri Josserand, chief of FAO's Global Information and Early Warning System. Agriculture is the main economic activity in Ethiopia, representing 45% of gross domestic product. About 80% of Ethiopians depend directly or indirectly on agriculture.
"""
vec = np.array(get_vector(text_5), dtype=np.float32).tobytes()
q = Query('@content:recession => [KNN 3 @vector $query_vec AS vector_score]')\
    .sort_by('vector_score')\
    .return_fields('vector_score', 'content')\
    .dialect(2)
params = {"query_vec": vec}
results = client.ft('idx').search(q, query_params=params)
for doc in results.docs:
    print(f"distance:{round(float(doc['vector_score']),3)} content:{doc['content']}\n")
distance:0.241 content:Japan narrowly escapes recession
Japan's economy teetered on the brink of a technical recession in the three months to September, figures show.
Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous quarter. On an annual basis, the data suggests annual growth of just 0.2%, suggesting a much more hesitant recovery than had previously been thought. A common technical definition of a recession is two successive quarters of negative growth.
The government was keen to play down the worrying implications of the data. "I maintain the view that Japan's economy remains in a minor adjustment phase in an upward climb, and we will monitor developments carefully," said economy minister Heizo Takenaka. But in the face of the strengthening yen making exports less competitive and indications of weakening economic conditions ahead, observers were less sanguine. "It's painting a picture of a recovery... much patchier than previously thought," said Paul Sheard, economist at Lehman Brothers in Tokyo. Improvements in the job market apparently have yet to feed through to domestic demand, with private consumption up just 0.2% in the third quarter.
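
Conceptually, the hybrid query first restricts candidates to documents whose content matches the full-text filter and then runs KNN over only those. The sketch below imitates that two-step behavior in plain Python. The tiny corpus, 2-D vectors, and whole-word matching are simplifying assumptions; Redis full-text search also applies tokenization and stemming rules that are not modeled here.

```python
import math
import re

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def hybrid_search(term, query_vec, docs, k=3):
    """Filter docs by a whole-word match on term, then rank survivors by distance."""
    matches = [
        d for d in docs
        if term.lower() in re.findall(r"\w+", d["content"].lower())
    ]
    ranked = sorted(matches, key=lambda d: cosine_distance(query_vec, d["vector"]))
    return [d["content"] for d in ranked[:k]]

# Hypothetical documents with made-up 2-D vectors
docs = [
    {"content": "Japan narrowly escapes recession", "vector": [0.2, 0.8]},
    {"content": "Dibaba breaks 5,000m world record", "vector": [0.9, 0.1]},
    {"content": "Google's toolbar sparks concern", "vector": [0.5, 0.5]},
]
print(hybrid_search("recession", [1.0, 0.0], docs))
```

Only the document containing the filter term is returned, even though another document is closer in vector space, which mirrors the single result shown above.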

Source



Saturday, May 27, 2023

Redis Polygon Search

Summary

This post demonstrates a new search feature in Redis: geospatial search with polygons.  The feature is part of the 7.2.0-M01 Redis Stack release.  This initial release supports only the WITHIN and CONTAINS query types for polygons.  Additional geospatial search types are expected in future releases.

Architecture


Code Snippets

Point Generation

I use the Shapely module to generate the geometries for this demo.  The code snippet below will generate a random point, optionally within a bounding box.

def _get_point(self, box: Polygon = None) -> Point:
    """ Private function to generate a random point, potentially within a bounding box
        Parameters
        ----------
        box - Optional bounding box
        Returns
        -------
        Shapely Point object
    """
    point: Point
    if box:
        minx, miny, maxx, maxy = box.bounds
        while True:
            point = Point(random.uniform(minx, maxx), random.uniform(miny, maxy))
            if box.contains(point):
                break
    else:
        point = Point(random.uniform(MIN_X, MAX_X), random.uniform(MIN_Y, MAX_Y))
    return point
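
The loop in this function is a form of rejection sampling: draw a point uniformly from the polygon's bounding rectangle and retry until it falls inside the polygon. For readers without Shapely installed, here is a dependency-free sketch of the same idea. The ray-casting `contains` helper and the triangle coordinates are assumptions standing in for Shapely's `contains` and `bounds`.

```python
import random

def contains(polygon, x, y):
    """Ray-casting test for (x, y) inside polygon (boundary not handled specially)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edges crossed by a horizontal ray going right from (x, y)
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in(polygon, rng=random):
    """Sample uniformly from the bounding box until a point lands in the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    while True:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if contains(polygon, x, y):
            return (x, y)

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(random_point_in(triangle, random.Random(42)))
```

The number of retries depends on how much of the bounding box the polygon fills; for this triangle roughly half of the draws are rejected.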

Polygon Generation

Random polygons can be generated using the random point function above.  By passing a polygon as an input parameter, the generated polygon can be placed inside that input polygon.

def _get_polygon(self, box: Polygon = None) -> Polygon:
    """ Private function to generate a random polygon, potentially within a bounding box
        Parameters
        ----------
        box - Optional bounding box
        Returns
        -------
        Shapely Polygon object
    """
    points: List[Point] = []
    for _ in range(random.randint(3,10)):
        points.append(self._get_point(box))
    ob: MultiPoint = MultiPoint(points)
    return Polygon(ob.convex_hull)
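
Taking the convex hull means that a random set of points yields a simple polygon with no self-intersections. Purely to show what the operation does, here is a sketch using Andrew's monotone chain algorithm on plain coordinate tuples. This is an illustrative stand-in, not necessarily the algorithm Shapely uses internally.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain; it repeats the start of the other
    return lower[:-1] + upper[:-1]

# The interior point (1, 1) is discarded; only the square's corners remain
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
print(hull)
```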

Redis Polygon Search Index

The command below creates an index on the shapes, using the new 'GEOSHAPE' field type for their WKT-formatted geometries.  Note that this code sends a raw command to Redis via execute_command.  The redis-py library did not support the new geospatial commands at the time of this writing.

self.client.execute_command('FT.CREATE', 'idx', 'ON', 'JSON', 'PREFIX', '1', 'key:',
'SCHEMA', '$.name', 'AS', 'name', 'TEXT', '$.geom', 'AS', 'geom', 'GEOSHAPE', 'FLAT')
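
The geom field holds each shape as Well-Known Text (WKT), the string that Shapely's .wkt property produces. As a rough guide to that format, the sketch below builds WKT strings by hand for a point and a polygon. The helper names and coordinates are made up, and the exact number formatting Shapely emits may differ by version.

```python
def point_wkt(x, y):
    """WKT for a single point."""
    return f"POINT ({x} {y})"

def polygon_wkt(ring):
    """WKT for a polygon with one exterior ring; the ring is closed if needed."""
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]  # WKT rings repeat the first vertex at the end
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"

print(point_wkt(1, 2))
print(polygon_wkt([(0, 0), (4, 0), (4, 4), (0, 4)]))
```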

Redis Polygon Load as JSON

The code below inserts four polygons and four points into Redis as JSON objects.  Those objects are indexed within Redis by the index created above.
  
self.client.json().set('key:1', '$', { "name": "Red Polygon", "geom": poly_red.wkt })
self.client.json().set('key:2', '$', { "name": "Green Polygon", "geom": poly_green.wkt })
self.client.json().set('key:3', '$', { "name": "Blue Polygon", "geom": poly_blue.wkt })
self.client.json().set('key:4', '$', { "name": "Cyan Polygon", "geom": poly_cyan.wkt })
self.client.json().set('key:5', '$', { "name": "Purple Point", "geom": point_purple.wkt })
self.client.json().set('key:6', '$', { "name": "Brown Point", "geom": point_brown.wkt })
self.client.json().set('key:7', '$', { "name": "Orange Point", "geom": point_orange.wkt })
self.client.json().set('key:8', '$', { "name": "Olive Point", "geom": point_olive.wkt })

Redis Polygon Search

The code below performs the Redis polygon search (CONTAINS or WITHIN). Again, it sends the raw command.
def _poly_search(self, qt: QUERY, color: COLOR, shape: Polygon, filter: SHAPE) -> None:
    """ Private function for POLYGON search in Redis.
        Parameters
        ----------
        qt - Redis Geometry search type (contains or within)
        color - color attribute of polygon
        shape - Shapely point or polygon object
        filter - query filter on shape types (polygon or point) to be returned
        Returns
        -------
        None
    """
    results: list = self.client.execute_command(
        'FT.SEARCH', 'idx',
        f'(-@name:{color.value} @name:{filter.value} @geom:[{qt.value} $qshape])',
        'PARAMS', '2', 'qshape', shape.wkt,
        'RETURN', '1', 'name',
        'DIALECT', '3')
    if (results[0] > 0):
        for res in results:
            if isinstance(res, list):
                print(res[1].decode('utf-8').strip('[]"'))
    else:
        print('None')
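
Because the command is sent with execute_command, the reply arrives as a raw list: the total match count, followed by alternating document keys and field lists. With DIALECT 3, JSON field values come back as JSON-encoded arrays, which is why the function strips brackets and quotes. A slightly more robust alternative, sketched here against a hand-written sample reply, is to decode the value with json.loads. The sample data and the helper name are assumptions, and the layout assumes the RESP2 protocol.

```python
import json

def extract_names(reply):
    """Pull the 'name' values out of a raw FT.SEARCH reply."""
    names = []
    total = reply[0]
    if total == 0:
        return names
    for item in reply[1:]:
        # Field lists look like [b'name', b'["Green Polygon"]']; keys are plain bytes
        if isinstance(item, list):
            values = json.loads(item[1].decode("utf-8"))
            names.extend(values)
    return names

# Hand-written sample mimicking the reply layout; not captured from a live server
sample_reply = [
    2,
    b"key:2", [b"name", b'["Green Polygon"]'],
    b"key:3", [b"name", b'["Blue Polygon"]'],
]
print(extract_names(sample_reply))
```

Decoding the JSON also copes with names that themselves contain brackets or quotes, which a plain strip() would mangle.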

Results

Plot






Results


*** Polygons within the Red Polygon ***
Green Polygon
Blue Polygon
Cyan Polygon
*** Polygons within the Green Polygon ***
Blue Polygon
Cyan Polygon
*** Polygons within the Blue Polygon ***
Cyan Polygon
*** Polygons within the Cyan Polygon ***
None
*** Points within the Red Polygon ***
Purple Point
Brown Point
Orange Point
Olive Point
*** Points within the Green Polygon ***
Purple Point
Brown Point
Orange Point
Olive Point
*** Points within the Blue Polygon ***
Purple Point
Brown Point
*** Points within the Cyan Polygon ***
Purple Point
Brown Point
*** Polygons containing the Red Polygon ***
None
*** Polygons containing the Green Polygon ***
Red Polygon
*** Polygons containing the Blue Polygon ***
Red Polygon
Green Polygon
*** Polygons containing the Cyan Polygon ***
Red Polygon
Green Polygon
Blue Polygon
*** Polygons containing the Purple Point ***
Red Polygon
Green Polygon
Blue Polygon
Cyan Polygon
*** Polygons containing the Brown Point ***
Red Polygon
Green Polygon
Blue Polygon
Cyan Polygon
*** Polygons containing the Orange Point ***
Red Polygon
Green Polygon
*** Polygons containing the Olive Point ***
Red Polygon
Green Polygon

Source

