Tag: vector embeddings
-
Visualizing Retrieval
Andrew
Yesterday I created a Byte64 LinkedIn page, so I clearly had to create some visualizations to use as a background image. No AI slop would do. The first thing I tried was visualizing the PFORs that encode my search index. I don’t have a good sense of what the spacing is like between local document ids…
-
Hybrid Search
Andrew
I’ve already written a post on using a piecewise-linear scaling to bring BM25 and my semantic score (from cosine similarity with our embeddings) into the same numerical space. After performing this scaling, I found some important results weren’t scoring as well as they should. In particular, any search query with a common term (e.g. “the”…
-
Score Normalization
Andrew
Currently, I have two different scoring functions: BM25 and the semantic scoring function that comes from our sentence embedding. These scores take very different ranges, but need to be combined to make a final score. It’s not simply a matter of assigning different weights to these scores. We need to stretch them out to make…
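As a rough illustration of stretching two score ranges into a common one, here is a minimal sketch of a piecewise-linear map applied to each score before a weighted sum. The knot values, the 50/50 weights, and the names `piecewise_linear`, `BM25_KNOTS`, and `SEMANTIC_KNOTS` are all made up for the example; they are not the values or code used in the post.

```python
from bisect import bisect_right


def piecewise_linear(x, knots):
    """Map a raw score through (raw, scaled) knots, clamping outside the range.

    `knots` must be sorted by strictly increasing raw value.
    """
    xs = [raw for raw, _ in knots]
    ys = [scaled for _, scaled in knots]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # first knot strictly to the right of x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical knots: each maps a raw score range onto [0, 1].
BM25_KNOTS = [(0.0, 0.0), (5.0, 0.5), (20.0, 1.0)]
SEMANTIC_KNOTS = [(0.0, 0.0), (0.3, 0.2), (0.8, 1.0)]


def combined_score(bm25, cosine, w_bm25=0.5, w_semantic=0.5):
    """Weighted sum of the two scores after scaling (weights are assumptions)."""
    return (w_bm25 * piecewise_linear(bm25, BM25_KNOTS)
            + w_semantic * piecewise_linear(cosine, SEMANTIC_KNOTS))


scaled_bm25 = piecewise_linear(12.5, BM25_KNOTS)
scaled_semantic = piecewise_linear(0.55, SEMANTIC_KNOTS)
final = combined_score(12.5, 0.55)
```

The knots let each score be stretched differently in different parts of its range, which a single weight per score cannot do.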
-
Vector Retrieval
Andrew
Now that every document has been assigned a vector encoding its semantics, this opens the door to a new kind of retrieval. Rather than find documents that might be relevant to the query by searching through the search index for keywords, we can instead take the query’s vector and find nearby documents in the embedding.…
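To make "find nearby documents" concrete, here is a minimal brute-force sketch that ranks documents by cosine similarity to the query vector. The function names, the toy two-dimensional vectors, and the linear scan are assumptions for illustration; a real index would likely use an approximate nearest-neighbour structure rather than comparing against every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_documents(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy embeddings (hypothetical); real ones have hundreds of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
top = nearest_documents([1.0, 0.1], docs, k=2)
```

No keyword in the query has to appear in a document for it to be returned; only the direction of its vector matters.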
-
Vector Embeddings
Andrew
Search engines make use of AI to improve their search results. There are AI models that can understand the meaning of a sentence or document. They often present their results as embeddings of the document space into a vector space: $D \to \mathbb{R}^n$. These are called vector embeddings. Once you’ve found the embeddings for your…
