{"id":102,"date":"2026-03-31T04:11:12","date_gmt":"2026-03-31T04:11:12","guid":{"rendered":"https:\/\/byte64.com\/?p=102"},"modified":"2026-03-31T14:44:36","modified_gmt":"2026-03-31T14:44:36","slug":"vector-retrieval","status":"publish","type":"post","link":"https:\/\/byte64.com\/?p=102","title":{"rendered":"Vector Retrieval"},"content":{"rendered":"\n<p>Now that every document has been assigned a vector encoding its semantics, a new kind of retrieval becomes possible. Rather than finding documents that might be relevant to the query by searching the index for keywords, we can take the query&#8217;s vector and find nearby documents in the embedding space. <\/p>\n\n\n\n<p>One advantage over keyword retrieval is that every document retrieved this way is already on the right topic. So instead of going through tens of thousands of documents and scoring them, we get candidates that come pre-scored by the embedding, and we need far fewer of them. Right now, the byte64 search engine is set up to pull only the 50 nearest neighbour documents to the query&#8217;s vector. 
And the results are great.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"697\" src=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-9.20.37-p.m-1024x697.png\" alt=\"\" class=\"wp-image-103\" srcset=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-9.20.37-p.m-1024x697.png 1024w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-9.20.37-p.m-300x204.png 300w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-9.20.37-p.m-768x523.png 768w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-9.20.37-p.m-1536x1045.png 1536w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-9.20.37-p.m.png 1640w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Let&#8217;s talk about the infrastructure that supports this kind of retrieval. The fundamental problem is that <a href=\"https:\/\/en.wikipedia.org\/wiki\/Nearest_neighbor_search\">exact vector-based nearest neighbour retrieval<\/a> takes time roughly linear in the number of documents. Approximate methods can speed this up considerably. Off the shelf, you can get Google&#8217;s <a href=\"https:\/\/github.com\/google-research\/google-research\/tree\/master\/scann\">Scalable Nearest Neighbors (ScaNN) library<\/a>, which provides everything you need &#8212; index building and serving, optimized for the x86 architecture. It can pull the nearest 50 documents in under 30ms. But for the wikipedia dataset, it costs another 4 GiB of RAM.<\/p>\n\n\n\n<p>I tried rolling my own approximate nearest neighbour algorithm. 
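<\/p>\n\n\n\n<p>Before getting into it, here is the exact linear-scan baseline that any approximate method has to beat. This is a minimal sketch, not the actual byte64 code; it assumes the document vectors are unit-normalized rows of a NumPy array, so cosine similarity reduces to a dot product.<\/p>

```python
import numpy as np

def exact_top_k(docs, query, k=50):
    # For unit vectors, cosine similarity is a plain dot product,
    # so exact retrieval scores every document: O(n * d) per query.
    scores = docs @ query
    top = np.argpartition(-scores, k)[:k]   # unordered top-k in O(n)
    return top[np.argsort(-scores[top])]    # then sort only the k winners

# Illustrative data: 10,000 random unit vectors on the sphere S^127.
rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 128))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[123]                           # a known document as the query

ids = exact_top_k(docs, query, k=50)        # ids[0] == 123, the query itself
```

<p>Every document gets scored on every query &#8212; that is the linear cost the approximate approaches try to avoid.<\/p>\n\n\n\n<p>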
My idea was to divide the vectors in <math data-latex=\"\\mathbb{S}^{127} = \\{ v \\in \\mathbb{R}^{128}  \\mid \\|v\\| = 1\\}\"><semantics><mrow><msup><mi>\ud835\udd4a<\/mi><mn>127<\/mn><\/msup><mo>=<\/mo><mo form=\"prefix\" stretchy=\"false\">{<\/mo><mi>v<\/mi><mo>\u2208<\/mo><msup><mi>\u211d<\/mi><mn>128<\/mn><\/msup><mo lspace=\"0.22em\" rspace=\"0.22em\" stretchy=\"false\">|<\/mo><mi>\u2016<\/mi><mi>v<\/mi><mi>\u2016<\/mi><mo>=<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\">}<\/mo><\/mrow><annotation encoding=\"application\/x-tex\">\\mathbb{S}^{127} = \\{ v \\in \\mathbb{R}^{128}  \\mid \\|v\\| = 1\\}<\/annotation><\/semantics><\/math> into different cells <math data-latex=\"C_{ij}\"><semantics><msub><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><\/msub><annotation encoding=\"application\/x-tex\">C_{ij}<\/annotation><\/semantics><\/math>. Then, given a query vector <math data-latex=\"q \\in \\mathbb{S}^{127}\"><semantics><mrow><mi>q<\/mi><mo>\u2208<\/mo><msup><mi>\ud835\udd4a<\/mi><mn>127<\/mn><\/msup><\/mrow><annotation encoding=\"application\/x-tex\">q \\in \\mathbb{S}^{127}<\/annotation><\/semantics><\/math>, we&#8217;ll select the cells that we think the nearest candidates lie in. 
If the vectors within the cells are nicely ordered, we can even merge them using a heap until we get the number of documents we want.<\/p>\n\n\n\n<p>Given a vector <math data-latex=\"v = (v_1, \\ldots, v_{128})\"><semantics><mrow><mi>v<\/mi><mo>=<\/mo><mo form=\"prefix\" stretchy=\"false\">(<\/mo><msub><mi>v<\/mi><mn>1<\/mn><\/msub><mo separator=\"true\">,<\/mo><mo>\u2026<\/mo><mo separator=\"true\">,<\/mo><msub><mi>v<\/mi><mn>128<\/mn><\/msub><mo form=\"postfix\" stretchy=\"false\">)<\/mo><\/mrow><annotation encoding=\"application\/x-tex\">v = (v_1, \\ldots, v_{128})<\/annotation><\/semantics><\/math>, let <math data-latex=\"v_i\"><semantics><msub><mi>v<\/mi><mi>i<\/mi><\/msub><annotation encoding=\"application\/x-tex\">v_i<\/annotation><\/semantics><\/math> and <math data-latex=\"v_j\"><semantics><msub><mi>v<\/mi><mi>j<\/mi><\/msub><annotation encoding=\"application\/x-tex\">v_j<\/annotation><\/semantics><\/math> be the largest and second-largest components of v (in absolute value). 
We define <math data-latex=\"C_{ij}\"><semantics><msub><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><\/msub><annotation encoding=\"application\/x-tex\">C_{ij}<\/annotation><\/semantics><\/math> to be the collection of all such vectors &#8212; actually we need to account for the signs of <math data-latex=\"v_i\"><semantics><msub><mi>v<\/mi><mi>i<\/mi><\/msub><annotation encoding=\"application\/x-tex\">v_i<\/annotation><\/semantics><\/math> and <math data-latex=\"v_j\"><semantics><msub><mi>v<\/mi><mi>j<\/mi><\/msub><annotation encoding=\"application\/x-tex\">v_j<\/annotation><\/semantics><\/math> so we really need <math data-latex=\"C_{ij}^{++}, C_{ij}^{-+}, C_{ij}^{+-}, C_{ij}^{--}\"><semantics><mrow><msubsup><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><mrow><mo lspace=\"0em\" rspace=\"0em\">+<\/mo><mo form=\"prefix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">+<\/mo><\/mrow><\/msubsup><mo separator=\"true\">,<\/mo><msubsup><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><mrow><mo lspace=\"0em\" rspace=\"0em\">\u2212<\/mo><mo form=\"prefix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">+<\/mo><\/mrow><\/msubsup><mo separator=\"true\">,<\/mo><msubsup><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><mrow><mo lspace=\"0em\" rspace=\"0em\">+<\/mo><mo form=\"prefix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">\u2212<\/mo><\/mrow><\/msubsup><mo separator=\"true\">,<\/mo><msubsup><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><mrow><mo lspace=\"0em\" rspace=\"0em\">\u2212<\/mo><mo form=\"prefix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">\u2212<\/mo><\/mrow><\/msubsup><\/mrow><annotation encoding=\"application\/x-tex\">C_{ij}^{++}, C_{ij}^{-+}, C_{ij}^{+-}, C_{ij}^{--}<\/annotation><\/semantics><\/math>. 
So, this defines <math data-latex=\"4 \\cdot 128 \\cdot 127\"><semantics><mrow><mn>4<\/mn><mo>\u22c5<\/mo><mn>128<\/mn><mo>\u22c5<\/mo><mn>127<\/mn><\/mrow><annotation encoding=\"application\/x-tex\">4 \\cdot 128 \\cdot 127<\/annotation><\/semantics><\/math> different cells. <\/p>\n\n\n\n<p>When we&#8217;re trying to find vectors near <math data-latex=\"q = (q_1, \\ldots, q_{128})\"><semantics><mrow><mi>q<\/mi><mo>=<\/mo><mo form=\"prefix\" stretchy=\"false\">(<\/mo><msub><mi>q<\/mi><mn>1<\/mn><\/msub><mo separator=\"true\">,<\/mo><mo>\u2026<\/mo><mo separator=\"true\">,<\/mo><msub><mi>q<\/mi><mn>128<\/mn><\/msub><mo form=\"postfix\" stretchy=\"false\">)<\/mo><\/mrow><annotation encoding=\"application\/x-tex\">q = (q_1, \\ldots, q_{128})<\/annotation><\/semantics><\/math>, we sort the components in absolute values as <math data-latex=\"|q_{\\sigma(1)}| \\geq |q_{\\sigma(2)}| \\geq \\cdots \\geq |q_{\\sigma(128)}| \"><semantics><mrow><mi>|<\/mi><msub><mi>q<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><\/msub><mi>|<\/mi><mo>\u2265<\/mo><mi>|<\/mi><msub><mi>q<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>2<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><\/msub><mi>|<\/mi><mo>\u2265<\/mo><mo>\u22ef<\/mo><mo>\u2265<\/mo><mi>|<\/mi><msub><mi>q<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>128<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><\/msub><mi>|<\/mi><\/mrow><annotation encoding=\"application\/x-tex\">|q_{\\sigma(1)}| \\geq |q_{\\sigma(2)}| \\geq \\cdots \\geq |q_{\\sigma(128)}| <\/annotation><\/semantics><\/math> and then search in <math data-latex=\"C_{\\sigma(1)\\sigma(2)}^{s_1 s_2}, C_{\\sigma(1)\\sigma(3)}^{s_1 s_3}, C_{\\sigma(1)\\sigma(4)}^{s_1 s_4}, 
\\ldots\"><semantics><mrow><msubsup><mi>C<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>2<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><mrow><msub><mi>s<\/mi><mn>1<\/mn><\/msub><msub><mi>s<\/mi><mn>2<\/mn><\/msub><\/mrow><\/msubsup><mo separator=\"true\">,<\/mo><msubsup><mi>C<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>3<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><mrow><msub><mi>s<\/mi><mn>1<\/mn><\/msub><msub><mi>s<\/mi><mn>3<\/mn><\/msub><\/mrow><\/msubsup><mo separator=\"true\">,<\/mo><msubsup><mi>C<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>4<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><mrow><msub><mi>s<\/mi><mn>1<\/mn><\/msub><msub><mi>s<\/mi><mn>4<\/mn><\/msub><\/mrow><\/msubsup><mo separator=\"true\">,<\/mo><mo>\u2026<\/mo><\/mrow><annotation encoding=\"application\/x-tex\">C_{\\sigma(1)\\sigma(2)}^{s_1 s_2}, C_{\\sigma(1)\\sigma(3)}^{s_1 s_3}, C_{\\sigma(1)\\sigma(4)}^{s_1 s_4}, \\ldots<\/annotation><\/semantics><\/math>in that order. 
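<\/p>\n\n\n\n<p>In code, the cell assignment and a query&#8217;s probe sequence look roughly like this. This is a sketch of the scheme just described, with illustrative function names and a toy 4-dimensional vector rather than the real 128-dimensional embeddings.<\/p>

```python
import numpy as np

def cell_of(v):
    # A vector's cell is keyed by the indices of its two largest
    # components in absolute value, together with their signs.
    order = np.argsort(-np.abs(v))
    i, j = int(order[0]), int(order[1])
    return (i, j, int(np.sign(v[i])), int(np.sign(v[j])))

def probe_order(q, n_cells=8):
    # For a query q, probe the cells pairing the largest component
    # sigma(1) with sigma(2), sigma(3), ... in decreasing order of
    # |q_sigma(k)|, keeping the matching signs.
    order = np.argsort(-np.abs(q))
    i, s1 = int(order[0]), int(np.sign(q[order[0]]))
    return [(i, int(order[k]), s1, int(np.sign(q[order[k]])))
            for k in range(1, n_cells + 1)]

v = np.array([0.1, -0.9, 0.3, 0.2])
cell = cell_of(v)            # (1, 2, -1, 1)
probes = probe_order(v, 3)   # the first cell probed is the vector's own cell
```

<p>One nice property: the first cell probed for a query is always the query&#8217;s own cell, so an exact match is found immediately.<\/p>\n\n\n\n<p>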
Here <math data-latex=\"s_i\"><semantics><msub><mi>s<\/mi><mi>i<\/mi><\/msub><annotation encoding=\"application\/x-tex\">s_i<\/annotation><\/semantics><\/math> is the sign of <math data-latex=\"q_{\\sigma(i)}\"><semantics><msub><mi>q<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mi>i<\/mi><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><\/msub><annotation encoding=\"application\/x-tex\">q_{\\sigma(i)}<\/annotation><\/semantics><\/math>. <\/p>\n\n\n\n<p>I sorted the elements in <math data-latex=\"C_{ij}\"><semantics><msub><mi>C<\/mi><mrow><mi>i<\/mi><mi>j<\/mi><\/mrow><\/msub><annotation encoding=\"application\/x-tex\">C_{ij}<\/annotation><\/semantics><\/math> by their i-th component and then their j-th component, hoping that the next nearest neighbour would appear in <math data-latex=\"C_{\\sigma(1)\\sigma(k)}^{s_1 s_k}\"><semantics><msubsup><mi>C<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mi>k<\/mi><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><mrow><msub><mi>s<\/mi><mn>1<\/mn><\/msub><msub><mi>s<\/mi><mi>k<\/mi><\/msub><\/mrow><\/msubsup><annotation encoding=\"application\/x-tex\">C_{\\sigma(1)\\sigma(k)}^{s_1 s_k}<\/annotation><\/semantics><\/math> before <math data-latex=\"C_{\\sigma(1)\\sigma(k+1)}^{s_1 s_{k+1}}\"><semantics><msubsup><mi>C<\/mi><mrow><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mi>\u03c3<\/mi><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mi>k<\/mi><mo>+<\/mo><mn>1<\/mn><mo form=\"postfix\" stretchy=\"false\" lspace=\"0em\" rspace=\"0em\">)<\/mo><\/mrow><mrow><msub><mi>s<\/mi><mn>1<\/mn><\/msub><msub><mi>s<\/mi><mrow><mi>k<\/mi><mo>+<\/mo><mn>1<\/mn><\/mrow><\/msub><\/mrow><\/msubsup><annotation
encoding=\"application\/x-tex\">C_{\\sigma(1)\\sigma(k+1)}^{s_1 s_{k+1}}<\/annotation><\/semantics><\/math>. But unfortunately, the largest component isn&#8217;t the best indication of where the largest dot-product will be. <\/p>\n\n\n\n<p>Take a look at the distribution of cosine similarities with an arbitrary nut-based wikipedia page.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Cosine Similarity Distribution (against page: https:\/\/en.wikipedia.org\/wiki\/Pistachio):\nBucket Range         | Count     \n---------------------|-----------\n&#91;-1.00, -0.90) | 0\n&#91;-0.90, -0.80) | 0\n&#91;-0.80, -0.70) | 0\n&#91;-0.70, -0.60) | 0\n&#91;-0.60, -0.50) | 0\n&#91;-0.50, -0.40) | 0\n&#91;-0.40, -0.30) | 22\n&#91;-0.30, -0.20) | 2263\n&#91;-0.20, -0.10) | 49199\n&#91;-0.10,  0.00) | 343893\n&#91; 0.00,  0.10) | 1018456\n&#91; 0.10,  0.20) | 1997144\n&#91; 0.20,  0.30) | 2221347\n&#91; 0.30,  0.40) | 955750\n&#91; 0.40,  0.50) | 318725\n&#91; 0.50,  0.60) | 204906\n&#91; 0.60,  0.70) | 18468\n&#91; 0.70,  0.80) | 96\n&#91; 0.80,  0.90) | 0\n&#91; 0.90,  1.00) | 1<\/code><\/pre>\n\n\n\n<p>But if we use the approach above to retrieve the 10000 nearest neighbors of the document, we get this distribution:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Similarity Distribution of top 10000 results:\nBucket Range         | Count     \n---------------------|-----------\n&#91;-1.00, -0.90) | 0\n&#91;-0.90, -0.80) | 0\n&#91;-0.80, -0.70) | 0\n&#91;-0.70, -0.60) | 0\n&#91;-0.60, -0.50) | 0\n&#91;-0.50, -0.40) | 0\n&#91;-0.40, -0.30) | 0\n&#91;-0.30, -0.20) | 0\n&#91;-0.20, -0.10) | 0\n&#91;-0.10,  0.00) | 0\n&#91; 0.00,  0.10) | 0\n&#91; 0.10,  0.20) | 0\n&#91; 0.20,  0.30) | 0\n&#91; 0.30,  0.40) | 0\n&#91; 0.40,  0.50) | 0\n&#91; 0.50,  0.60) | 5757\n&#91; 0.60,  0.70) | 4203\n&#91; 0.70,  0.80) | 39\n&#91; 0.80,  0.90) | 0\n&#91; 0.90,  1.00) | 1<\/code><\/pre>\n\n\n\n<p>So, we didn&#8217;t even retrieve half of the documents in the 0.7 to 0.8 range and those are all within 
the 100 nearest neighbours. This is pretty crummy recall, so I&#8217;m sticking with ScaNN for now.<\/p>\n\n\n\n<p>Two downsides to ScaNN are that it uses a Python frontend to a C++ backend and that it requires x86. This complicates testing, since it requires Docker on M-series Macs, and it means Python is needed on the search server. These kinds of algorithms are incredibly fun to learn and write. Since it&#8217;s open source, I&#8217;d love to port it to Rust if I can.<\/p>\n\n\n\n<p>~Andrew<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now that every document has been assigned a vector encoding its semantics, a new kind of retrieval becomes possible. Rather than finding documents that might be relevant to the query by searching the index for keywords, we can take the query&#8217;s vector and find nearby documents in the embedding space. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,10,18],"tags":[25],"class_list":["post-102","post","type-post","status-publish","format-standard","hentry","category-data-structures","category-google-cloud","category-search","tag-vector-embeddings"],"_links":{"self":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/102","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=102"}],"version-history":[{"count":3,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/102\/revisions"}],"predecessor-version":[{"id":106,"href":"https:\/\/byte64.com\/index.php?rest_route
=\/wp\/v2\/posts\/102\/revisions\/106"}],"wp:attachment":[{"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=102"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=102"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=102"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}