{"id":97,"date":"2026-03-26T20:44:15","date_gmt":"2026-03-26T20:44:15","guid":{"rendered":"https:\/\/byte64.com\/?p=97"},"modified":"2026-03-26T20:44:15","modified_gmt":"2026-03-26T20:44:15","slug":"vector-embeddings","status":"publish","type":"post","link":"https:\/\/byte64.com\/?p=97","title":{"rendered":"Vector Embeddings"},"content":{"rendered":"\n<p>Search engines make use of AI to improve their search results. There are AI models that can understand the meaning of a sentence or document. They often present their results as embeddings of the document space into a vector space: <math data-latex=\"D \\to \\mathbb{R}^n\"><semantics><mrow><mi>D<\/mi><mo>\u2192<\/mo><msup><mi>\u211d<\/mi><mi>n<\/mi><\/msup><\/mrow><annotation encoding=\"application\/x-tex\">D \\to \\mathbb{R}^n<\/annotation><\/semantics><\/math>. These are called vector embeddings. <\/p>\n\n\n\n<p>Once you&#8217;ve found the embeddings for your documents, you wait for a query to come in and compute its embedding as well. By comparing the dot product of the normalized vectors &#8212; one for a document and one for a query &#8212; you can tell if the query and document are on the same subject. That provides a score you can use in your rankings.<\/p>\n\n\n\n<p>I took an off-the-shell sentence transformer model that was as small and cheap to run as possible. The one I chose was <a href=\"https:\/\/huggingface.co\/sentence-transformers\/all-MiniLM-L6-v2\">all-MiniLM-L6-v2<\/a>. Not only did it work with my Mac&#8217;s graphics card, but it also allowed me to compute an embedding into a smaller 128 dimensional vector space. Many models compute 1024 dimensional embeddings.<\/p>\n\n\n\n<p>It took me two days to run the model, on and off, for all 7 million Wikipedia documents. The resulting recordio file is 3.94 GiB which is up from 530 MiB with only keywords.<\/p>\n\n\n\n<p>For now, I&#8217;m storing all the scoring information in memory and only keeping the search index out on disk using my SSTable. 
So, I started having trouble loading it into my Google Compute Engine e2-standard-2 server with only 8 GiB of RAM.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>byte64:~$ mem.sh \ncookie_server 79.19 MB\nsearch_server 6202.42 MB\nsstable_server 5.93 MB\ngateway 476.07 MB<\/code><\/pre>\n\n\n\n<p>Two quick optimizations helped. <\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>I moved the ScoringInfo out of the protocol buffers and into its own struct. Since I know the exact size of the vectors, they can be stored as a fixed-size array.<\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>struct CompactScoringInfo {\n    key: String,\n    keywords: Vec&lt;String>,\n    embedding: Option&lt;&#91;f32; 128]>,\n}<\/code><\/pre>\n\n\n\n<ol start=\"2\" class=\"wp-block-list\">\n<li>I noticed the gateway was often using a ton of memory when I would run the &#8220;roman empire&#8221; query, which has 312,720 results. Just reading in a search response with that many entries and writing it back out as JSON takes an enormous amount of memory. I&#8217;ve now capped the results at 100 in the RPC request between the gateway and the search server.<\/li>\n<\/ol>\n\n\n\n<p>With those quick improvements, the server is now healthy, though still red-lining.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>byte64:~$ mem.sh \ncookie_server 79.37 MB\nsearch_server 6026.41 MB\nsstable_server 6.02 MB\ngateway 12.27 MB<\/code><\/pre>\n\n\n\n<p>The CompactScoringInfo didn&#8217;t make as much of an improvement as the gateway fix. And, as always, there&#8217;s more work to do there.<\/p>\n\n\n\n<p>Let&#8217;s talk scoring! After I do the keyword search, I&#8217;m left with a ton of documents that need to be scored and ranked. 
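<\/p>

<p>Before getting to the scoring itself, the gateway cap from fix 2 above is worth a sketch. It can be as simple as truncating the ranked list before serializing the response (the 100-result cap and the 312,720-result query are from this post; the types and function names are hypothetical):<\/p>

```rust
// Hypothetical scored result; the real RPC messages live in the search server.
struct ScoredDoc {
    key: String,
    score: f32,
}

const MAX_RESULTS: usize = 100; // the cap between the gateway and the search server

fn cap_results(mut results: Vec<ScoredDoc>) -> Vec<ScoredDoc> {
    // Best scores first, then drop everything past the cap so the gateway
    // never has to round-trip hundreds of thousands of entries through JSON.
    results.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    results.truncate(MAX_RESULTS);
    results
}

fn main() {
    let many: Vec<ScoredDoc> = (0..312_720)
        .map(|i| ScoredDoc { key: format!("doc-{i}"), score: (i % 1000) as f32 })
        .collect();
    let capped = cap_results(many);
    assert_eq!(capped.len(), 100);
    assert!(capped.windows(2).all(|w| w[0].score >= w[1].score));
    println!("kept {} of 312720 results", capped.len());
}
```

<p>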
Each of these documents has its own vector <math data-latex=\"v \\in \\mathbb{R}^{128}\"><semantics><mrow><mi>v<\/mi><mo>\u2208<\/mo><msup><mi>\u211d<\/mi><mn>128<\/mn><\/msup><\/mrow><annotation encoding=\"application\/x-tex\">v \\in \\mathbb{R}^{128}<\/annotation><\/semantics><\/math>, and we can embed the query as <math data-latex=\"q \\in \\mathbb{R}^{128}\"><semantics><mrow><mi>q<\/mi><mo>\u2208<\/mo><msup><mi>\u211d<\/mi><mn>128<\/mn><\/msup><\/mrow><annotation encoding=\"application\/x-tex\">q \\in \\mathbb{R}^{128}<\/annotation><\/semantics><\/math>. If these two vectors point in the same direction, then they&#8217;re on the same subject. We can turn this into a semantic scoring metric by computing the cosine of the angle between them:<\/p>\n\n\n\n<div class=\"wp-block-math\"><math display=\"block\"><semantics><mrow><mrow><mi>cos<\/mi><mo>\u2061<\/mo><\/mrow><mo form=\"prefix\" stretchy=\"false\">(<\/mo><msub><mi>\u03b8<\/mi><mrow><mi>v<\/mi><mo separator=\"true\">,<\/mo><mi>q<\/mi><\/mrow><\/msub><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mo>=<\/mo><mfrac><mrow><mi>v<\/mi><mo>\u22c5<\/mo><mi>q<\/mi><\/mrow><mrow><mi>\u2016<\/mi><mi>v<\/mi><mi>\u2016<\/mi><mi>\u2016<\/mi><mi>q<\/mi><mi>\u2016<\/mi><\/mrow><\/mfrac><\/mrow><annotation encoding=\"application\/x-tex\">\\cos(\\theta_{v,q}) = \\frac{v \\cdot q}{\\|v\\| \\|q\\|}<\/annotation><\/semantics><\/math><\/div>\n\n\n\n<p>Recall that <math data-latex=\"\\cos(0) = 1\"><semantics><mrow><mrow><mi>cos<\/mi><mo>\u2061<\/mo><\/mrow><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mn>0<\/mn><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mo>=<\/mo><mn>1<\/mn><\/mrow><annotation encoding=\"application\/x-tex\">\\cos(0) = 1<\/annotation><\/semantics><\/math>, and that the cosine decreases as the angle grows toward 180 degrees, so a smaller angle between the two vectors gives a value closer to 1. 
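<\/p>

<p>In code, that score is just a dot product after normalization. A sketch over plain slices (the real scoring presumably runs over the fixed &#91;f32; 128] arrays; these helper names are mine):<\/p>

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = dot(v, v).sqrt(); // ||v||
    v.iter().map(|x| x / norm).collect()
}

// cos(theta) = (v . q) / (||v|| ||q||)
fn cosine(v: &[f32], q: &[f32]) -> f32 {
    dot(&normalize(v), &normalize(q))
}

fn main() {
    let v = [1.0_f32, 2.0, 0.0];
    assert!((cosine(&v, &v) - 1.0).abs() < 1e-6); // same direction: 1
    assert!(cosine(&[1.0, 0.0], &[0.0, 1.0]).abs() < 1e-6); // orthogonal: 0
    assert!((cosine(&[1.0, 0.0], &[-1.0, 0.0]) + 1.0).abs() < 1e-6); // opposite: -1
    println!("cosine checks pass");
}
```

<p>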
We can pre-normalize all our vectors and compute the dot product with <math data-latex=\"q\/\\|q\\|\"><semantics><mrow><mi>q<\/mi><mi>\/<\/mi><mi>\u2016<\/mi><mi>q<\/mi><mi>\u2016<\/mi><\/mrow><annotation encoding=\"application\/x-tex\">q\/\\|q\\|<\/annotation><\/semantics><\/math> very quickly.<\/p>\n\n\n\n<p>I&#8217;ve made a 50-50 blend of this score with my quick keyword-search score to compute the final score. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"547\" src=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-26-at-2.37.53-p.m-1024x547.png\" alt=\"\" class=\"wp-image-99\" srcset=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-26-at-2.37.53-p.m-1024x547.png 1024w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-26-at-2.37.53-p.m-300x160.png 300w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-26-at-2.37.53-p.m-768x410.png 768w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-26-at-2.37.53-p.m-1536x821.png 1536w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-26-at-2.37.53-p.m.png 1636w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The results of using this simple embedding aren&#8217;t great, but they are starting to match the semantics of the query. Try them out at <a href=\"https:\/\/byte64.com\/search\/\">https:\/\/byte64.com\/search\/<\/a>.<\/p>\n\n\n\n<p>There are so many things to try next with these embeddings. I&#8217;m interested in looking at vector retrieval algorithms, which look up documents near the query embedding, rather than only looking at documents that have the exact words of the query.<\/p>\n\n\n\n<p>~Andrew<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Search engines make use of AI to improve their search results. 
There are AI models that can understand the meaning of a sentence or document. They often present their results as embeddings of the document space into a vector space: D\u2192\u211dn. These are called vector embeddings. Once you&#8217;ve found the embeddings for your [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,18],"tags":[26,25,7],"class_list":["post-97","post","type-post","status-publish","format-standard","hentry","category-data-structures","category-search","tag-memory","tag-vector-embeddings","tag-wikipedia"],"_links":{"self":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/97","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=97"}],"version-history":[{"count":1,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/97\/revisions"}],"predecessor-version":[{"id":100,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/97\/revisions\/100"}],"wp:attachment":[{"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=97"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=97"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=97"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}