{"id":180,"date":"2026-05-07T17:38:32","date_gmt":"2026-05-07T17:38:32","guid":{"rendered":"https:\/\/byte64.com\/?p=180"},"modified":"2026-05-28T15:09:29","modified_gmt":"2026-05-28T15:09:29","slug":"visualizing-retrieval","status":"publish","type":"post","link":"https:\/\/byte64.com\/?p=180","title":{"rendered":"Visualizing Retrieval"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Yesterday I created a <a href=\"https:\/\/www.linkedin.com\/company\/byte64\" data-type=\"link\" data-id=\"https:\/\/www.linkedin.com\/company\/byte64\">Byte64 LinkedIn page<\/a>, so clearly I had to create some visualizations to use as a background image. No AI slop would do.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first thing I tried was visualizing the PFORs that encode my search index. I don&#8217;t have a good sense of what the spacing is like between local document ids under a single keyword. I assumed they&#8217;d be pretty evenly spread out.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#8217;s what that ends up looking like. Depending on the number of bits needed to encode the deltas, you&#8217;ll see more or fewer steps. The orange lines correspond to exceptions: deltas that are too big for the bits used and are inserted afterwards. 
They&#8217;re encoded as pairs of index (one byte) and varint value.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"858\" height=\"1024\" src=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.17.51-a.m-858x1024.png\" alt=\"\" class=\"wp-image-181\" srcset=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.17.51-a.m-858x1024.png 858w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.17.51-a.m-251x300.png 251w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.17.51-a.m-768x917.png 768w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.17.51-a.m-1287x1536.png 1287w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.17.51-a.m-1716x2048.png 1716w\" sizes=\"auto, (max-width: 858px) 100vw, 858px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">I tried varying the height based on the bits used and stacking them together, but the visual wasn&#8217;t great:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"187\" src=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.25.38-a.m-1024x187.png\" alt=\"\" class=\"wp-image-182\" srcset=\"https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.25.38-a.m-1024x187.png 1024w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.25.38-a.m-300x55.png 300w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.25.38-a.m-768x140.png 768w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.25.38-a.m-1536x280.png 1536w, https:\/\/byte64.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-07-at-11.25.38-a.m.png 1710w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next, I turned to our vector embedding. I decided to use the &#8220;Albert Einstein&#8221; page as a seed and found all of its neighbors, the pages that it links to. Then I found all connections between those neighbors by looking through the links in those pages. This makes a graph of 405 pages with 4,203 directed edges. Finally, I took their embeddings, projected down to the first three coordinates, renormalized, and plotted them on a sphere. Gemini was nice enough to use PageRank for the size and color of the vertices.<\/p>\n\n\n\n<iframe loading=\"lazy\" \n  src=\"https:\/\/byte64.com\/wikipedia_connection_sphere.html\" \n  width=\"100%\" \n  height=\"600px\" \n  style=\"border: none; border-radius: 8px; box-shadow: 0 4px 12px rgba(0,0,0,0.08); display: block;\"\n  scrolling=\"no\">\n<\/iframe>\n\n\n\n<p class=\"wp-block-paragraph\">Ideally, closely related concepts would be close together on the sphere. But projecting loses a lot of information, and I think special <s>anisotropic projections<\/s> need to be used to preserve as much of it as possible. So, uh, it&#8217;s just a pretty picture. (Edit: I looked into anisotropic projections, and they are specifically for dot products and embeddings in all of <math data-latex=\"\\mathbb{R}^n\"><semantics><msup><mi>\u211d<\/mi><mi>n<\/mi><\/msup><annotation encoding=\"application\/x-tex\">\\mathbb{R}^n<\/annotation><\/semantics><\/math> rather than on the sphere. I think there might not be better projections down to lower-dimensional spheres.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday I created a Byte64 LinkedIn page, so clearly I had to create some visualizations to use as a background image. No AI slop would do. The first thing I tried was visualizing the PFORs that encode my search index. 
I don&#8217;t have a good sense of what the spacing is like between local document ids [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,16],"tags":[32,25,7],"class_list":["post-180","post","type-post","status-publish","format-standard","hentry","category-coding-with-ai","category-data-processing","tag-retrieval","tag-vector-embeddings","tag-wikipedia"],"_links":{"self":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=180"}],"version-history":[{"count":6,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/180\/revisions"}],"predecessor-version":[{"id":189,"href":"https:\/\/byte64.com\/index.php?rest_route=\/wp\/v2\/posts\/180\/revisions\/189"}],"wp:attachment":[{"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/byte64.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}