Category: Coding with AI

  • Visualizing Retrieval

    Yesterday I created a Byte64 LinkedIn page, so clearly I had to create some visualizations to use as a background image. No AI slop would do. The first thing I tried was visualizing the PFORs that encode my search index. I don’t have a good sense of what the spacing is like between local document ids…

  • Generating User Data

    A search engine depends on a feedback loop of users making queries, following links, returning to the search page and rewriting their queries. All these feed into an understanding of whether they’re finding the results they’re looking for. Bootstrapping a system like this is difficult because you don’t start with any users and your search…

  • Writing SSTables with Beam

    Apache Beam is an open source system for processing large datasets. It has both a realtime and a batch processing mode. The batch processing mode is based on Google’s internal Flume framework, which I had the pleasure of using for 7 years while processing Android telemetry. It’s also the perfect system for building a search…

  • Building an SSTable

    SSTables are a critical piece of technology that holds up the modern web. They’re the basis for most modern databases, search backends and many other technologies. What they provide is a reasonably fast way to perform lookups in large datasets. SSTables are sorted string tables, meaning both our lookup keys and the resulting values are…