Tag: sstable
-
Golden SSTables
When you’re testing, it’s common to transform whatever structure you’re dealing with into text and compare that text to a fixed string of expected text. This is convenient because you get a nice visual representation of what you’re expecting the output to be. This approach can cause problems if the text representation doesn’t capture everything…
-
Search Indexes and Memory
I’ve been working on v0 of the search engine which requires building a search index in the form of “posting lists”. I’ve build a pipeline that reads wikipedia documents, outputs all the words it finds and emits key-value pairs of (word, url). Then we group by word so we can lookup all documents that word…
-
Writing SSTables with Beam
Apache Beam is an open source system for processing large datasets. It has both a realtime and a batch processing mode. The batch processing mode is based on Google’s internal Flume framework which I had the pleasure of using for 7 years while processing Android telemetry. It’s also the perfect system for building a search…
-
SSTable on SSD
When I first put the SSTable server on Google’s Compute Engine, I opted for the cheapest machine: an e2-small (2 cores and 2GB ram) with a standard persistent disk of 20 GiB. When I ran the server, each request was taking hundreds of milliseconds. Installed iotop and found I was getting at most 8 MiB/s…
-
Building an SSTable
SSTables are a critical piece of technology that holds up the modern web. It’s the basis for most modern databases, search backends and many other technologies. What it provides is a reasonably fast way to perform lookups in large datasets. SSTables are sorted string tables meaning both our lookup keys and the resulting values are…
