Linus’s Blog

Lilac and Codex

2023-11-21T00:00:00Z

2023-11-21
programming

TL;DR: Check out my new open source projects, Lilac (GitHub) and Codex (GitHub)!

Lilac

Lilac builds on top of Emacs Orgmode to make Literate Programming (LP) targeting HTML much more pleasant than the default experience. It’s unapologetically opinionated. It encourages heavy use of Noweb-style references.

Most people don’t believe in LP, which means it’ll get little use. But there is at least one user, officially…

Codex

Codex uses Lilac! Codex is a collection of blogpost-worthy solutions for some programming problems. I’m mainly doing this because it’s really fun for me when I dive deep into technical problems.

Using Lilac and LP is a natural choice for Codex, because many (if not all) of the topics are small enough to be read and understood in one sitting. Bite-sized knowledge, my favorite!

LP is amazing

I’m starting to think that any new programming project of mine should be done with LP. There have already been numerous times when I had to dig deep into Lilac to edit things after taking a long break, fearing that I might have trouble regaining context to become productive; each time I was surprised to see how easy it was to context-switch.

Thank you, Donald Knuth, for introducing LP to the world!

Git Archaeology

2023-03-21T00:00:00Z

2023-03-21
programming, git

TL;DR: Check out my STUDY_NOTES.md on Git if you want a quick understanding of (ancient) Git internals!

I’ve been using Git since 2009. In all that time I never really bothered with understanding Git internals, because frankly after learning what a directed acyclic graph (DAG) was, everything just fell into place.

That’s going to change, because in the coming weeks, I will start contributing to Git on a somewhat regular basis (at least, that’s the plan). It won’t be the first time contributing to the project (which I did back in 2014), but I will need to begin studying how Git works under the hood.

To that end, I spent the better part of last weekend trying to understand Git’s internals. The current Git codebase is a bit daunting, and there’s no way that I’m going to read it all any time soon. But the very first commit of Git is small enough to read in one sitting, and so I tried compiling it (there were lots of errors), while taking notes in the source code directly. I also actually used the produced binaries to prove to myself that yes, this system actually does work even at this primitive stage.

Now, there are major differences between this ancient root-commit version of “Git” and modern Git. However, I’ve taken note of all such differences (at least as many as I could gather, within reason) by digging into the Git mailing list archive to try to make sense of why things were changed the way they were (e.g., How come we have so-called “pack” files? How come the SHA1 hash of an object (using sha1sum) is not the same as its directory name plus filename?) You can see my notes in the STUDY_NOTES.md file for the answers.

I have to admit that I found Linus Torvalds’ initial design decisions to be impressively elegant. Reading the first commit made me have multiple “ah-ha!” moments behind why Git has a distinction between the index and the working tree, why it doesn’t track empty directories, why Git doesn’t care if you blow away the working tree as long as the .git folder is intact, etc. And the code is pretty easy to follow! It’s a great resource for any aspiring C hacker.

Note: I’ve reproduced the study notes below for posterity. Do check out the branch directly, or apply the patches yourself on the root commit.

Root-commit Git study notes

This is a special branch of Git for learning purposes. It is special because it is based off the absolute minimal “ancient” implementation of Git (Linus Torvalds’s root commit at e83c5163316f89bfbde7d9ab23ca2e25604af290), with some small changes to make it easy to compile with Nix (see the Makefile changes) and also “feel” more like modern Git (namely, the use of .git/ instead of .dircache/). Yes, you can technically grab the 100th or so commit which basically has all of the changes I made, but you’d be dealing with a lot more code to read. If you just want to quickly understand Git’s data structures, there’s honestly nothing faster than reading the root commit (it’s only ~1000 lines of C, including comments), with some additional notes to fill in any missing gaps (which this document tries to do).

The biggest revelation I had while creating these notes is that Git’s data structures have proven to be incredibly stable: the initial ideas of an object database (.git/objects/...) and the cache (.git/index) were there from day 1 and are still the main workhorses for Git. Knowing these two concepts will radically reduce the perceived complexity of modern Git’s numerous bells and whistles, as every other thing you see in the .git folder is a mere extension of these two essential data structures.

This version comes with a basic Usage Guide to help users actually use the binaries that shipped with the root commit. Run make (you need the Nix package manager) and try to use the commands in the order described in the Usage Guide below. After (or perhaps before?) you run each command, read its source code. You might want to have a look at cache.h first — but the main “meat” of it all is in update-cache.c — which makes sense, because the cache is always updated first before anything is written to the object database.

Pro tip: use 6ad6d3d36c5924c8ff502ebbb6a6216df01e7efb as a shortcut to view the first 100ish commits in Git’s history. This is handy to understand some of the early changes that went into Git. As a bonus, this commit also updates the README to capture the workflow of actually using GIT in its ancient form. Perhaps it is obvious, but use git log -p README to see its history.

Data structures

The two big data structures are:

The object database (all files under .git/objects/...), and
The index file (aka “cache”, at .git/index).

Refer to README for a more thorough discussion of these data structures. But here are a few more interesting notes about each data structure.

The Object Database

The 8-bit fanout

The object database has 256 folders, named 00 through ff in hex notation (the first 8 bits of each object’s 20-byte SHA1 hash, which serves as its object ID in this database). You may wonder why we bother with this structure (after all, all files are already named with their unique SHA1 hash, so the chance of an innocent collision is virtually zero). Torvalds stated in April 2005 that he didn’t want hundreds of thousands of files in one subdirectory:

The 8-bit initial fan-out is very much a
middle ground: we waste some space (and some time) doing it, but it does
make the really horrible case largely go away.

The “horrible case” probably refers to the possibility of hundreds of thousands of files all residing in a single directory, which Torvalds brought up in the linked email.
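To make the fan-out concrete, here is a minimal Python sketch of how an object ID maps to a path under .git/objects/. The function name object_path and its git_dir parameter are made up for illustration; they are not taken from Git’s codebase.

```python
def object_path(object_id: str, git_dir: str = ".git") -> str:
    """Map a 40-character hex SHA1 to its fanned-out path.

    The first two hex digits (8 bits) pick one of the 256 subdirectories;
    the remaining 38 digits become the filename inside it.
    """
    if len(object_id) != 40:
        raise ValueError("expected a 40-character hex SHA1")
    return f"{git_dir}/objects/{object_id[:2]}/{object_id[2:]}"


print(object_path("cc41b0dfbe81a71ca922cda3c9de9db3a25a56b4"))
```

With 8 bits of fan-out, N objects spread over 256 directories, so each directory holds roughly N/256 files on average.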

Object IDs

You can run sha1sum of any file in the object database and the output (SHA1 hash) will match the filename path. E.g.,

$ sha1sum .git/objects/cc/41b0dfbe81a71ca922cda3c9de9db3a25a56b4
cc41b0dfbe81a71ca922cda3c9de9db3a25a56b4  .git/objects/cc/41b0dfbe81a71ca922cda3c9de9db3a25a56b4

and notice that cc41b0dfbe81a71ca922cda3c9de9db3a25a56b4 matches cc/41b0dfbe81a71ca922cda3c9de9db3a25a56b4. However, this is no longer the case, and you’ll get a different SHA1 hash using any recent Git version. The reason is that Git originally hashed the compressed (post-zlibbed) contents, but now it hashes the decompressed (pre-zlibbed) content. This switch-over was done in d98b46f8d9a3daf965a39f8c0089c1401e0081ee and f18ca7316631914776136455c151d70318299459, just a couple weeks after the root commit, mainly for performance reasons (because write-tree was taking too long in applying patches). See the original discussion and the “Object DB conversion” announcement.

Also see this page for a guide on using Python to check the hashes of objects (in case you want to check the hash output independently of Git tooling).
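As a rough illustration of the modern scheme only, the sketch below computes the ID that a recent Git assigns to a blob: the SHA1 of a short "blob SIZE" header, a NUL byte, and then the uncompressed content. The function name git_blob_id is hypothetical. The ancient root-commit scheme, which hashed the compressed bytes instead, is deliberately not reproduced here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID that modern Git assigns to a blob with this content.

    The hash covers the uncompressed bytes, prefixed with a header made of
    the word "blob", a space, the decimal size, and a NUL byte. zlib
    compression only happens afterwards, when the object is stored.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

This should agree with what `git hash-object` prints for a file with the same bytes, which is one way to check the result independently of this sketch.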

Only basic compression

Modern Git uses at least two additional schemes not present in this initial version to help reduce redundant data: pack files (record deltas of similar objects), and recursive tree objects (that’s right, in the original implementation, a tree object could only refer to blobs).

Pack files

Note that in this version, Git treats a file’s content as an atomic unit of data: it doesn’t perform any form of “chunking” to divide it up into smaller bits (similar to what bittorrent does). So every file will get its own blob, and the only way that a blob will be reused (thus saving disk space) in a subsequent commit is if it does not change. It must match identically!

You may think, “why not just divide a file into chunks, and make blobs out of each chunk?” — that way, you’d naturally get some level of deduping, even without any additional work. Torvalds considered this but rejected the idea for two reasons: performance and simplicity.

Just a couple months after the above email though, Git learned about pack files. Basically, pack files compress a range of reachable objects between two commits and put them all into two files, a pack index (.idx) file and a pack (.pack) file. The basic idea is that you can put all of these objects together in the .pack file, allowing you to do some level of compression inside it (assuming you have lots of objects that have similar content). Here is a description of how it would work in Torvalds’s own words. Here’s a somewhat retrospective announcement, which explains that the previous “delta object” approach (where Git stored delta objects in the object database) is deprecated (however, do note that the algorithm to find the deltas (diff_delta()) was reused in the pack files, so not everything was discarded).

If you’re wondering why the pack files have a separate dedicated index file, it basically comes down to performance and simplicity, again.

Recursive tree objects

As trees can currently only refer to blobs, every commit is somewhat wasteful (although this has the unique property that a single commit refers to a single tree object that has everything in it).

Recursive tree objects were added in d6d3f9d0125a7215f3cdc2600b2307ca55b69536.

The Cache

The cache, or index file, represents a tree “snapshot”. It is what is staged, ready to be committed. More precisely, it is just a cache_header followed by a list of cache_entry values, where each cache_entry is a blob object’s metadata. Among other things, the cache_header records how many cache entries there are in the index file. This is still true in modern Git as of March 2023 — if you run hexdump -C .git/index | head -n1 you can see, for example:

$ hexdump -C .git/index | head -n1
00000000  44 49 52 43 00 00 00 01  00 00 00 0d 64 15 77 69  |DIRC........d.wi|

where the DIRC is a magic number (standing for dircache, the original name of the .git folder) followed by 4 bytes (unsigned int) for the index version and another 4 bytes showing the number of cache entries, or file paths, that are being “tracked” for purposes of tree object creation. In the example above the index version is 1 (modern Git uses version 2), and there are 0x0d or 13 cache entries, or files, that would make up the current tree.

Note that if you run the above on an index file created by the original update-cache, you would see instead something like:

$ hexdump -C .git/index | head -n1
00000000  43 52 49 44 01 00 00 00  01 00 00 00 2b 1a 2d 28  |CRID........+.-(|

because the byte order was little-endian, the “host byte order” of the machine that wrote the file. This is what is meant by the “native CPU byte format” comment in cache.h (because most CPUs are Intel, and Intel uses little-endian). The byte order was changed in ccc4feb579265266d0a4a73c0c9443ecc0c26ce3 to use big-endian, also called “network byte order”, for convenience over NFS.
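The two hexdumps can be decoded with Python’s struct module. This is only a sketch: the constant and function names are made up, and the header bytes are copied by hand from the first 12 bytes of each dump above.

```python
import struct

# First 12 bytes of each hexdump above: signature, version, entry count.
MODERN_HEADER = bytes.fromhex("44495243" "00000001" "0000000d")
ANCIENT_HEADER = bytes.fromhex("43524944" "01000000" "01000000")


def parse_header(raw: bytes, byte_order: str):
    """Decode (signature, version, entries) from an index header.

    byte_order is a struct prefix: ">" for big-endian (network byte order)
    or "<" for little-endian.
    """
    return struct.unpack(byte_order + "III", raw[:12])


print(parse_header(MODERN_HEADER, ">"))   # (1145655875, 1, 13)
print(parse_header(ANCIENT_HEADER, "<"))  # (1145655875, 1, 1)
```

Both calls yield the same signature integer, 0x44495243. Only the order of the bytes on disk differs, which is why the same magic number shows up as DIRC in one dump and CRID in the other.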

Other missing things vs modern Git

This initial version of Git does not have support for HEAD (.git/HEAD) or branches (.git/refs/heads/...). In fact there are no human-friendly references at all! But one can easily understand that references are just pointers to the object store — all you would need is a way to keep track of the latest commit by saving its object ID (SHA1 hash) somewhere. The simplest possible thing you could do is to have a file with this object ID in it — and this is what modern Git (still) does. The old README notes that in practice, the SHA1 hash was written at .git/HEAD. It was formally recognized as such just a day later in 839a7a06f35bf8cd563a41d6db97f453ab108129, as part of the git-prune-script and git-pull-script helpers to help with merging.

Usage Guide

This guide explains how the earliest version of Git (root commit) works. You can read these steps alongside the C source code to get a better sense of how everything works.

1. Initialize the object database with init-db. This is the .git directory.
2. Make changes to files. These files can be any file except the .git directory. We don’t have the concept of .gitignore yet, and also, all dotfiles (any file that begins with a .) are ignored and cannot be tracked by Git.
3. Stage modified files with update-cache [...FILES]. This compresses these files’ contents and saves them to the object database, such that each file gets its own object database file. At this point the files are tracked by Git. It also adds each file’s metadata (essentially the filename and SHA1 of its contents) to the .git/index file.
4. (Optional) Check the diff of what is in the .git/index (staged) versus the current working tree with show-diff. We are just diffing whatever is in the current cache .git/index (essentially the last known “tree-to-be-written-to-object-database-but-not-yet”) and what is on disk at those paths that the cache describes. The diffing comparison is basic and is based on timestamps and inodes (presumably for performance).

   This diff is the ancient equivalent of git diff. If we add those files that have been modified with update-cache, then show-diff will show nothing, because the working tree files on disk match what is in the index file (just like how modern git diff will show nothing, unless you invoke git diff --cached, in this situation).

   Note also that we are not comparing things to a previous commit of any kind. Instead we are always only diffing the files that were touched/modified (during the course of normal development) and what the index file has. It’s even more primitive than the modern “detached HEAD mode” in Git because we do not automatically diff against a “current commit”; the concept of a “current commit” doesn’t exist yet. We literally have blobs, trees, and commits in the object database, the index file (describing whatever paths make up another (perhaps new and unique) tree object), and the working tree (everything except the .git folder).

   Lastly, the show-diff command shells out to diff (so the codebase doesn’t have any fancy diffing algorithms).
5. Run write-tree to save the data in .git/index as its own tree object in the object database. The SHA1 of this tree object is printed to STDOUT. Take a note of this SHA1 hash, as it will be referenced to construct a commit (changeset) object.
6. (Optional) Check the SHA1 from write-tree with read-tree . This will display the tree object (by displaying its blobs).
7. Create a new commit with echo "my-commit-message" | commit-tree , using the SHA1 from step 5 above. This will create a new commit object and write it to the object database.
8. (Optional) Check the commit with cat-file . This will write the commit message and metadata (including the tree SHA (and parent commit SHAs for non-root commits)) to a temporary file. You can just cat out this file to see it (commit date, author name, email, etc.).

   The fact that cat-file writes to disk is a bit annoying, and so it learned to output to STDOUT in bf0c6e839c692142784caf07b523cd69442e57a5.
9. Repeat steps 2-7 above, but for step 7 (commit-tree) pass in the -p flag to mark the new commit as a child of a previous commit SHA. You can pass in multiple -p flags to denote multiple parents (e.g., a merge). For the very first merge in Git’s own history, see b51ad4314078298194d23d46e2b4473ffd32a88a.

Bresenham's Circle Drawing Algorithm

2021-03-15T00:00:00Z

2021-03-15
programming, math

Once upon a time I was given the following problem for a technical programming interview:

Write a function draw_circle(r) that draws a circle with radius r. Use the given method draw_pixel(x, y) which takes a 2-dimensional point (x, y) and colors it in on the computer screen.

For the solution, you can either collect all pixels (tuples) of $x$ and $y$ coordinate pairs, or just call draw_pixel() on them during the “search” for those pixels that must be filled in.

This post goes over several solutions, ultimately arriving at Bresenham’s algorithm. The content of this post is merely a distillation of Section 3.3 from the book “Computer Graphics: Principles and Practice (1996)”.¹ The authors of the book state that their implementation results in code “essentially the same as that specified in patent 4,371,933 [a.k.a. Bresenham’s algorithm].”²

I’ve gone all out and translated the “reference” implementations found in the book into Rust and Python. The Python was written first, and I used a text-based drawing system to test the correctness. However, I became dissatisfied with the non-square “aspect ratio” of most monospaced fonts out there, which distorted the circles to look more like ellipses. To fix this, I decided to port the Python code to Rust, and then target WASM so that I could use it to draw on the HTML5 elements (and to eliminate the “aspect ratio” problem). All of the drawings in this document are powered by the Rust code.

Constraints

Drawable canvas

Before we start, let’s define the drawable surface (canvas) of pixels for this problem. The pixels are arranged in a 2-dimensional grid. The important thing here is the grid or coordinate system, with the pixel at the center of the grid having the traditional (0, 0) Cartesian coordinate.

Below is a sample grid to give you a sense of what this will look like. There is a central (0, 0) origin pixel, and 15 pixels to the north, south, east, and west, and everything in-between. Pixels that lie on interesting points of symmetry are highlighted in green.

Mathematical definitions

The exact definition of a circle (given infinite precision, as on the traditional Cartesian plane) centered at the origin is

\[ \begin{equation} \label{eq:circle} x^2 + y^2 = r^2. \end{equation} \]

This resembles the Pythagorean Theorem

\[ a^2 + b^2 = c^2, \]

for any right-angled triangle with sides $a$ and $b$ and hypotenuse $c$. The resemblance is not a coincidence, because an infinite number of such triangles exists within the top right quadrant of the plane (that is, Quadrant I³, or the part of the plane such that $x \geq 0$ and $y \geq 0$); in Quadrant I, for every point $(x,y)$ on this portion (or arc) of the circle, the radius drawn to that point is the hypotenuse of one of these triangles (whose sides are $x$ and $y$). Later in this post, this will become relevant again when we discuss Pythagorean Triples.

Anyway, solving for $y$ in Equation $\ref{eq:circle}$ gives

\[ \begin{equation} \label{eq:circle-y} y = \pm\sqrt{r^2 - x^2} \end{equation} \]

to get 2 functions for the top-half and bottom-half of the circle (that’s what the $\pm$ symbol means). Consider the function $y = x$. This function has slope 1 and is a diagonal line on which $x = y$ at every point. Now consider how this line intersects the quarter-arc of the circle in Quadrant I. This intersection point evenly divides the arc into 2 halves, and is where

\[ x = y = \tfrac{r}{\sqrt{2}}, \]

or simply the point

\[ \begin{equation} (\tfrac{r}{\sqrt{2}}, \tfrac{r}{\sqrt{2}}). \end{equation} \]

This is because if $x = y$, then Equation $\ref{eq:circle}$ becomes

\[ \begin{align} x^2 + y^2 &= r^2 \\ x^2 + x^2 &= r^2 \\ 2x^2 &= r^2 \\ \tfrac{2x^2}{2} &= \tfrac{r^2}{2} \\ x^2 &= \tfrac{r^2}{2} \\ \sqrt{x^2} &= \tfrac{\sqrt{r^2}}{\sqrt{2}} \\ x &= \tfrac{r}{\sqrt{2}}. \label{eq:arc-intersection} \end{align} \]

This is not that interesting for purposes of the algorithms in this post, but is something that is glossed over in the book.

Symmetry

Because of symmetry, we can mirror the solution $(x,y)$ pairs we get in Quadrant I into the other quadrants. This gives us 4-way symmetry because there are 4 quadrants.

def mirror_points_4(x, y):
    """ Return 4-way symmetry of points. """
    return [( x,  y),
            (-x,  y),
            ( x, -y),
            (-x, -y)]

lib.py [GitHub] [Download]

Note, however, that there is actually 8-way symmetry at hand because (1) we can swap $x$ and $y$, and (2) because of the way we can distribute the negative sign:

#	Coordinate	Quadrant
1	`( x, y)`	I
2	`( y, x)`	I
3	`(-x, y)`	II
4	`(-y, x)`	II
5	`(-x,-y)`	III
6	`(-y,-x)`	III
7	`( x,-y)`	IV
8	`( y,-x)`	IV

def mirror_points_8(x, y):
    """ Return 8-way symmetry of points. """
    return [( x,  y),
            ( y,  x),
            (-x,  y),
            (-y,  x),
            ( x, -y),
            ( y, -x),
            (-x, -y),
            (-y, -x)]

lib.py [GitHub] [Download]

Fun fact: the exact point at which $x$ and $y$ get “swapped” in Quadrant I is when $x = y = \tfrac{r}{\sqrt{2}}$ (Equation $\ref{eq:arc-intersection}$).
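A quick numeric check of this swap point on the integer grid, as a sketch only. The helper name last_x_before_crossover is hypothetical, and it uses the floored integer square root to pick $y$, which is an assumption of this example rather than something the book prescribes.

```python
from math import floor, isqrt, sqrt


def last_x_before_crossover(r: int) -> int:
    """Largest integer x in [0, r] whose floored y value is still >= x."""
    return max(x for x in range(r + 1) if x <= isqrt(r * r - x * x))


r = 15
print(r / sqrt(2))                 # roughly 10.6
print(last_x_before_crossover(r))  # 10
```

For $r = 15$ the crossover sits between the pixel columns $x = 10$ and $x = 11$, so an integer loop that walks $x$ upward passes the swap point right after $x = 10$.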

Naive solutions

When in doubt, brute force is always a great answer, because at least it gets you started on something that works given enough time and/or memory.⁴ Because we already have clear mathematical definitions, we can just translate them (albeit mechanically) to code.

from math import isqrt

def get_circle_points_naive_4(r):
    """ Draw a circle by pairing up each X value with a Y value that lies on a
    circle with radius 'r'. This has a bug because some Y values get skipped.
    Can you see why?
    """
    points = []
    for x in range(r + 1):
        # isqrt() gets the integer square root.
        y = isqrt((r * r) - (x * x))
        points.extend(mirror_points_4(x, y))
    return points

naive.py [GitHub] [Download]

get_circle_points_naive_4() is the simplest translation, although there is a bug, which is obvious when we visualize it (in this case, for $r = 15$):

The get_circle_points_naive_4() function is based on Equation $\ref{eq:circle-y}$. We iterate $x$ from $0$ to $r$⁵, and at each $x$ try to find the best value for $y$. The problem is that we’re only solving for 1 $y$ value for every $x$ value we increment by. As we get near the left and right sides of the circle, we need to calculate more than just 1 $y$ value for every $x$.⁶
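The skipped rows can also be listed directly. The sketch below uses a made-up helper, skipped_y_values, to collect the $y$ values in Quadrant I that a one-$y$-per-$x$ loop never produces.

```python
from math import isqrt


def skipped_y_values(r):
    """Y values in [0, r] that never appear when x runs from 0 to r."""
    produced = {isqrt(r * r - x * x) for x in range(r + 1)}
    return [y for y in range(r + 1) if y not in produced]


print(skipped_y_values(15))  # [1, 2, 3, 4, 6, 8]
```

For $r = 15$ the largest jump is at the very end, where $y$ drops from 5 (at $x = 14$) straight to 0 (at $x = 15$), leaving rows 1 through 4 empty.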

The get_circle_points_naive_8() function gets around this $y$-skip bug by invoking 8-way symmetry instead:

def get_circle_points_naive_8(r):
    """ Better than get_circle_points_naive_4, but wastes CPU cycles because
    the 8-way symmetry overcorrects and we draw some pixels more than once.
    """
    points = []
    for x in range(r + 1):
        y = isqrt((r * r) - (x * x))
        points.extend(mirror_points_8(x, y))
    return points

naive.py [GitHub] [Download]

However the downside is that it results in multiple points that will be drawn 2 times, wasting CPU cycles.⁷ To be more precise, all points around the gappy area in Quadrant I are redundant because that part of the arc is already mirrored nicely by the contiguous points from $x = 0$ to $x = y$.
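One way to see the redundancy is to count how often each point is emitted. The sketch below repeats the two functions from above in compact form so that it runs on its own; the counting code around them is an illustration and not part of naive.py.

```python
from collections import Counter
from math import isqrt


def mirror_points_8(x, y):
    return [(x, y), (y, x), (-x, y), (-y, x),
            (x, -y), (y, -x), (-x, -y), (-y, -x)]


def get_circle_points_naive_8(r):
    points = []
    for x in range(r + 1):
        y = isqrt(r * r - x * x)
        points.extend(mirror_points_8(x, y))
    return points


points = get_circle_points_naive_8(15)
counts = Counter(points)
repeated = [p for p, n in counts.items() if n > 1]
# Total points emitted, distinct pixels, and pixels emitted more than once.
print(len(points), len(counts), len(repeated))
```

The first number is always $8(r + 1)$, because every $x$ contributes 8 mirrored points whether or not they are new.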

The get_circle_points_naive_8_faster() function avoids drawing the gappy areas by just breaking the loop when $x > y$, but is otherwise the same:

def get_circle_points_naive_8_faster(r):
    """ Slightly faster than get_circle_points_naive_8, because of the break
    condition at the middle of the arc. However this is still inefficient due
    to the square root calculation with `isqrt()`.
    """
    points = []
    for x in range(r + 1):
        y = isqrt((r * r) - (x * x))
        # When we cross the middle of the arc, stop, because we're already
        # invoking 8-way symmetry.
        if x > y:
            break
        points.extend(mirror_points_8(x, y))
    return points

naive.py [GitHub] [Download]

This is the best we can do with the simple mathematical translations to code. Note that in all of these implementations we are still forced to calculate square roots in every iteration, which is certainly suboptimal.

Bresenham’s Algorithm

This is also known as the “Midpoint Circle Algorithm,” where the name “midpoint” comes from the mathematical calculations that are done by considering the midpoint between pixels. The gist of the algorithm is that instead of using Equation $\ref{eq:circle-y}$ to calculate $y$ for every $x$, you try to move along the arc of the circle, pixel-to-pixel, staying as close as possible to the true arc:

1. Start out from the top of the circle (color in pixel $(0, r)$). Note that because of symmetry, we could start out from $(0, -r)$, $(r, 0)$, or even $(-r, 0)$ as Bresenham did in his paper.⁸
2. Move right (east (E)) or down-right (southeast (SE)), whichever is closer to the circle.
3. Stop when $x = y$ (just like in get_circle_points_naive_8_faster()).

The hard part is Step 2, where we just need to figure out which direction to move (E or SE) from the current pixel. The brute force way here is to just calculate the distance away from the center of the circle for the E and SE pixels (using Euclidean distance, which is just a variation of Equation $\ref{eq:circle}$ or the Pythagorean Theorem), and just choose the pixel that is closest to the arc of the circle. This makes sense, but with the power of mathematics, we can do better.
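For reference, the brute-force choice described above might look like the following sketch. The function names are hypothetical, and the tie-break (prefer SE when both candidates are equally far) is an arbitrary assumption.

```python
from math import hypot


def closer_of_e_or_se(x, y, r):
    """Pick the E or SE neighbor whose center lies nearer the true circle."""
    e = (x + 1, y)
    se = (x + 1, y - 1)
    # Radial error: how far each candidate is from being at radius r.
    err_e = abs(hypot(*e) - r)
    err_se = abs(hypot(*se) - r)
    return e if err_e < err_se else se


def first_octant_brute_force(r):
    """Walk from (0, r) toward the diagonal, one pixel column at a time."""
    x, y = 0, r
    points = [(x, y)]
    while x < y:
        x, y = closer_of_e_or_se(x, y, r)
        points.append((x, y))
    return points


print(first_octant_brute_force(5))
# [(0, 5), (1, 5), (2, 5), (3, 4), (4, 3)]
```

Note the two square roots (inside hypot) per step; avoiding that cost is the motivation for the midpoint approach.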

Inside, on, or outside the circle?

Whether some point $(x, y)$ is inside, on, or outside of the circle depends on the definition of the circle from Equation $\ref{eq:circle}$. We can tweak it in terms of any $(x, y)$ pair:

\[ \begin{equation} \label{eq:error-margin} F(x,y) = x^2 + y^2 - r^2 = \text{distance from true circle line}. \end{equation} \]

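Note that if $F(x,y) = 0$, then the point $(x, y)$ is exactly on the circle. If $F(x,y) > 0$, then the point is outside of the circle, and if $F(x,y) < 0$ then the point is inside of it. In other words, given any point $(x, y)$, the sign of $F(x, y)$ tells us which side of the true circle line the point is on, and its magnitude grows the farther away the point is (it is an error measure rather than a literal Euclidean distance).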
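A small sketch of this three-way test in Python. The classify helper is a made-up name for illustration; F follows the equation above.

```python
def F(x, y, r):
    """Signed error of (x, y) relative to a circle of radius r at the origin."""
    return x * x + y * y - r * r


def classify(x, y, r):
    error = F(x, y, r)
    if error < 0:
        return "inside"
    if error > 0:
        return "outside"
    return "on"


print(classify(3, 3, 5), classify(3, 4, 5), classify(4, 4, 5))
# inside on outside
```

Because everything here is integer arithmetic, no square root is ever needed to decide which side of the circle a pixel is on.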

Choosing between E or SE

Let’s remind ourselves that we’ll always be moving E or SE. One critical (pragmatic) property here is that we’re dealing with a pixel grid with integer increments. There is a very high chance that neither the E nor the SE pixel we’re moving to is exactly on the circle. This is because, apart from the four points on the axes such as $(0, r)$, the only time that the point $(x,y)$ will exactly be on the line of the circle is if the $x$, $y$, and $r$ values (as integers) form a so-called Pythagorean Triple. For $r < 100$, there are only 50 such triples:

( 3, 4, 5)  (18,24,30)  (24,45,51)  (16,63,65)  (51,68,85)
( 6, 8,10)  (16,30,34)  (20,48,52)  (32,60,68)  (40,75,85)
( 5,12,13)  (21,28,35)  (28,45,53)  (42,56,70)  (36,77,85)
( 9,12,15)  (12,35,37)  (33,44,55)  (48,55,73)  (13,84,85)
( 8,15,17)  (15,36,39)  (40,42,58)  (24,70,74)  (60,63,87)
(12,16,20)  (24,32,40)  (36,48,60)  (45,60,75)  (39,80,89)
(15,20,25)  ( 9,40,41)  (11,60,61)  (21,72,75)  (54,72,90)
( 7,24,25)  (27,36,45)  (39,52,65)  (30,72,78)  (35,84,91)
(10,24,26)  (30,40,50)  (33,56,65)  (48,64,80)  (57,76,95)
(20,21,29)  (14,48,50)  (25,60,65)  (18,80,82)  (65,72,97)
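The list above can be regenerated with a short brute-force search. This is a sketch; the function name pythagorean_triples is made up for illustration.

```python
from math import isqrt


def pythagorean_triples(limit):
    """All (a, b, c) with a < b < c < limit and a^2 + b^2 = c^2."""
    triples = []
    for c in range(1, limit):
        for a in range(1, c):
            b_squared = c * c - a * a
            b = isqrt(b_squared)
            # Requiring a < b lists each triple once instead of twice.
            if a < b and b * b == b_squared:
                triples.append((a, b, c))
    return triples


triples = pythagorean_triples(100)
print(len(triples))  # 50
```

Compared with the thousands of integer grid points in a quadrant for radii this large, 50 exact hits is very few, which is why the algorithm has to reason about error rather than hope to land on the circle.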

In other words, for all practical purposes, there will always be some error and we’ll always be outside or inside the circle and never directly on it. It’s sort of like driving a car and trying to stay within your designated lane: if you think you’re moving too much to the right, you turn your wheel left to stay “within” the lane (or some acceptable amount within the lane), and vice versa.

The idea is the same for moving along the circle: if we think we’re moving too far outside the circle, we try to move into it. On the other hand, if we think we’re moving into the circle, we move out of it. And so imagine yourself standing on point $(0, r)$, our starting point. The line of the circle is our “lane” we want to stay “on” as much as possible. Choosing to go E is the same as turning “left”. Choosing to go SE is the same as turning “right”. Using this metaphor, if we were not to turn at all (go “straight”), we would be heading to the virtual “in-between” pixel between E and SE, the midpoint between them.

And so here’s the basic idea behind choosing E or SE:

If going “straight” would mean going into the circle (i.e., we’re currently veering too much to the right!), we course-correct by turning left (E).
Conversely, if going “straight” would mean going outside the circle (i.e., we’re currently veering too much to the left), we course-correct by turning right (SE).
Lastly, if going “straight” would mean staying exactly on the circle (we hit a Pythagorean Triple), we turn SE (from an engineering perspective it doesn’t really matter which way we turn in this case, as both E and SE result in some amount of error — although see “Final tweaks” below for a note on aesthetics).

Let’s convert this idea into pseudocode:

Let M be the midpoint (going "straight").

Then, F(M) tells us what direction we're headed relative to the true circle line.

If F(M) is < 0, we're moving "into" the circle (veering right), so turn left by moving E.

Otherwise move SE.

Note that we only have to calculate $F(...)$ for the midpoint $M$. Isn’t this cool? It is much better than calculating $F(E)$ and $F(SE)$ and having to compare them!

# This F() function is the same as the mathematical F(...) function
# discussed above (Equation 11).
def F(x, y, r):
    return (x * x) + (y * y) - (r * r)

def get_circle_points_bresenham_WIP1(r):
    points = []
    x = 0
    y = r
    # Calculate F(M) for the very first time. That is, if we were to go
    # "straight" from (0, r), would we be inside or outside the circle?
    xE, yE = (1, r)
    xSE, ySE = (1, r - 1)
    xM, yM = (1, r - 0.5)
    F_M = F(xM, yM, r)
    points.extend(mirror_points_8(x, y))
    while x < y:
        # If going straight would go "into" the circle (too much to the
        # right), try to move out of it by turning left by moving E.
        if F_M < 0:
            x += 1
            # Evaluate F at the *next* midpoint, not at the pixel itself.
            F_M = F(x + 1, y - 0.5, r)
        # Otherwise move SE.
        else:
            x += 1
            y -= 1
            F_M = F(x + 1, y - 0.5, r)
        points.extend(mirror_points_8(x, y))
    return points

We can refactor the above slightly. We can simplify the initial calculation of F_M to avoid calling F(), and also move out some of the redundant bits. The very first midpoint we have to consider is $(1, r - \tfrac{1}{2})$; plugging this into $F()$ gets us

\[ \begin{align} F(1, r - \tfrac{1}{2}) &= 1^2 + (r - \tfrac{1}{2})^2 - r^2 \\ &= 1 + (r^2 - r + \tfrac{1}{4}) - r^2 \\ &= 1 + r^2 - r^2 - r + \tfrac{1}{4} \\ &= 1 - r + \tfrac{1}{4} \\ &= \tfrac{5}{4} - r. \end{align} \]

With that said, we can get this:

def get_circle_points_bresenham_WIP2(r):
    points = []
    x = 0
    y = r
    F_M = 5/4 - r
    points.extend(mirror_points_8(x, y))
    while x < y:
        # If going straight would go "into" the circle (too much to the
        # right), try to move out of it by turning left by moving E.
        if F_M < 0:
            pass
        # Otherwise move SE.
        else:
            y -= 1
        x += 1
        # Evaluate F at the next midpoint M = (x + 1, y - 0.5).
        F_M = F(x + 1, y - 0.5, r)
        points.extend(mirror_points_8(x, y))
    return points

The annoying bit is the call to F(). Surprisingly, the call to F() can be eliminated entirely, because we can calculate it once, and then merely adjust it thereafter.

Calculate once, adjust thereafter

We can just calculate $F(x,y)$ once when we start out at $(0, r)$, and then just adjust it depending on whether we move E or SE. The key is that this “adjustment” computation is cheaper than calculating the full $F(x,y)$ distance function all over again.

Let $M$ be the midpoint $(x + 1, y - \tfrac{1}{2})$ between the E $(x + 1, y)$ and SE $(x + 1, y - 1)$ pixels. Then $F(M)$ is the result of going “straight” and tells us the direction we’re veering off from the circle line:

\[ \begin{equation} F(M) = F(x + 1, y - \tfrac{1}{2}) = (x + 1)^2 + (y - \tfrac{1}{2})^2 - r^2. \end{equation} \]

The values of $x$ and $y$ are unknown; however, they can change in only 2 possible ways: by moving E or SE!

If we move E, then $M$ will change from $(x + 1, y - \tfrac{1}{2})$ to $(x + 2, y - \tfrac{1}{2})$ because we add 1 to $x$ to move 1 pixel east; the new value of $F(M)$ at this pixel, which we can call $F(M_E)$, will then be:

\[ \begin{equation} F(M_{E}) = F(x + 2, y - \tfrac{1}{2}) = (x + 2)^2 + (y - \tfrac{1}{2})^2 - r^2. \end{equation} \]

Now we can take the difference between these two full calculations. That is, if we were to move E, how would $F(M)$ change? Simple, we just look at the change in $x$ ($\Delta_{x}$) (we don’t care about the change in $y$ or $r$, because they stay constant in this case).

\[ \begin{align} \Delta_{E} &= F(M_{E}) - F(M) \\ &= [(x + 2)^2 + (y - \tfrac{1}{2})^2 - r^2] - [(x + 1)^2 + (y - \tfrac{1}{2})^2 - r^2] \\ &= \Delta_{x} \\ &= (x + 2)^2 - (x + 1)^2 \label{eq:de1} \\ &= (x^2 + 4x + 4) - (x^2 + 2x + 1) \\ &= x^2 + 4x + 4 - x^2 - 2x - 1 \\ &= x^2 - x^2 + 4x - 2x + 4 - 1 \\ &= 2x + 3. \label{eq:de2} \end{align} \]

So $F(M)$ changes by $2x + 3$ whenever we move E, where $x$ is the column of the current pixel before the move.

How about for moving SE? If we move SE, the new value of $M$ will change from $(x + 1, y - \tfrac{1}{2})$ to $(x + 2, y - \tfrac{3}{2})$ because we add 1 to $x$ and subtract 1 from $y$ to move 1 pixel southeast; the new value of $F(M)$ for this case, which we call $F(M_{SE})$, will then be:

\[ \begin{equation} F(M_{SE}) = F(x + 2, y - \tfrac{3}{2}) = (x + 2)^2 + (y - \tfrac{3}{2})^2 - r^2. \end{equation} \]

We can do the same difference analysis here, but with the addition that we have to consider the change in $y$ ($\Delta_{y}$) as well (because of the 1 we subtracted from $y$):

\[ \begin{align} \Delta_{SE} &= F(M_{SE}) - F(M) \\ &= [(x + 2)^2 + (y - \tfrac{3}{2})^2 - r^2] - [(x + 1)^2 + (y - \tfrac{1}{2})^2 - r^2] \\ &= \Delta_{x} + \Delta_{y} \\ &= [(x + 2)^2 - (x + 1)^2] + [(y - \tfrac{3}{2})^2 - (y - \tfrac{1}{2})^2] \\ &= (2x + 3) + [(y^2 - \tfrac{6y}{2} + \tfrac{9}{4}) - (y^2 - y + \tfrac{1}{4})] \\ &= (2x + 3) + (y^2 - 3y + \tfrac{9}{4} - y^2 + y - \tfrac{1}{4}) \\ &= (2x + 3) + (y^2 - y^2 - 3y + y + \tfrac{9}{4} - \tfrac{1}{4}) \\ &= (2x + 3) + (- 2y + \tfrac{8}{4}) \\ &= (2x + 3) + (-2y + 2) \\ &= 2x + 3 - 2y + 2 \\ &= 2x - 2y + 5 \\ &= 2(x - y) + 5. \label{eq:se1} \end{align} \]

And so when moving SE, the new $F(M)$ must change by $2(x - y) + 5$.
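As a quick sanity check on the two derivations above, here is a minimal sketch that compares the closed forms against the full distance function. The helper names (`F`, `delta_e`, `delta_se`) are chosen for illustration only, and `F` is assumed to be the same $x^2 + y^2 - r^2$ used in the equations above. Exact fractions are used so that the half-pixel midpoints introduce no rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def F(x, y, r):
    """Distance function: negative inside the circle, positive outside."""
    return x * x + y * y - r * r


def delta_e(x, y, r):
    """Change in F(M) after moving E, computed the long way."""
    return F(x + 2, y - HALF, r) - F(x + 1, y - HALF, r)


def delta_se(x, y, r):
    """Change in F(M) after moving SE, computed the long way."""
    return F(x + 2, y - 3 * HALF, r) - F(x + 1, y - HALF, r)


# Compare against the closed forms for a grid of pixels and radii.
for r in range(0, 20):
    for x in range(0, 20):
        for y in range(0, 20):
            assert delta_e(x, y, r) == 2 * x + 3
            assert delta_se(x, y, r) == 2 * (x - y) + 5
```

Neither difference depends on $r$, which is why the radius drops out of the update step entirely.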

Now we have all the pieces to derive the complete algorithm!

def get_circle_points_bresenham_float_ese(r):
    """ Draw a circle using a floating point variable, F_M. Draw by moving E or
    SE."""
    points = []
    x = 0
    y = r
    # F_M is a float.
    F_M = 5 / 4 - r
    points.extend(mirror_points_8(x, y))
    while x < y:
        if F_M < 0:
            F_M += 2.0 * x + 3.0
        else:
            F_M += 2.0 * (x - y) + 5.0
            y -= 1
        x += 1
        points.extend(mirror_points_8(x, y))
    return points

bresenham.py [GitHub] [Download]

Integer-only optimization

The initial value of F_M ($F(M)$) is $\tfrac{5}{4} - r$. Notice how this is the only place where we have to perform division in the whole algorithm. We can avoid this initial division (and subsequent floating point arithmetic) by initializing it to $1 - r$ instead, which is a difference of $\tfrac{1}{4}$ vs the original.

Because we shifted the initial value down by $\tfrac{1}{4}$, every comparison of $F(M)$ must shift by the same amount. That is, the comparison $F(M) < 0$ actually becomes $F(M) < -\tfrac{1}{4}$. However, this fractional comparison is unnecessary: the shifted $F(M)$ starts as the integer $1 - r$ and only ever changes by the integer amounts $2x + 3$ and $2(x - y) + 5$, so it is always an integer, and for any integer $n$, $n < -\tfrac{1}{4}$ holds exactly when $n < 0$. So we can just keep the same $F(M) < 0$ as before.
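To convince ourselves that dropping the $\tfrac{1}{4}$ really changes nothing, here is a small sketch that walks the arc twice, once with the exact starting value $\tfrac{5}{4} - r$ and once with $1 - r$, and checks that both make the same sequence of moves. The function name `octant_decisions` is hypothetical, the radius is assumed to be an integer, and the sketch records only the E/SE decisions rather than the mirrored pixels.

```python
from fractions import Fraction


def octant_decisions(r, f_m):
    """Walk the arc from (0, r) and record each move ("E" or "SE")."""
    x, y = 0, r
    moves = []
    while x < y:
        if f_m < 0:
            f_m += 2 * x + 3
            moves.append("E")
        else:
            f_m += 2 * (x - y) + 5
            y -= 1
            moves.append("SE")
        x += 1
    return moves


for r in range(0, 200):
    exact = octant_decisions(r, Fraction(5, 4) - r)  # original start value
    integer = octant_decisions(r, 1 - r)             # shifted down by 1/4
    assert exact == integer
```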

def get_circle_points_bresenham_integer_ese(r):
    """ Like get_circle_points_bresenham_float_ese, but F_M is an integer
    variable. """
    points = []
    x = 0
    y = r
    # F_M is an integer!
    F_M = 1 - r
    points.extend(mirror_points_8(x, y))
    while x < y:
        if F_M < 0:
            # We can use a bit-shift safely because 2*n is the same as n << 1
            # in binary, and also because F_M is an integer.
            F_M += (x << 1) + 3
        else:
            F_M += ((x - y) << 1) + 5
            y -= 1
        x += 1
        points.extend(mirror_points_8(x, y))
    return points

bresenham.py [GitHub] [Download]

Second-order differences

There is a final optimization we can do.⁹ In the “Calculate once, adjust thereafter” section we avoided calculating $F(M)$ from scratch on every iteration. We can do the same thing for the differences themselves!

That is, we can avoid calculating $\Delta_{E} = (2x + 3)$ and $\Delta_{SE} = 2(x - y) + 5$ on every iteration, and instead just calculate them once and make adjustments to them, just like we did earlier for $F(M)$.

Let’s first consider how $\Delta_{E} = 2x + 3$ changes. First, we initialize $\Delta_{E}$ by plugging in $(0, r)$ into Equation $\ref{eq:de2}$, our starting point. Because there is no $y$ variable in here, we get an initial value of

\[ \begin{equation} \label{eq:de-2ord-initial} 2(0) + 3 = 3. \end{equation} \]

If we go E, $\Delta_{E}$ changes like this:

\[ \begin{align} \Delta_{E_{new}} = \Delta_{E_{(x+1,y)}} - \Delta_{E_{(x,y)}} &= [2(x+1) + 3] - (2x + 3) \label{eq:de-2ord-e} \\ &= 2x + 2 + 3 - 2x - 3 \\ &= 2x - 2x + 3 - 3 + 2 \\ &= 2. \label{eq:e2ord} \end{align} \]

If we go SE, $\Delta_{E}$ changes in the exact same way, because even though our new point is at $(x+1, y-1)$, there is no $y$ in $\Delta_{E} = 2x + 3$, so it doesn’t matter and $\Delta_{E_{new}} = 2$ again.

Now let’s consider how $\Delta_{SE}$ changes. For the initial value, we again plug in $(0, r)$ into $2(x-y) + 5$, to get

\[ \begin{equation} \label{eq:dse-2ord-initial} 2(0-r) + 5 = -2r + 5. \end{equation} \]

If we go E, $\Delta_{SE}$ changes like this:

\[ \begin{align} \Delta_{SE_{new}} = \Delta_{SE_{(x+1,y)}} - \Delta_{SE_{(x,y)}} &= [2((x + 1)-y) + 5] - [2(x - y) + 5] \label{eq:dse-2ord-e} \\ &= (2x + 2 - 2y + 5) - (2x - 2y + 5) \\ &= 2x - 2y + 7 - 2x + 2y - 5 \\ &= 2x - 2x + 2y - 2y + 7 - 5 \\ &= 2. \label{eq:se2ord1} \end{align} \]

If we go SE, $\Delta_{SE}$ changes like this:

\[ \begin{align} \Delta_{SE_{new}} = \Delta_{SE_{(x+1,y-1)}} - \Delta_{SE_{(x,y)}} &= [2((x + 1)-(y - 1)) + 5] - [2(x - y) + 5] \label{eq:dse-2ord-se} \\ &= [2(x + 1 - y + 1) + 5] - (2x - 2y + 5) \\ &= (2x + 2 - 2y + 2 + 5) - 2x + 2y - 5 \\ &= 2x - 2x + 2y - 2y + 5 - 5 + 2 + 2 \\ &= 2 + 2 \\ &= 4. \label{eq:se2ord2} \end{align} \]

The code should then look like this:

def get_circle_points_bresenham_2order(r):
    points = []
    x = 0
    y = r
    F_M = 1 - r
    d_e = 3 # Equation 40
    d_se = -(2 * r) + 5 # Equation 45
    points.extend(mirror_points_8(x, y))
    while x < y:
        if F_M < 0:
            F_M += d_e
            d_e += 2  # Equation 44
            d_se += 2 # Equation 50
        else:
            F_M += d_se
            d_e += 2  # Equation 44
            d_se += 4 # Equation 56
            y -= 1
        x += 1
        points.extend(mirror_points_8(x, y))
    return points

With a little refactoring, we can arrive at a slightly simpler version:

def get_circle_points_bresenham_integer_ese_2order(r):
    """ Like get_circle_points_bresenham_integer_ese, but use 2nd-order
    differences to remove multiplication from the inner loop. """
    points = []
    x = 0
    y = r
    F_M = 1 - r
    # Initial value at (0, r): 2x + 3 = 2(0) + 3 = 3.
    d_e = 3
    # Initial value at (0, r): 2(x - y) + 5 = 2(0 - r) + 5 = -2r + 5.
    d_se = -(r << 1) + 5
    points.extend(mirror_points_8(x, y))
    while x < y:
        if F_M < 0:
            F_M += d_e
        else:
            F_M += d_se
            # Increment d_se by 2 (total 4) if we go southeast.
            d_se += 2
            y -= 1
        # Always increment d_e and d_se by 2!
        d_e += 2
        d_se += 2
        x += 1
        points.extend(mirror_points_8(x, y))
    return points

bresenham.py [GitHub] [Download]
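One way to gain confidence in the second-order bookkeeping is to carry the running values along the arc and compare them against the closed forms at every step. The sketch below does that; `check_second_order` is a hypothetical helper written for this check, not part of bresenham.py, and it skips the pixel mirroring entirely.

```python
def check_second_order(r):
    """Walk the arc, updating d_e and d_se incrementally, and confirm at every
    step that they match the closed forms 2x + 3 and 2(x - y) + 5."""
    x, y = 0, r
    f_m = 1 - r
    d_e = 3
    d_se = -(2 * r) + 5
    steps = 0
    while x < y:
        assert d_e == 2 * x + 3
        assert d_se == 2 * (x - y) + 5
        if f_m < 0:
            f_m += d_e
            d_se += 2  # moved E
        else:
            f_m += d_se
            d_se += 4  # moved SE
            y -= 1
        d_e += 2       # same adjustment for E and SE
        x += 1
        steps += 1
    return steps


for r in range(0, 200):
    check_second_order(r)
```

If either running value ever drifted from its closed form, one of the assertions inside the loop would fail.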

The “purist” in me felt that the decrementing of $y$ stood out like a sore thumb, and so I created a tweaked version that moves E and NE, starting out from $(0, -r)$ instead. The mathematical techniques are the same, and due to symmetry the behavior of the algorithm does not change.

def get_circle_points_bresenham_integer_ene_2order(r):
    """ Like get_circle_points_bresenham_integer_ese_2order, but start from
    (0, -r) and move E or NE. Notice how we only need the addition instruction
    in the while loop (y is incremented, not decremented). """
    points = []
    x = 0
    y = -r
    F_M = 1 - r
    # Initial value at (0, -r): 2x + 3 = 2(0) + 3 = 3.
    d_e = 3
    # Initial value at (0, -r): 2(x + y) + 5 = 2(0 - r) + 5 = -2r + 5.
    d_ne = -(r << 1) + 5
    points.extend(mirror_points_8(x, y))
    while x < -y:
        if F_M < 0:
            F_M += d_e
        else:
            F_M += d_ne
            d_ne += 2
            y += 1
        d_e += 2
        d_ne += 2
        x += 1
        points.extend(mirror_points_8(x, y))
    return points

bresenham.py [GitHub] [Download]
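The symmetry claim can be checked directly. The sketch below is a stripped-down illustration, with hypothetical helper names `octant_ese` and `octant_ene`, that records only the raw arc pixels (no `mirror_points_8`) for both variants and confirms that one is the mirror image of the other across the x-axis.

```python
def octant_ese(r):
    """Arc pixels starting at (0, r), moving E or SE."""
    x, y = 0, r
    f_m, d_e, d_se = 1 - r, 3, -(r << 1) + 5
    points = [(x, y)]
    while x < y:
        if f_m < 0:
            f_m += d_e
        else:
            f_m += d_se
            d_se += 2
            y -= 1
        d_e += 2
        d_se += 2
        x += 1
        points.append((x, y))
    return points


def octant_ene(r):
    """Mirror-image arc pixels starting at (0, -r), moving E or NE."""
    x, y = 0, -r
    f_m, d_e, d_ne = 1 - r, 3, -(r << 1) + 5
    points = [(x, y)]
    while x < -y:
        if f_m < 0:
            f_m += d_e
        else:
            f_m += d_ne
            d_ne += 2
            y += 1
        d_e += 2
        d_ne += 2
        x += 1
        points.append((x, y))
    return points


# Flipping the sign of y in the ESE arc should reproduce the ENE arc.
for r in range(0, 200):
    assert octant_ene(r) == [(x, -y) for (x, y) in octant_ese(r)]
```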

Here are a couple of drawings using Bresenham’s algorithm. This one is for $r = 15$:

And for $r = 60$:

Comparisons vs naive algorithm

Here are some side-by-side comparisons for $0 \leq r \leq 10$.

Radius	Naive	Bresenham
0
1
2
3
4
5
6
7
8
9
10

Final tweaks

It has been kindly pointed out that the naive algorithm is aesthetically more pleasing if the calculations involving $r$ are done with $r + \tfrac{1}{2}$ instead of just $r$ itself, like this:

def get_circle_points_naive_8_faster_tweaked_radius(r):
    """ This is much closer to Bresenham's algorithm aesthetically, by simply
    using 'r + 0.5' for the square root calculation instead of 'r' directly.
    """
    points = []
    # In the square root calculation, we just use (r + 0.5) instead of just r.
    # This is more pleasing to the eye and makes the lines a bit smoother.
    r_tweaked = r + 0.5
    for x in range(r + 1):
        y = sqrt((r_tweaked * r_tweaked) - (x * x))
        if x > y:
            break
        points.extend(mirror_points_8(x, floor(y)))
    return points

naive.py [GitHub] [Download]

Indeed, the small tweak seems to do wonders to the output for low values of $r$.

At the same time, there is a tweak we can do as well for the Bresenham algorithm. Instead of turning E (“left”, or away from the circle) when $F(M) < 0$, we can do so when $F(M) \leq 0$.

def get_circle_points_bresenham_integer_ene_2order_leq(r):
    """ Like get_circle_points_bresenham_integer_ene_2order, but use
    'F_M <= 0' instead of 'F_M < 0'.
    """
    points = []
    x = 0
    y = -r
    F_M = 1 - r
    d_e = 3
    d_ne = -(r << 1) + 5
    points.extend(mirror_points_8(x, y))
    while x < -y:
        if F_M <= 0:
            F_M += d_e
        else:
            F_M += d_ne
            d_ne += 2
            y += 1
        d_e += 2
        d_ne += 2
        x += 1
        points.extend(mirror_points_8(x, y))
    return points

bresenham.py [GitHub] [Download]

This makes us turn “left” slightly more often, and intuitively, should give us a slightly larger circle.
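To see where the two conditionals actually part ways, here is a rough sketch that walks the arc under both rules and lists the small radii for which the pixels differ. It uses the simpler first-order updates and the E/SE orientation for readability; the function name `octant` and its `turn_east_on_zero` flag are made up for this illustration.

```python
def octant(r, turn_east_on_zero):
    """Arc pixels from (0, r). When F_M == 0, go E if turn_east_on_zero is
    True, otherwise go SE."""
    x, y = 0, r
    f_m = 1 - r
    points = [(x, y)]
    while x < y:
        go_east = (f_m <= 0) if turn_east_on_zero else (f_m < 0)
        if go_east:
            f_m += 2 * x + 3
        else:
            f_m += 2 * (x - y) + 5
            y -= 1
        x += 1
        points.append((x, y))
    return points


# Radii (up to 10) where the two conditionals disagree.
differing = [r for r in range(0, 11) if octant(r, True) != octant(r, False)]
print(differing)
```

The two rules can only disagree when $F(M)$ lands exactly on zero at some step, so most radii are unaffected.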

Anyway, see for yourself how the tweaks play out for $0 \leq r \leq 10$:

Radius	Naive	Naive (tweaked radius)	Bresenham	Bresenham (tweaked conditional)
0
1
2
3
4
5
6
7
8
9
10

It appears to me that the most aesthetically pleasing algorithm is the tweaked version of the Bresenham algorithm.¹⁰ When given equally bad choices (the case where $F(M) = 0$), this version draws a pixel away from the origin by choosing to go E, thereby drawing a slightly bigger circle. You can see this play out in the above table for when $r = 6$ and especially $r = 1$. It’s a bit unfortunate that the authors of the book did not choose this version, as it seems to do a better job for small values of $r$.

We can carry the same intuition over to the tweak that increases $r$ by $\tfrac{1}{2}$ for the naive algorithm: increasing $r$ should result in a larger value of $y$, thereby drawing a larger circle (and in the process improving the aesthetics). Neat!

Conclusion

To me, Bresenham’s algorithm is interesting because it does not try to be “perfect”. Instead it merely does its best to reduce the amount of error, and in doing so, gets the job done remarkably well.

The technique of avoiding the full polynomial calculation behind $F(M)$ (referred to by the book as finding the first and second-order differences) took some time to get used to, but is intuitive enough in the end. You just need to consider differences in terms of variables. There’s also a connection to calculus because we’re dealing in terms of differences to “cut down” on the polynomial degrees: we go from the squares in Equation $\ref{eq:circle}$ to just linear functions in Equations $\ref{eq:de2}$ and $\ref{eq:se1}$, and again go one more step to just constant functions in Equations $\ref{eq:e2ord}$, $\ref{eq:se2ord1}$, and $\ref{eq:se2ord2}$.

I hope you learned something!

Happy hacking!

Foley, J. D., van Dam, A., Feiner, S. K., Hughes, J. F. (1996). Basic Raster Graphics Algorithms for Drawing 2D Primitives, Scan Converting Circles. Computer Graphics: Principles and Practice (pp. 81–87). Addison-Wesley. ISBN: 0201848406↩︎
Bresenham, J.E., D.G. Grice, and S.C. Pi, “Bi-Directional Display of Circular Arcs,” US Patent 4,371,933. February 1, 1983. Note: unfortunately, trying to understand the original text of the patent is perhaps equally as difficult as inventing the algorithm on your own from scratch. Hence this blog post.↩︎
There are 4 such quadrants: I, II, III, and IV.↩︎
In some sense, all great algorithms are mere optimizations of brute force approaches.↩︎
In code, we have to write range(r + 1) because the range() function does not include the last integer. Such “fence-post” or “off by one” logic is the bane of computer programmers.↩︎
Mathematically, this is because the slope of the arc in Equation $\ref{eq:circle-y}$ approaches positive and negative infinity around these areas.↩︎
In the Rust WASM implementation that is used for the graphics in this blog post, we actually use a bitmap such that we only draw a particular pixel just once. However, we still end up setting a pixel as “on” more than once.↩︎
Bresenham, Jack. “A Linear Algorithm for Incremental Digital Display of Circular Arcs.” Communications of the ACM, vol. 20, no. 2, 1977, pp. 100–106., doi:10.1145/359423.359432.↩︎
It is not clear to me if this change runs faster on modern CPUs, because I recall reading that multiplication can sometimes be faster than adding. But it’s still interesting.↩︎
This version looks slightly better than the tweaked naive one for $r = 8$.↩︎

Using MPD for ReplayGain

2021-01-05T00:00:00Z

2021-01-05
linux, audio

Roughly 10 years ago, there was no easy way to apply ReplayGain to various audio files with different formats (e.g., flac vs mp3). Over the holiday break I discovered r128gain which is exactly the tool I wanted for many years. You just run

r128gain -r

and it will recursively tag all music files with ReplayGain information — in parallel, no less!

The only downside is that neither cmus nor mpv currently supports the R128_TRACK_GAIN tag that r128gain generates (at least for *.opus files).¹ However, I discovered that MPD (Music Player Daemon) supports R128_TRACK_GAIN.² MPD is easy to start up and the basic minimal configuration was small enough:

music_directory     "~/Music"
# Automatically update the database when files in the music_directory are added or deleted.
auto_update         "yes"

# Allow saving playlists from vimpc.
playlist_directory  "~/Music/playlists"

audio_output {
    type            "pulse"
    name            "My Pulse Output"
}

# Enable replay gain.
replaygain          "track"

As far as actually controlling MPD, I settled on vimpc — because Vi-like keybindings are too good to pass up.

Cheers!

To be precise, cmus has a commit in master that adds support, and mpv has an open issue for it. And I’m too lazy to compile cmus from source.↩︎
I had actually used MPD back in the day, but switched to cmus because it was simpler. And because cross-format ReplayGain tagging software was not available, MPD’s support for ReplayGain wasn’t very meaningful for me.↩︎

Thoughts on Baduk

2021-01-04T00:00:00Z

2021-01-04
baduk

My history with Baduk

I’ve started playing Baduk more during the pandemic, and I thought I should write about this fascinating game.¹

When I was a child growing up in Korea, I remember seeing newspapers with pictures of Baduk positions with professional commentary. Unfortunately, no one in my immediate family had any interest in the game, so my initial curiosity about the game came and went.²

Later in the United States, I picked up chess during high school. Chess had the advantage that there were many more people who already knew how to play the game. And also, carrying around a lightweight board with some pieces wasn’t difficult at all. I recall whipping out my chess board for a quick game during lunch, recess, and any other time I could find an opponent.

I took a step back from chess during college and later years. I started to lose interest after I kept playing the same openings.

Around 2016 I re-learned how to play Baduk. One reason I picked it up again was that at this time, Baduk was still played best by humans (a computer AI had not yet defeated a master human player on an even game). This was just before the great “AlphaGo” match with Lee Sedol, and at the time most people believed that AI supremacy in this game was still another decade away.

I started playing and losing a ton of games against the AI on 9x9 at online-go.com, especially around 2019 when sometimes I played marathon 9x9 games, over and over against the computer. The most memorable game from this period is this one where I won by 0.5 points against a 7-stone handicap. That victory was a pleasant surprise, but it also left me with a sense of obligation to study the game with a little more seriousness.

After a brief pause, I returned to the game in the fall of 2020. During this time I watched some videos from this YouTube channel, which I was able to roughly understand thanks to my knowledge of Korean. I began to realize large gaps in my style of play, and it was only after this realization that I started improving my results.

Baduk vs. Chess

Having a working knowledge of both chess and Baduk, I would have to say that the biggest difference between them for me is that there is far, far more room for strategy in Baduk. This is because you can ignore threats and play for bigger moves on the board much more often than in chess, especially in the opening and middlegame. There is more wiggle room for creativity!

Speaking of openings, because the Baduk board has 4 symmetrical corners, there are actually 4 areas of openings in each game. You can have 4 different “openings” in each game. In chess there is only 1 “center” of the board where the vast majority of opening theory takes place.

The handicap system is far more elegant in Baduk as well. In chess, the handicap is usually to remove a pawn (or two), but this drastically alters the nature of the game. Not so for Baduk! The weaker player gets up to 9 extra stones on the board before the start of the game. This way you can play with opponents at different levels without getting completely crushed from the very beginning.

Perhaps the best part of the game is that each game is decided by a score (where the score is the amount of “territory” you control). A win is technically a win, sure, but the “wins” can be judged against each other by their score.

Finally, the game is more forgiving in terms of errors. In chess if you lose your queen (without adequate compensation), the game is pretty much over. In Baduk, even if you lose a sizable group, you can still come back. Actually, the bigger your group of stones, the harder it is to get them captured outright, and so there is a natural, automatic tendency for your strongest “pieces”, if you will, to resist capture.³ Brilliant!

Conclusion

If you haven’t learned the game yet, I strongly recommend it! I used the book “Go: A Complete Introduction to the Game” to get a nice overview.⁴

Have fun!

I use the Korean word “Baduk” (바둑) because the usual Japanese loanword “Go” overlaps with the name “Go” of the Go programming language.↩︎
Years later I learned that my uncle is an amateur 5-dan.↩︎
In Baduk as long as a group gets 2 “eyes”, it becomes uncapturable — and the bigger the group, the easier it is to make such eyes.↩︎
The author of this book is Cho Chikun, one of the top players of the 20th century.↩︎

The Two Sum Problem Explained

2020-12-05T00:00:00Z

2020-12-05
algorithms, math

Just over three years ago, I watched this video that goes over the so-called “Two Sum” problem for the first time. The problem statement is as follows:

Given a sorted list of integers (unimaginatively called numbers), determine if any 2 integers in the list sum up to a number N.

To be honest I did not understand why the proposed optimal solution that uses 2 pointers works the way it does, without missing any possible pairs. The explanation given by the interview candidate in the video always struck me as way too hand-wavy for my tastes.

And to be really honest I never bothered to convince myself that the 2-pointer approach is correct. Until today. This post is about the correctness behind the 2-pointer method, because I have yet to see a clear explanation about this topic.

Brute force approach

First let’s look at the brute-force solution. The brute-force solution looks at every single possible pair of numbers by using a double-for-loop. This is a very common pattern (nested looping) whenever one wants to consider all possible combinations, where each for-loop corresponds to a single “dimension” we want to exhaustively pore over. In this case there are 2 dimensions because there are 2 numbers we need to look at, so we must use 2 for-loops.

Here is the basic pseudocode ¹:

for i in numbers:
  for j in numbers:
    if i + j == N:
      return i, j

I think even beginner programmers can see that the brute force approach works. We just look at every possible 2-number combination to see if they will add up to N, and if so, we stop the search. Simple! ²
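For readers who want something they can run, here is one way the pseudocode might look in Python. This is a sketch rather than a canonical solution: the function name `two_sum_brute_force` is my own, and unlike the pseudocode it only pairs elements at distinct positions (the inner loop starts at `i + 1`), which is one of the edge cases the pseudocode glosses over.

```python
def two_sum_brute_force(numbers, n):
    """Return the first pair of values (at distinct positions) that sums to
    n, or None if there is no such pair."""
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] + numbers[j] == n:
                return numbers[i], numbers[j]
    return None


print(two_sum_brute_force([1, 3, 4, 6, 9], 10))  # (1, 9)
print(two_sum_brute_force([1, 3, 4, 6, 9], 2))   # None
```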

2-pointer method

The 2-pointer method boils down to the following observation:

Remove numbers from the pairwise search if they cannot be used (with any other number) to sum up to N.

Although the solutions posted in countless places online involve pointers, it is more intuitive to think of modifying the list after each pairwise inspection. Below is the algorithm in plain English:

1. Construct a pair of numbers (a, b) such that a is the smallest number and b is the biggest number in the list. That is, these are the leftmost and rightmost ends of the sorted list, respectively.
2. If the sum of a + b is equal to N, of course we’re done.
3. If the sum of a + b is bigger than N, delete b from the list. Go back to Step 1.
4. If the sum of a + b is smaller than N, delete a from the list. Go back to Step 1.
5. If the list becomes smaller than 2 elements, stop (obviously, because there are no more pairs to consider). Optionally return an error.

The algorithm described above can be equivalently coded with pointers, so there is no material difference to discuss in terms of implementation.
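The list-deleting version of the steps can be sketched almost word for word in Python. This is only an illustration under my own naming (`two_sum_by_deletion`); it uses `collections.deque` so that deleting from either end is cheap, and it returns `None` where the steps say to optionally return an error.

```python
from collections import deque


def two_sum_by_deletion(numbers, n):
    """Follow the steps literally on a sorted list: inspect the two ends and
    delete whichever end cannot be part of any pair that sums to n."""
    remaining = deque(numbers)
    while len(remaining) >= 2:              # Step 5: stop when no pair is left
        a, b = remaining[0], remaining[-1]  # Step 1: smallest and biggest
        if a + b == n:                      # Step 2: found a match
            return a, b
        if a + b > n:                       # Step 3: b is too big for anyone
            remaining.pop()
        else:                               # Step 4: a is too small for anyone
            remaining.popleft()
    return None


print(two_sum_by_deletion([1, 3, 4, 6, 9], 7))  # (1, 6)
```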

Anyway, we just need to make sense of the critical Steps, namely Steps 3, 4, and 5, and that should be enough to quell any worries about correctness.

Step 3

This is the step that removes the largest element b in the list from consideration for all future iterations. How can this be correct?

Well, let’s consider an example. If N is 50 but a + b is 85, we must look for a smaller sum. This much is obvious.

We just used a and b to get to 85, but because we must get to a smaller sum, we would like to swap out either a or b (or both, eventually) with another number from the list. The question is, which one do we swap out?

We can’t replace a with the next bigger number (or any other number between a and b), because doing so would produce a sum that is at least as big as 85. So a has to stay: we can’t yet rule out other combinations of a with some number other than b (maybe a and its next-biggest neighbor, etc).

That leaves us with b. We throw out b and replace it with the next biggest number, which is guaranteed to be less than or equal to the just-thrown-out b, because the list is sorted.

In other words, every pair made of b and some other element in the list already sums up to 85 or some higher number. So b is a red herring that’s leading us astray. We must throw it out.

Step 4

This is the “mirror” of Step 3. Here we throw out the smallest number out of future pairwise searches, because we know that a, no matter which number it is paired with (even with the biggest one, b), is too small to meet the target N. In other words, a fails to give enough of a “boost” to any other number to reach N. It is very much useless to the remaining other candidates, and so we throw it out.

Step 5

This Step’s analogy when using pointers is to consider the condition when the pointers “cross”. The pointers “crossing”, in and of itself, doesn’t seem particularly significant. However, when we view this condition by looking at the dwindling size of the overall list (by chopping off the left and right ends in Steps 4 and 3), the point becomes obvious. We must stop when the list becomes too small for Step 1 (namely, the construction of the pair (a, b)) to be possible, because there aren’t enough elements left (singleton or empty list).

2-pointer method, in pseudocode

For sake of completeness, here is the pseudocode for the same algorithm. You will see how using pointers (instead of deleting elements outright as described in Steps 3 and 4) doesn’t change the algorithm at all.

# Partial implementation of Step 5. Early exit if list is too small to begin with.
if length(numbers) < 2:
  return error

# Step 1.
a_idx = 0
b_idx = length(numbers) - 1
sum = numbers[a_idx] + numbers[b_idx]

# Begin search, but only if we have to search.
while sum != N:
  # Step 3
  if sum > N:
    b_idx -= 1
  # Step 4
  elif sum < N:
    a_idx += 1

  # Step 5
  if a_idx == b_idx:
    return error

  # Step 1 (again, because we didn't find a match above).
  sum = numbers[a_idx] + numbers[b_idx]

# Step 2
return numbers[a_idx], numbers[b_idx]

It may be of interest to readers who are fairly new to programming that Step 2 comes in at the very end. Getting the “feel” for converting plain-English algorithms into actual code is something that requires experience, and can only be acquired with practice over time.

Do the pointers ever skip over each other?

It is worth pointing out that the condition a_idx == b_idx is well-formed. That is, there will never be a case where a_idx and b_idx will somehow “skip over” each other, rendering the if-condition useless. This is because we only ever increment a_idx or decrement b_idx, exclusively — that is, we never modify both of them within the same iteration. So, the variables only ever change by ±1, and at some point, if the search goes on long enough, the indices are bound to converge at the same numerical value.
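Here is a runnable Python sketch of the pointer version, restructured slightly from the pseudocode (the loop condition is `a_idx < b_idx` instead of `sum != N`, and `None` stands in for the error). The names `two_sum_two_pointer` and `has_pair` are illustrative. The `assert` inside the loop encodes the no-skipping argument above, and the loop at the bottom cross-checks the result against a brute-force search.

```python
from itertools import combinations


def two_sum_two_pointer(numbers, n):
    """Two-pointer search over a sorted list. Returns a matching pair of
    values, or None if no two elements sum to n."""
    if len(numbers) < 2:
        return None
    a_idx, b_idx = 0, len(numbers) - 1
    while a_idx < b_idx:
        total = numbers[a_idx] + numbers[b_idx]
        if total == n:
            return numbers[a_idx], numbers[b_idx]
        if total > n:
            b_idx -= 1
        else:
            a_idx += 1
        # Exactly one index moved by exactly 1, so they can meet but not cross.
        assert a_idx <= b_idx
    return None


def has_pair(numbers, n):
    """Brute-force reference: does any pair of distinct positions sum to n?"""
    return any(a + b == n for a, b in combinations(numbers, 2))


# Cross-check the two approaches on a small sorted list.
numbers = [-4, -1, 0, 2, 2, 5, 8, 13]
for n in range(-10, 30):
    assert (two_sum_two_pointer(numbers, n) is not None) == has_pair(numbers, n)
```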

Conclusion

I think the beauty of this problem is that it’s so simple, and yet it is also a very cool way of looking at the problem of search. Steps 3 and 4 are essentially very aggressive (and correct!) eliminations of bad search paths. There’s just something refreshing about eliminating entire branches of a search tree to speed things up.

If you compare the 2-pointer method with the brute force approach, it is in essence doing the same logical thing, with fewer steps. Whereas the brute force approach performs a pairwise comparison across all possible combinations, the 2-pointer method preemptively discards many combinations by removing elements outright from future consideration. That’s the kind of power you need to go from $O(n^2)$ to $O(n)$!

Hope this helps!

Of course, this pseudocode ignores edge-cases, but I didn’t want to clutter the code listing with non-essential ideas.↩︎
As an added benefit, the brute-force approach works even if the input list is not sorted.↩︎

The Esrille Nisse: Three Years Later

2019-11-13T00:00:00Z

2019-11-13
hardware, esrille, cherry mx

Over three years ago, I wrote a post describing the Esrille Nisse keyboard. This post is a reflection on the keyboard, more than 3 years later.

Layout

Ultimately I settled on a different layout than the one described in the old blog post. This was a result of many hands-on trial-and-error sessions over a period of weeks which turned into many months. In my old post I described writing a program to help find the optimal layout. This proved very difficult in practice, because encoding the optimization rules turned out to be non-trivial. One aspect that was particularly difficult was that the actual physical shape of my own fingers played a part (some fingers were not as versatile as others, for example the pinky finger, and so the key-distance for certain fingers had to have different “weights”, and this was too much to translate into code).

Anyway, I read this post by Peter Norvig forwards and backwards, and used the values there to guide the design of my layout. One big realization after actual usage was that I could not let go of the QWERTY hjkl keys on the home row. There was just so much muscle memory built into these four keys (the only other key I could not let go of was the spacebar key that I used my left thumb for), that I had to “fix” them on the layout first. I then focused on getting the commonly-used keys right.

All that being said, here is my current layout.

      LEFT-SIDE     RIGHT-SIDE
    ---------------------------
    □ □ □ □ □ □     □ □ □ □ □ □
    □ □ 0 □ □         □ □ 0 □ 1
□ □ □ y o p z 1     2 f d t r □ □ □
2 / a i e u w ;     " h j k l n 4 : <--- Home row
  3 . x q v '         b m g c s 3
      8 5 6 7 4     5 , 6 7 8 <--------- Thumb row


Left-side legend
0) Escape
1) PgDn
2) Enter
3) Shift
4) Control
5) Super (Windows key)
6) Space
7) Caps Lock (remapped with xmodmap to Hyper key)
8) Right Alt (aka "AltGr" for US International Layout)

Right-side legend
0) Tab
1) Delete
2) PgUp
3) Shift
4) Backspace
5) FN2
6) FN
7) Alt
8) Right Alt (aka "AltGr" for US International Layout)

The main thing to note is the reduced number of keys that are mapped at all. I like this aspect a lot (not having to move my fingers around much at all) — I never have to reach for a particular key because everything is just so close.

I also dedicated a key just for the colon symbol (as a “Shift + semicolon” macro), because it comes up often enough in programming.

I should also note that the function keys (F1-F12) are situated on the topmost row, left-to-right. I just didn’t bother adding them to the legend because of symbol space constraints.

FN layer.

      LEFT-SIDE     RIGHT-SIDE
    ---------------------------
    □ □ □ □ □ □     □ □ □ □ □ □
    □ □ a □ □         □ □ □ □ □
□ □ □ 7 8 9 □ □     □ □ \ _ = □ □ □
□ □ 0 4 5 6 □ b     b - { ( ) } a : <--- Home row
  c . 1 2 3 `         □ [ < > ] c
      □ □ □ □ □     □ □ □ □ □ <--------- Thumb row

Left-side legend
a) ~/ (a macro that inserts the string "~/")
b) End
c) Shift

Right-side legend
a) Backspace
b) Home
c) Shift

The FN layer has the majority of the punctuation keys I need. You might notice that some symbols like !@#$&*^ are not in here. This is because the numeral keys on the left side are actually the same numeral keys as on the top row (not the Numpad) of a typical QWERTY layout. This means that I can just press FN+Shift together with a numeral key to get these symbols. This is the main trick that allowed me to reduce the number of keys used overall.
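
To illustrate the FN+Shift trick, here is a minimal Python sketch. It assumes the standard US QWERTY number row, and the names `SHIFTED_DIGITS` and `fn_layer_symbol` are made up for this example; it is not how the keyboard firmware is actually configured.

```python
# Shifted symbols on the number row of a standard US QWERTY layout
# (an assumption for this sketch; other layouts differ).
SHIFTED_DIGITS = {
    "1": "!", "2": "@", "3": "#", "4": "$", "5": "%",
    "6": "^", "7": "&", "8": "*", "9": "(", "0": ")",
}


def fn_layer_symbol(digit, shift=False):
    """Return what FN+digit (optionally with Shift held) would produce."""
    return SHIFTED_DIGITS[digit] if shift else digit


# Every symbol "missing" from the FN layer diagram is reachable via FN+Shift.
missing = "!@#$&*^"
reachable = {fn_layer_symbol(d, shift=True) for d in SHIFTED_DIGITS}
print(all(symbol in reachable for symbol in missing))
```

Because the FN layer digits behave like the ordinary top-row digits, no extra keys have to be spent on these symbols.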

The “~/” macro on the left side is particularly useful as well.

FN2 layer.

      LEFT-SIDE     RIGHT-SIDE
    ---------------------------
    □ □ □ □ □ □     □ □ □ □ □ □
    □ □ □ □ □         □ □ □ □ □
□ □ □ □ □ □ □ □     □ □ □ □ □ □ □ □
□ □ □ □ □ □ □ □     □ a b c d □ □ □ <--- Home row
  □ □ □ □ □ □         □ □ □ □ e □
      □ □ □ □ □     □ □ □ □ □ <--------- Thumb row


Right-side legend
a) Left Arrow
b) Down Arrow
c) Up Arrow
d) Right Arrow
e) Shift + Insert macro (for pasting X primary buffer)

This layer is mostly empty, but it is used surprisingly frequently. I really like how the arrow keys line up with my hjkl keys in the main layer.
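
The alignment can be checked by overlaying the two right-hand home rows from the diagrams above. This is only a sketch: the lists are copied by hand from the diagrams, and the variable names are hypothetical.

```python
# Right-hand home rows copied from the diagrams above.
# "□" marks an unmapped key; "4" and ":" are the Backspace and colon keys.
MAIN_HOME_ROW = ['"', "h", "j", "k", "l", "n", "4", ":"]
FN2_HOME_ROW = ["□", "a", "b", "c", "d", "□", "□", "□"]

# Entries from the FN2 right-side legend.
FN2_LEGEND = {
    "a": "Left Arrow",
    "b": "Down Arrow",
    "c": "Up Arrow",
    "d": "Right Arrow",
}

# Pair each main-layer key with whatever sits at the same position on FN2.
arrows = {
    main_key: FN2_LEGEND[fn2_key]
    for main_key, fn2_key in zip(MAIN_HOME_ROW, FN2_HOME_ROW)
    if fn2_key in FN2_LEGEND
}
print(arrows)
```

The result pairs h, j, k, and l with Left, Down, Up, and Right, which is the same direction convention those keys have in vi-style editors.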

For the latest changes to my layout, check out this repo.

Typing Speed

It took me roughly 3 months of everyday use to get somewhat familiar with the layout, and probably another month or two to reach upwards of 60wpm.

It was painstakingly slow at first (it felt a lot like learning how to type all over again), but still “fun” because I noticed that I was getting better with time.

These days, after having used this keyboard every day on both my home and work PCs (yes, I have two of these!), I think I can go higher than 60wpm.

Suffice it to say that there is never a time when I think “oh, I wish I could type faster” on this layout. My speed on this keyboard is about on par with my old typing speed on QWERTY.

My typing speed on the old QWERTY layout hasn’t really changed. I still have to use it when I type on laptop keyboards. And surprisingly, my brain knows to “switch” to QWERTY when I’m typing on there (granted, this instinct took some time to kick in).

Was it worth it?

Totally!

The biggest thing I love about this layout is that I don’t have to move my right hand around when reaching for the typical “hard” keys on QWERTY (such as `[]{}`). I rarely (if ever) have typos when typing punctuation keys. The numeral keys being just underneath my left hand in a different layer is nice, too.

There are some “downsides” though in everyday life:

it’s hard to play games because the key mappings are usually designed for QWERTY;
when I make typos using this layout, they look rather unusual from a “QWERTY” perspective (as a contrived example, I might type “yen” instead of “yes” because the “n” and “s” keys are next to each other on my layout).

I don’t really play games that much though, and when I do I am usually on a separate gaming PC that just uses a regular QWERTY layout, so it’s not really a negative.

I guess the biggest downside of all is that the keyboard form factor on the Nisse is one-of-a-kind on the planet. If Esrille goes under, I would have to take very good care of my keyboards, because a broken component could no longer be replaced. I imagine that at that point, I would have to just create my own keyboard or make do with a shabby imitation using ErgoDox or some other form factor. I sincerely hope that that day never comes…!

Happy hacking!

Status Update

2019-11-11T00:00:00Z

2019-11-11
status

It’s been another year since my last blog post!

On work

I’m still happily employed at Google.

Git Book

I’m still working on this project. It is hard because I’m trying to create real repositories and situations that the reader should be able to check out and follow along. Still, I should be able to finish this next year.

Programming Languages

I’ve started dabbling in Clojure, Rust, and Elixir. This is not the same as the list of programming languages in my last post (which mentioned Shen, Rust, Erlang, Idris, and Factor), but it’s something to chew on for quite some time.

Future

I will post again when the Git book is ready. Stay tuned!

Status Update

2018-11-11T00:00:00Z

2018-11-11
status

It’s been a year since my last blog post. Many things have happened since that time.

Below are some of the more interesting items.

IMVU to Google

I got laid off at IMVU in September 2017. It was a difficult time for me as I had become good friends with the people there. After almost 2 months of searching for jobs, I somehow managed to land a job at Google! My title is Release Engineer. I’ve been there almost a year now and I am still happy and excited to work there.

Golang

I started learning Golang a few months ago, because I felt that this was the best time to learn it (while I’m employed by Google). I haven’t really done any of the advanced things yet, but I like how the language tries really hard to keep the syntax simple. It’s a lot like C in that regard.

The only pain point in Go for me is the packaging/installation system. The whole opinionated $GOPATH thing just feels a bit clunky because of the shared folder namespace with other projects. But I guess that’s unavoidable in any language’s ecosystem.

Git Book

I started writing a short (informal) book on Git. I am using LuaTeX to write it; I started in March 2018 but have yet to cross the halfway mark. Hopefully I’ll get it done before March 2019 rolls around.

Haskell book

Back in 2016’s status update I said that I still planned to finish the Haskell book I was working on. That project is definitely dead. One reason is that due to the rising popularity of the language, I feel that other people have already said what I had meant to say in my book.

Shen, Rust, Erlang, Idris, Factor

I’ve grown interested in these languages because, well, I feel like they are important. My hope is to find some interesting problems that can be solved idiomatically in each language. That might take years, but I hope to be able to write about these languages in the future.

HTTPS for this site

Apparently HTTPS support for custom domains on GitHub has been a thing since earlier this year. I never got around to it but thanks to this post I finally enabled it.

Useful Manpages

2017-11-11T00:00:00Z

2017-11-11
linux, git

A while ago I discovered that there is a manpage for the ASCII character set. It got a bunch of upvotes, and since then I wondered what other manpages were worth knowing about. Below is a small table of manpages that I found interesting.

Manpage	Description
`ascii(7)`	the ASCII character set (in octal, decimal, and hex)
`units(7)`	megabytes vs mebibytes, etc.
`hier(7)`	traditional filesystem hierarchy (e.g., `/bin` vs `/usr/bin`)
`file-hierarchy(7)`	(systemd) filesystem hierarchy
`operator(7)`	C operator precedence rules (listed in descending order)
`console_codes(4)`	Linux console escape and control sequences
`terminal-colors.d(5)`	among other things, ANSI color sequences
`boot(7)`	UNIX System V Release 4 bootup process
`daemon(7)`	(systemd) how to write/package daemons
`proc(5)`	proc filesystem (`/proc`)
`ip(7)`	Linux IPv4 protocol implementation (a bit low-level, but still useful)
`ipv6(7)`	Linux IPv6 protocol implementation
`socket(7)`	Linux socket interface
`unix(7)`	UNIX domain sockets
`fifo(7)`	named pipes

Note that you need to run

sudo mandb

to be able to invoke apropos or man -k (man -k is equivalent to apropos — see man(1)).
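
In case the `name(section)` notation is unfamiliar: a reference such as `ascii(7)` corresponds to running `man 7 ascii`. The sketch below builds the matching command lines in Python. The helper names `man_argv` and `apropos_argv` are hypothetical, and nothing is executed here.

```python
import re


def man_argv(reference):
    """Turn a reference such as "ascii(7)" into ["man", "7", "ascii"]."""
    match = re.fullmatch(r"([^\s()]+)\((\w+)\)", reference)
    if match is None:
        raise ValueError(f"not a name(section) reference: {reference!r}")
    name, section = match.groups()
    return ["man", section, name]


def apropos_argv(keyword):
    """Build the keyword search command (man -k is equivalent to apropos)."""
    return ["man", "-k", keyword]


print(man_argv("ascii(7)"))
print(man_argv("terminal-colors.d(5)"))
print(apropos_argv("socket"))
```

On a system with man installed, each list could be passed to `subprocess.run` to open the page or run the search.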

Git-specific

You probably knew already that Git has many manpages dedicated to each of its subcommands, such as git-clone(1) or git-commit(1), but did you know that it also comes with a suite of tutorials? Behold!

Manpage	Description
`giteveryday(7)`	the top ~20 useful git commands you should know
`gitglossary(7)`	a glossary of all git concepts (blob object, working tree, etc.)
`gittutorial(7)`	a high-level view of using git
`gittutorial-2(7)`	explains the object database and index file (git architecture internals)
`gitcore-tutorial(7)`	like `gittutorial-2(7)`, but much more detailed
`gitworkflows(7)`	recommended workflows, esp. branching strategies for maintainers

Happy hacking!