Codex

1. Introduction

This project aims to capture my notes for studying various programming problems (data structures and algorithms). Have a look at the problems in Section 2.

For those curious about how these notes were created, check out Section 4.

As for why this project is named Codex, I just think it's a cool word that has the word "code" in it as a substring.

1.1. How to read this document

Please do not use the GitHub-rendered view of this file, as many things like links and citations simply do not work. Instead go to https://funloop.org/codex for the best experience, on a desktop or laptop screen (mobile devices don't work very well, not to mention the missing Table of Contents sidebar).

2. Problems

Every problem gets its own README.org file in its own subfolder. The solutions are all in Python. All solutions are "standalone" in that none of them use any libraries other than what's provided by Python's standard libraries.

The problems are drawn mainly from [1]. Other reference materials are cited where applicable. Below is a table of every problem, with tags that give a brief description of each one, and references.

Name Tags References
Parity bitwise [1, p. 27], [2, p. 96]
Rectangle overlap geometry [1, p. 39]
Rearrange list array, partitioning [1, p. 41]
Dutch national flag array, partitioning [1, p. 43], [3, p. 200], [4, pp. 296–301], [5, p. 123], [6, p. 186], [7, pp. 125–126]
Buy and sell stock once array [1, p. 51], [6, p. 69]
Buy and sell stock twice array [1, p. 53]
Interconvert strings and integers array, string [1, p. 75]
Merge sorted linked lists linked list [1, p. 92]
Height-balanced binary trees binary tree [1, p. 124]
Merge monotonically increasing streams priority queue [1, p. 144]
Find first occurrence from sorted array binary search [1, p. 155]
Anonymous message from magazine hash table [1, p. 175]
Intersection of two sorted arrays sorting [1, p. 194]
Validate binary tree as BST binary tree [1, p. 213]
Tower of Hanoi recursion, binary tree [1, p. 233]
Maximum Subarray divide and conquer, dynamic programming [1, p. 250], [5, p. 77]
Making Change dynamic programming [1, p. 253], [8, Sec. 1.2.9]]

2.1. Data structures

Some problems can only be solved in an elegant way if we use particular data structures. See below for introductory discussions about some of these.

Name Tags References
Linked list linked list [1, p. 91]
Stack (with "max" method) stack [1, p. 106]
Binary tree tree [1, p. 123]
Binary search tree tree [4, p. 396]
Heap tree, heap, priority queue [4, p. 308]

2.2. Appendix

Python tricks
There are some Python-language-specific tricks available for programming problems. You might want to skim over this if your Python skills are rusty.
Mathematics
Some (some would argue all) topics in programming have mathematical underpinnings.

3. Tests

Dependency Why
Ruff for linting
Mypy for enforcing type hints
Hypothesis for property-based tests

All solutions to the problems are implemented in Python, and tested with basic unit tests and the Hyothesis property-based testing framework. Each problem's discussion comes with its own test suite. All source code samples are linted as well with ruff and mypy. Testing has been extremely valuable in checking the correctness of the puzzle solutions collected in this work.

4. Literate Programming Build System

We use Lilac for literate programming. To build everything just run make shell to drop inside the development environment shell then run make inside it.

4.1. Weaving (generating the docs)

🔗
PROJ_ROOT := $(shell git rev-parse --show-toplevel)
LILAC_ROOT := $(PROJ_ROOT)/deps/elisp/lilac
PROCS := $(shell nproc)
define run_emacs
    LILAC_ROOT=$(LILAC_ROOT) emacs $(2) --quick --batch --kill \
        --load $(LILAC_ROOT)/lilac.el \
        --load $(PROJ_ROOT)/codex.el \
        --eval="$(1)"
endef
define run_emacs_nobatch
    LILAC_ROOT=$(LILAC_ROOT) emacs $(2) --quick --kill \
        --load $(LILAC_ROOT)/lilac.el \
        --load $(PROJ_ROOT)/codex.el \
        --eval="$(1)"
endef
src_problem = $(shell find problem/ -type f -name 'README.org')
woven_html = \
    $(patsubst \
        problem/%/README.org, \
        problem/%/README.html, \
        $(src_problem)) \
    appendix/mathematics/README.html \
    appendix/python_tricks/README.html \
    README.html
problem_dirs = $(shell find problem -maxdepth 1 -type d | sort | tail -n+2)
problem_dirs_without_prefix = $(subst problem/,,$(problem_dirs))
test_dirs = $(shell find . -type f -name '__init__.py' \
        | sed 's|/__init__.py||')
# All problem dirs with Tikz-generated images.
image_dirs = $(shell find . -type f -name 'images.org' \
        | sed -e 's|/[^/]*$$||' -e 's|^problem/||' | sort -u)
all_tests_verified = $(patsubst %.py, %.py.verified, \
        $(shell find . -type f -name 'test.py'))

define weave_org

$(1): $(2) build-literate.org \
        $(shell find $(3) -type f -name 'images.org' \
            | xargs grep 'img.pdf' \
            | sed \
                -e 's|images.org:#+header: :file ||' \
                -e 's|.img.pdf|.svg|')
    @echo weaving $(2)
    $(call run_emacs,(lilac-publish),$(2))

endef

all: check.verified weave
.PHONY: all

Makefile-tangle

weave: $(woven_html)
    touch weave

README.html: build-literate.org README.org citations.bib
    $(call run_emacs,(lilac-publish),README.org)

$(foreach d,$(problem_dirs_without_prefix),\
    $(eval $(call weave_org,\
    problem/$(d)/README.html,\
    problem/$(d)/README.org,problem)))

appendix/mathematics/README.html: appendix/mathematics/README.org \
        appendix/mathematics/twos-complement.org
    $(call run_emacs,(lilac-publish),appendix/mathematics/README.org)

appendix/python_tricks/README.html: appendix/python_tricks/README.org
    $(call run_emacs,(lilac-publish),appendix/python_tricks/README.org)

check.verified: lint.verified test
    touch check.verified

test: $(all_tests_verified)
    touch test

lint.verified: \
        mypy.verified \
        ruff.verified \
        spellcheck.verified \
        linelength.verified \
        linkcheck.verified
    touch lint.verified

mypy.verified: $(all_tests_verified)
    mypy problem appendix
    touch mypy.verified

ruff.verified: $(all_tests_verified)
    ruff problem appendix
    touch ruff.verified

Makefile-spellcheck

Makefile-linelength

Makefile-linkcheck

# Enter development environment.
shell:
    nix-shell --pure

Makefile-update-deps

4.1.1. Use lilac.theme file

First we track Lilac as a submodule within this project, and then symlink to the CSS, JS, and theme files to the project root. Symlinking adds a level of indirection such that if we ever decide to move around the submodule location to somewhere else, we won't have to update all of our Org files and can instead just update these symlinks.

Then in all of our published Org files, we do

#+SETUPFILE: path/to/lilac.theme

to get the CSS/JS that comes with Lilac.

4.1.2. Custom CSS and HTML <head> content

We tweak Lilac's default CSS a bit.

Make all HTML files we generate try to pull in a local file called codex.css, as well as a custom font.

; See https://stackoverflow.com/a/27285582/437583.
(setq lilac-html-head
      (concat
       "<link rel=\"stylesheet\" type=\"text/css\" href=\"codex.css\" />\n"
       "<link rel=\"stylesheet\" href="
        "\"https://fonts.googleapis.com/css2"
        "?family=Bungee+Shade:wght@400"
       "\">"
       ))
4.1.2.1. Main CSS file

The default codex.css file has some miscellaneous customizations.

4.1.2.1.1. Title font

This makes the title font bigger and uses a more ornate font for it.

h1.title {
    font-family: "Bungee Shade", serif;
    font-size: 100pt;
}
@media (any-pointer: coarse) {
    h1.title {
        font-family: "Bungee Shade", serif;
        font-size: 60pt;
    }
}
4.1.2.1.2. Tables
table.monospace-except-header {
    font-family: monospace;
}

table.monospace-except-header thead {
    font-family: var(--font-serif);
}
4.1.2.2. Secondary CSS file

This is a second CSS file where we customize the title text of the pages for the problems. We just (manually) create a symlink from the problem page's codex.css to this one (because we don't want to use the regular codex.css from above).

🔗
h1.title {
    font-family: "Source Serif Pro", serif;
}
codex-css-tables

4.1.3. Ignore woven HTML from git diff

Typically we only need to look at the rendered HTML output in a web browser as the raw HTML diff output is extremely difficult to parse as a human. So by default we ask Git to exclude it from git diff by treating them as binary data.

🔗
* -diff
**/*.org diff
**/.gitattributes diff
**/.gitmodules diff
**/.gitignore diff
package/nix/sources.json diff
COPYRIGHT diff
LICENSE diff

In order to still show the HTML textual diff, we can run git diff --text.

4.1.3.1. git add -p

Note that the above setting to treat HTML files as binary data prevents them from being considered for git add -p. In order to add them, use git add -u instead.

4.1.4. gitignore

🔗
**/__pycache__
**/*.auctex-auto
**/*.hypothesis
**/*.pdf
**/*.log
**/*.py.verified
check.verified
lint.verified
linkcheck.verified
linelength.verified
mypy.verified
ruff.verified
spellcheck.verified
test
update-deps
weave

4.2. Tangling (generating the source code)

Tangling is simply the act of collecting the #+begin_src ... #+end_src blocks and arranging them into the various target (source code) files. Every source code block is given a unique name.

We simply tangle all major *.org files in the toplevel Makefile.

build_literate_tangled = \
        .gitattributes \
        .gitignore \
        _typos.toml \
        codex.css \
        codex.problem.css \
        codex.el \
        Makefile \
        shell.nix
$(build_literate_tangled) &: build-literate.org
    $(call run_emacs,(org-babel-tangle),build-literate.org)
    touch $(build_literate_tangled)

citations.bib: README.org
    $(call run_emacs,(org-babel-tangle),README.org)
    touch citations.bib

define generate_img_pdfs
$(shell grep 'img.pdf' $(1) \
        | sed \
            -e 's|^|$(1):|' \
            -e 's|images.org:#+header: :file ||') &: $(1)
    @echo generating images from $(1)
    $(call run_emacs,(org-html-export-to-html),$(1))
    rm -f $(patsubst %/images.org, %/images.html, $(1))
endef

define generate_img_svgs
$(1:.img.pdf=.svg): $(1)
    @echo generating svg from $(1)
    @echo $(shell pwd)
    pdf2svg $(1) $(1).uncropped.svg
    inkscape \
        --export-plain-svg \
        --export-margin=5 \
        --export-filename=$(1:.img.pdf=.svg) \
        --export-area-drawing \
        $(1).uncropped.svg
    rm $(1).uncropped.svg
endef

$(foreach p,$(image_dirs),\
    $(eval $(call generate_img_pdfs,\
    $(p)/images.org)))

all_img_pdfs = $(shell find . -type f -name 'images.org' \
        | xargs grep 'img.pdf' \
        | sed 's|images.org:#+header: :file ||')
$(foreach img,$(all_img_pdfs),\
    $(eval $(call generate_img_svgs,\
    $(img))))

define tangle_tests
$(1)/__init__.py $(1)/test.py &: $(1)/README.org
    @echo tangling $(1)/README.org
    $(call run_emacs,(org-babel-tangle),$(1)/README.org)
    find $(1) -type f -name '*.py' \
        -execdir sed -i 's/[[:blank:]]*$$$$//' {} +
endef

# See https://stackoverflow.com/a/9694782/437583.
$(foreach d,$(test_dirs),\
    $(eval $(call tangle_tests,$(d))))

define verify_tests
$(1)/test.py.verified: $(1)/test.py
    python -m unittest discover \
        --failfast --start-directory $(1) \
        --top $(shell echo $(1) | sed -e 's|./||' -e 's|/.\+||')
    touch $(1)/test.py.verified
endef

$(foreach d,$(test_dirs),\
    $(eval $(call verify_tests,$(d))))

4.3. Linting

4.3.1. Spell checker

We use typos-cli to check for spelling errors. Below we configure it to only check the original source material — Org files.

🔗
[files]
extend-exclude = [
    "*.html",
    "deps/*",
]

Here we have the Makefile rules for linting, which include this spellchecker.

ORG_FILES = $(shell find . -type f -name '*.org')

spellcheck.verified: $(ORG_FILES)
    typos
    touch spellcheck.verified

4.3.2. Detect long lines

For code we tangle, we want lines to be roughly 80 characters. This limit is actually a bit difficult to enforce because sometimes the source code blocks we edit get placed into an indented location, and from the source code block itself we cannot tell how much this indentation is exactly. So set the maximum line length for tangled text to be 90 characters.

We have to wrap the find ... invocation with || true because xargs will exit with 123 if the last grep call can't find a match. That is, in our case not finding a match is a good thing but the inner grep doesn't know that.

This code detects which files to look at by looking at lines in Org files that start with #+begin_src and end with :tangle foo, where foo is the last word in the line.

linelength.verified: $(ORG_FILES)
    `find . -type f -name '*.org' \
        | grep -v '^./deps' \
        | xargs grep '^#+begin_src.\+ :tangle ' \
        | sed 's,[^/]\+.org.\+:tangle ,,' \
        | grep -v citations.bib \
        | xargs grep -n '^.\{90\}' > linelength_offenders.log` || true
    test `wc --bytes linelength_offenders.log | cut -d\  -f1` -eq 0
    touch linelength.verified

4.4. Development environment (Nix shell)

This is taken from https://github.com/tweag/haskell-stack-nix-example/blob/b9383e35416a2b0e21fbc97ed079538f9f395b6a/shell.nix#L1.

This is the main development shell and brings in all of our dependencies to build all of our code. It's great for development and testing things out (such as running "make" to re-run any Python tests that have been updated when adding new problems).

🔗
let
  # Nixpkgs snapshot.
  sources = import ./package/nix/sources.nix;
  # The final "pkgs" attribute with all the bells and whistles of our overlays.
  pkgs = import sources.nixpkgs {};
  # This minimalist latex setup is adapted from https://nixos.wiki/wiki/TexLive.
  tex_for_orgmode = (pkgs.texlive.combine {
    # Start with scheme-basic.
    inherit (pkgs.texlive) scheme-basic
      # Add in additional TeX packages (think CTAN package names).
      wrapfig amsmath ulem hyperref capt-of

      # TikZ.
      pgf
      xkeyval
      fontspec
      tikz-qtree

      # Source Sans Pro font.
      sourcesanspro
      # Source Code Pro font.
      sourcecodepro
      ;
  });

in

# This is our development shell.
pkgs.mkShell ({
  buildInputs = [
    # Tangling and weaving for Literate Programming.
    pkgs.emacs29-nox

    # Diagrams.
    pkgs.inkscape
    pkgs.pdf2svg
    tex_for_orgmode

    # Misc
    pkgs.git
    pkgs.less

    # Update deps (bootstrap).
    pkgs.niv
    pkgs.nix
    pkgs.cacert

    # Spell checking.
    pkgs.typos

    # Link checker.
    pkgs.lychee

    # Python testing and linting.
    pkgs.python3Packages.hypothesis
    pkgs.python3Packages.mypy
    pkgs.ruff
  ];
})

4.4.1. Update Nix dependencies

This is based on Lilac's own code for updating Nix dependencies with niv.

nixpkgs_stable_channel := nixos-23.11
update-deps: package/nix/sources.json package/nix/sources.nix
    cd package && niv update nixpkgs --branch $(nixpkgs_stable_channel)
    cd package && niv update
    touch update-deps

4.5. Elisp

🔗
(setq org-cite-csl-styles-dir
      (concat (getenv "LILAC_ROOT") "/deps/styles/"))

(setq org-latex-pdf-process
  '("lualatex --shell-escape --interaction nonstopmode --output-directory=%o %f"))

codex-html-head

5. References

[1]
A. Aziz, T.-H. Lee, and A. Prakash, Elements of Programming Interviews in Python: The Insiders’ Guide. CreateSpace Independent Publishing Platform (25 July. 2018), 2018.
[2]
H. S. Warren, Hacker’s Delight, 2nd ed. Upper Saddle River, NJ: Addison-Wesley, 2013.
[3]
A. Levitin and M. Levitin, Algorithmic Puzzles. Oxford ; New York: Oxford University Press, 2011.
[4]
R. Sedgewick and K. D. Wayne, Algorithms, 4th ed. Upper Saddle River, NJ: Addison-Wesley, 2011.
[5]
J. L. Bentley, Programming Pearls, 2nd ed. Reading, Mass: Addison-Wesley, 2000.
[6]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 3rd ed. Cambridge, Mass: MIT Press, 2009.
[7]
S. S. Skiena, The Algorithm Design Manual, 2nd ed. London: Springer, 2008.
[8]
D. E. Knuth, The Art of Computer Programming: Fundamental Algorithms, 3rd ed. Reading, Mass: Addison-Wesley, 1997.