All source code is generated by tangling this Org file
(developer-guide.org). This file is the single source of truth for nearly
everything in the repository. A few files, such as COPYRIGHT and LICENSE, do
not come from this file, but they are the exceptions.
Tangling is done by loading this file into Emacs and then running
(org-babel-tangle). This file is also part of the woven HTML documentation
as developer-guide.html, which is referenced by the introductory docs at
README.html. The HTML is generated by invoking (lilac-publish). The outputs
of both tangling and weaving are checked into version control.
The Makefile in this repo is used as the main "driver" for both tangling and
weaving. Typically, you would have a browser pointed to README.html or
developer-guide.html (whichever one you are working on) and refresh it after
editing the corresponding Org file. After every change to the Org file, you can
run make to tangle, weave, and run unit tests.
1. Development environment (Nix shell)
This is the main development shell and brings in all of our dependencies to
build all of our code. Taken from here. The Makefile is meant to be executed
from within this environment.
let
  # Nixpkgs snapshot.
  sources = import ./package/nix/sources.nix;
  # The final "pkgs" attribute.
  pkgs = import sources.nixpkgs {};
in
# This is our development shell.
pkgs.mkShell ({
  buildInputs = [
    # Tangling and weaving for Literate Programming.
    pkgs.emacs29-nox
    # For evaluation of Python source code blocks.
    pkgs.python3Minimal
    # Spell checking.
    pkgs.typos
    # Update Nix dependencies in package/nix/sources.nix.
    pkgs.niv
    # Misc.
    pkgs.git
    pkgs.less
  ];
})
For Emacs, we use the -nox version to avoid GUI dependencies (because we
always invoke Emacs in batch mode in the terminal without ever using it in an
interactive manner).
2. Makefile
We have a top-level Makefile so that we can run some make commands on the
command line. The overall idea is to tangle and weave, while also running any
associated tests.
Note that we make use of the fake file tangle, so that we can write the
top-level test rule as test: tangle, which reads more naturally than the
equivalent test: Makefile or test: lilac.el.
Weaving just depends on the main README.html and developer-guide.html files being
generated. Before we call (lilac-publish), we have to first call
(lilac-gen-css-and-exit) because otherwise the source code blocks do not get
any syntax highlighting.
Tangling is pretty straightforward — we just need to call
(org-babel-tangle) on developer-guide.org (the README.org does not contain
any code we need to run to make this work). This generates a number of files,
such as the Makefile and shell.nix.
The key here is to enumerate these generated files, because we need to tell the
make utility that it should run the rule if developer-guide.org has a newer
modification timestamp than any of the generated files. Technically speaking,
because all of the tangled files are tangled together at once with
(org-babel-tangle), we could just list one of them such as Makefile (instead
of enumerating all of them). However we still enumerate them all here for
completeness.
# tangled_output are all files that are generated by tangling developer-guide.org.
tangled_output = \
	citations-developer-guide.bib \
	lilac.css \
	lilac.el \
	lilac-tests.el \
	lilac.js \
	lilac.theme \
	.gitattributes \
	.gitignore \
	Makefile \
	shell.nix

tangle $(tangled_output) &: developer-guide.org
	# Generate the toplevel Makefile (this file) and others as described in
	# tangled_output. In a way this bootstraps the whole literate-programming
	# pipeline.
	$(call run_emacs,(lilac-tangle),developer-guide.org)
	touch tangle
The run_emacs function is used for both weaving and tangling. The main thing
of interest here is that it loads the lilac.el (tangled) file before
evaluating the given expression.
We use niv to update the dependencies sourced by shell.nix. Niv uses two
sources of truth: the niv repository itself on GitHub, and a branch of
nixpkgs. The former tracks the master branch, and the latter tracks the
stable channels (example: nixos-24.05 branch). Whenever we run niv update,
niv will update the HEAD commit SHA of these branches.
One problem with the nixpkgs stable channel is that it will eventually become
obsolete as newer stable channels get created. So we have to manually track
these channels ourselves.
Tangling is simply the act of collecting the #+begin_src ... #+end_src blocks
and arranging them into the various target (source code) files. Every source
code block is given a unique name.
We simply tangle the developer-guide.org file to get all the code we need.
3.1. Standardized Noweb reference syntax
Lilac places the following requirements for naming Noweb references:
- the reference must start with a __NREF__ prefix
- the prefix must be followed by a letter, and then by any number of letters,
  numbers, dashes, underscores, or periods; the reference may terminate with
  (...), where the "..." may be an empty string or some other argument.
These rules help Lilac detect Noweb references easily.
(defun lilac-nref-rx (match-optional-params)
  (rx-to-string
   (lilac-nref-rx-primitive match-optional-params)))

(defun lilac-nref-rx-primitive (match-optional-params)
  (if match-optional-params
      `(group
        "__NREF__"
        ;; Noweb reference must start with a letter...
        (any alpha)
        ;; ...and must be followed by
        ;; letters,numbers,dashes,underscores,periods...
        (* (or (any alnum) "-" "_" "."))
        ;; ...and may terminate with a "(...)" where the "..." may be an empty
        ;; string, or some other argument.
        (* (or "()"
               (and "("
                    (* (not ")"))
                    ")"))))
    `(group
      "__NREF__"
      (any alpha)
      (* (or (any alnum) "-" "_" ".")))))
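As a rough illustration of the naming rules above, the following sketch checks whether a whole string is a well-formed reference. It is not Lilac's own code: the names my-nref-regexp and my-nref-p are hypothetical, and for simplicity the optional "(...)" suffix is allowed at most once here.

```emacs-lisp
(require 'rx)

;; Hypothetical regexp (not part of lilac.el) for a complete reference:
;; the "__NREF__" prefix, a letter, then letters, numbers, dashes,
;; underscores, or periods, and an optional "(...)" suffix.
(defconst my-nref-regexp
  (rx bos
      "__NREF__"
      (any alpha)
      (* (or (any alnum) "-" "_" "."))
      (opt "(" (* (not (any ")"))) ")")
      eos))

(defun my-nref-p (s)
  "Return t if the string S is, in its entirety, a well-formed reference."
  ;; Match case-sensitively so that "__nref__foo" is rejected.
  (let ((case-fold-search nil))
    (and (string-match-p my-nref-regexp s) t)))

(my-nref-p "__NREF__foo-bar.baz_1")  ; t
(my-nref-p "__NREF__foo(arg)")       ; t
(my-nref-p "__NREF__1foo")           ; nil (must start with a letter)
```

Anchoring the pattern to the start and end of the string is a choice made for this sketch; when scanning the body of a source code block, an unanchored pattern is what you would want instead.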
3.2. Remove trailing whitespace
When a __NREF__... reference is indented and the referenced block contains
blank lines, those blank lines inherit the indentation of the parent block. This
results in trailing whitespace in the tangled output, which is messy. We remove
such whitespace from the tangled output.
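A minimal sketch of this kind of cleanup, assuming we operate on the tangled text as a string (Lilac itself may instead work on the tangled file's buffer). The function name my-strip-trailing-whitespace is hypothetical.

```emacs-lisp
(defun my-strip-trailing-whitespace (text)
  "Return TEXT with spaces and tabs removed from the end of every line."
  ;; "$" matches before each newline as well as at the end of the string.
  (replace-regexp-in-string "[ \t]+$" "" text))

;; A blank line that inherited four spaces of indentation becomes empty.
(my-strip-trailing-whitespace "  foo\n    \n  bar\n")
;; => "  foo\n\n  bar\n"
```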
Some code blocks are generated by evaluating code in other blocks. This way, you
can use all the power of Org mode (as well as any supported programming
language) as a kind of meta-programming system.
Org mode by default disables automatic evaluation of source code blocks, because
it is a big security risk. For us, we know that we want to allow code
evaluation, so we disable the evaluation confirmation check. This way,
evaluation can still work even in batch mode.
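For reference, here is a sketch of what turning off the confirmation check can look like. It assumes the standard Org variable org-confirm-babel-evaluate is the relevant switch; the exact form used in lilac.el may differ.

```emacs-lisp
(require 'org)

;; Assumption: setting this variable to nil is what disabling the
;; confirmation check amounts to. With the default value (t), Org asks
;; before evaluating each block, which is impractical in batch mode.
(setq org-confirm-babel-evaluate nil)
```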
The following is needed to evaluate the example for source code block evaluation
in Monoblock (evaluation result's value), which evaluates Python. We don't
need to add in Emacs Lisp here because it's already supported by default (which
we take advantage of in 4.11).
In addition to Python, we add in some additional languages that might be of use.
Weaving is conceptually simpler than tangling because there is no extra step —
the output is an HTML page and that is something that we can use directly
(unlike program source code, which may require additional compilation into a
binary, depending on the language). We limit ourselves to HTML output because
HTML support is ubiquitous; plus we don't have to worry about page breaks such
as in PDF output.
Although weaving is conceptually simple, most of the code in lilac.el has to
do with weaving because the default infrastructure that ships with Org mode is
too rigid for our needs. For example, we make heavy use of Noweb-style
[1] references, but also add in extensive HTML links to allow
the reader to jump around code easily because Org does not cross-link these
references by default.
Weaving currently requires the following dependencies:
Note that all of the above can be brought in by using the Nix package manager.
This is why we provide a shell.nix file in this repo.
4.1. Emacs customizations (lilac.el)
Below is the overall structure of lilac.el. The gc-cons-threshold
setting is to prevent Emacs from entering garbage collection pauses, because
we invoke Emacs from the command line in a non-interactive manner.
Nondeterminism is problematic because it results in a different HTML file
every time we run lilac-publish, even if the Org files have not changed.
Here we take care to set things right so that we can have reproducible, stable
HTML output.
4.2.1. Do not insert current time as HTML comment
Org mode also injects an HTML comment (not visible to the user) to record the
time that the HTML was generated. We disable this because it breaks
deterministic output. See this link for more info.
By default Org mode appends visible metadata at the bottom of the HTML document,
including the Org version used to generate the document. We suppress this
information.
(defun org-export-deterministic-reference (references)
  (let ((new (length references)))
    (while (rassq new references) (setq new (1+ new)))
    new))
(advice-add #'org-export-new-reference
            :override #'org-export-deterministic-reference)
4.3. Top-level publishing function (lilac-publish)
The top-level function is lilac-publish. This actually publishes to HTML
twice, with two separate calls to org-html-export-to-html. We publish twice
because we need to examine the HTML output twice in order to
build up a database of parent/child source code block links (which is then used
to link between these parent/child source code blocks).
Also note that we do some modifications to the Org buffer directly before
exporting to HTML. The main reason is so that the source code blocks that are
named __NREF__... get an automatic #+caption: ... text to go along with it
(because for these Noweb-style blocks, the captions should always look uniform).
Here we modify the Org mode buffer, by using org-export-before-parsing-hook.
This takes a list of functions that are free to modify the Org mode buffer
before each Org element in the buffer gets converted into HTML.
In brief, the lilac-UID-for-all-* functions make it so that the links to
headlines and source code blocks are both deterministic and human-readable. The
lilac-insert-noweb-source-code-block-captions function
Now we start modifying the HTML.
This is useful for adding in final tweaks to the HTML that are difficult to
accomplish at the Org-mode buffer level.
Phase 1: In the first phase, we use the generated HTML data to populate the
lilac-child-HTML_ID-hash-table. This data structure is used to link to child
blocks from parent blocks. We also populate the
lilac-org_id-human_id-hash-table which is used to convert HTML IDs to be more
human-readable.
because of some internal housekeeping we have to do.
Phase 2: In this phase we perform the linking from parent blocks to child blocks
(lilac-link-to-children-from-parent-body), and also convert the child source
code captions to look prettier (lilac-prettify-source-code-captions).
After publishing to HTML, we have to clear the intermediate hash tables used for
the export, because we could be invoking lilac-publish multiple times from the
same Emacs session (such as during unit tests).
4.4.1. Give all source code blocks a #+name: ... field (HTML ID)
Only source code blocks that have a #+name: ... field (org name field) get an
HTML ID (org ID) assigned to them. The problem with polyblocks is that they are
not assigned an org name field by default.
Of course, we still want all polyblocks to have an HTML ID, which can then be
extracted by lilac-get-src-block-HTML_ID to build up the
lilac-child-HTML_ID-hash-table in 4.5.3. If we don't do this then parent source code blocks won't
be able to link to the polyblock at all (or vice versa).
Monoblocks with a #+name: ... field get a unique HTML ID assigned to them in
the form orgN where N is a hexadecimal number. By default Org generates a
random number for N, but we use a simple counter that increments, starting
from 0 (see 4.2.3).
Some source code blocks may not even be monoblocks, because a #+name: ...
field may simply be missing.
What we can do is inject a #+name: ___anonymous-src-block-N line (where N is
an incrementing number) into the beginning of the source code section of all
source code blocks that need it. Then we can construct an HTML link to any
source code block.
Note that the actual name ___anonymous-src-block-N is not important, because it
gets erased and replaced with an orgN ID during HTML export. At that point we
make these orgN strings human-readable in 4.5.1.
4.4.2. Automatic captions for Noweb source code blocks
For the parent/child source code blocks, we simply build these up by having
blocks named #+name: __NREF__foo or #+header: :noweb-ref __NREF__foo. Each
of these blocks can also reference other blocks by having a line __NREF__bar
inside its body. When defining such blocks, we really don't want to define the
#+caption: ... part manually because it gets tedious rather quickly. Yet we
still have to have these #+caption: ... bits (for every __NREF__...
block!) because that's the only way that Org's HTML exporter knows how to label
these blocks.
The code in this section automatically generates #+caption: ... text for these
__NREF__... blocks.
We want each #+caption: ... text to have the following items:
a link back up to a parent block (if any) where
this block is used; can contain more than 1 parent if multiple parents refer
to this same child block
NSCB here means Noweb source code block. We loop through every source code
block and insert a #+caption: ... text into the buffer. This modified buffer
(with the three bits of information from above) is what is sent down the
pipeline for final export to HTML (i.e., the buffer modification does not affect
the actual buffer (*.org file)).
So assume that we already have the smart captions in a sorted association list
(aka alist), where the KEY is the integer buffer position where this caption
should be inserted, and the VALUE is the caption itself (a string), like this:
We can use the KEY to go to that buffer position and insert the caption. However
the insertion operation mutates the buffer. This means if we perform the
insertions top-to-bottom, the subsequent KEY values will become obsolete. The
trick then is to just do the insertions in reverse order (bottom-to-top), so
that the remaining KEY values remain valid. This is what we do below, where
smart-captions is an alist like the one just described.
(We'll get to the helper functions smart-source-code-block-captions-helpers
later as they obscure the big picture.)
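To make the bottom-to-top trick concrete, here is a small self-contained sketch. The function name my-insert-bottom-to-top and the sample data are hypothetical; it works on a temporary buffer so it can be tried in isolation.

```emacs-lisp
(defun my-insert-bottom-to-top (text pos-strings)
  "Return TEXT with each (POS . STRING) of POS-STRINGS inserted at POS.
POS-STRINGS must be sorted by ascending POS. Each POS is a buffer
position (starting at 1) in the original, unmodified TEXT."
  (with-temp-buffer
    (insert text)
    ;; Walk the alist from the last entry to the first. An insertion only
    ;; shifts text that comes after it, so the smaller positions that are
    ;; still waiting to be processed stay valid.
    (dolist (pos-string (reverse pos-strings))
      (goto-char (car pos-string))
      (insert (cdr pos-string)))
    (buffer-string)))

(my-insert-bottom-to-top "abcdef" '((1 . "X") (4 . "Y")))
;; => "XabcYdef"
```

Had the insertions been done top-to-bottom, inserting "X" first would shift everything by one character, and "Y" would land before "c" instead of before "d".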
Now we just have to construct smart-captions. The main difficulty is the
construction of NSCB_LINKS_TO_PARENTS, so most of the code will be concerned
with child-parent associations.
Why do we even need these source code blocks to link back to their parents? The
point is to make things easier to navigate. For example, if we have
#+name: parent-block
#+begin_src bash
echo "Hello from the parent block"
<<__NREF__child-block-1>>
<<__NREF__child-block-2>>
#+end_src
...
#+name: __NREF__child-block-1
#+begin_src bash
echo "I am child 1"
#+end_src
...
#+header: :noweb-ref __NREF__child-block-2
#+begin_src bash
echo -n "I am "
#+end_src

#+header: :noweb-ref __NREF__child-block-2
#+begin_src bash
echo "child 2"
#+end_src
and we export this to HTML, ideally we would want both __NREF__child-block-1
and each of the __NREF__child-block-2 blocks to include an HTML link back up
to parent-block. This would make it easier to skim the document and not get
too lost (any time you are looking at any particular source code block, you
would be able to just click on the link back to the parent (if there is one) to
see a higher-level view).
The key idea here is to build a hash table (child-parents-hash-table) where
the KEY is a child source code block and the VALUE is the parent block(s). Then
in order to construct NSCB_LINKS_TO_PARENTS we just do a lookup against this
hash table to find the parent(s), if any.
The first thing we need is a list of parent source code blocks. We consider a
source code block a parent block if it has any Noweb references within its body.
Then we construct the child-parents-hash-table. For each parent block, we get
all of its children (child-names), and use this data to construct a
child-parent association. Note that we use cl-pushnew instead of push to
deduplicate parents (i.e., when a single parent refers to the same child more
than once we do not want to link back to this same parent more than once from
the child block's caption).
lilac-mk-child-parents-hash-table takes all parent source code blocks and
generates a hash table where the KEY is the child block name and the VALUE is
the list of parents that refer to the child. When we loop through
parent-blocks below, we have to first reverse it because the function
cl-pushnew grows the list by prepending to it.
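The following sketch shows the idea on simplified input. It assumes each parent is given as a plain list of names, (PARENT-NAME CHILD-NAME...), rather than the parsed source code blocks that Lilac works with; the function name is hypothetical.

```emacs-lisp
(require 'cl-lib)

(defun my-mk-child-parents-hash-table (parent-children)
  "Return a hash table mapping each child name to a list of its parents.
PARENT-CHILDREN is a list of (PARENT-NAME CHILD-NAME...) entries in
document order."
  (let ((table (make-hash-table :test 'equal)))
    ;; cl-pushnew prepends, so we visit the parents in reverse order to end
    ;; up with each list of parents in document order.
    (dolist (entry (reverse parent-children))
      (let ((parent (car entry)))
        (dolist (child (cdr entry))
          ;; Skip the push if PARENT is already recorded for this CHILD.
          (cl-pushnew parent (gethash child table) :test #'equal))))
    table))

(let ((table (my-mk-child-parents-hash-table
              '(("parent-a" "__NREF__x" "__NREF__y" "__NREF__x")
                ("parent-b" "__NREF__x")))))
  (gethash "__NREF__x" table))
;; => ("parent-a" "parent-b")
```

Even though parent-a mentions __NREF__x twice, it appears only once in the resulting list.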
lilac-mk-smart-captions generates an alist of buffer positions (positive
integer) and the literal #+caption: ... text that needs to be inserted back
into the buffer.
(defun lilac-insert-strings-into-buffer (pos-strings)
  (cl-loop for pos-string in (reverse pos-strings) do
           (let ((pos (car pos-string))
                 (str (cdr pos-string)))
             (goto-char pos)
             (insert str))))
lilac-get-noweb-children extracts all Noweb references in the form
"__NREF__foo" from a given multiline string, returning a list of all such
references. This function expects at most 1 Noweb reference per line. The return
type is a list of strings.
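A sketch of such an extraction is shown below. It is an approximation: the regular expression is written out by hand rather than taken from lilac-nref-rx, it ignores the optional "(...)" suffix, and it does not enforce the one-reference-per-line expectation. The function name is hypothetical.

```emacs-lisp
(defun my-get-noweb-children (body)
  "Return the \"__NREF__...\" names found in the string BODY, in order."
  (let ((case-fold-search nil)
        (pos 0)
        (children ()))
    ;; Repeatedly search from just past the previous match.
    (while (string-match "__NREF__[[:alpha:]][[:alnum:]_.-]*" body pos)
      (push (match-string 0 body) children)
      (setq pos (match-end 0)))
    (nreverse children)))

(my-get-noweb-children "echo start\n  __NREF__setup\n__NREF__run.all-1\n")
;; => ("__NREF__setup" "__NREF__run.all-1")
```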
Note that a child source block can have two ways of defining its name. The first
is with the direct #+name: __NREF__foo style (monoblock), and the second way
is with a line like #+header: :noweb-ref __NREF__foo (polyblock). Here
lilac-get-src-block-name grabs the name of a (child) source code block, taking
into account these two styles. For polyblock names, we mark them as such with a
(polyblock) string, which is used later for the NSCB_POLYBLOCK_INDICATOR.
(defun lilac-enumerate (lst &optional start)
  (let ((ret ()))
    (cl-loop for index from (if start start 0)
             for item in lst
             do (push (list index item) ret))
    (reverse ret)))
; See https://emacs.stackexchange.com/a/7150.
(defun lilac-matches (regexp s &optional group)
  "Get a list of all regexp matches in a string"
  (if (= (length s) 0)
      ()
    (save-match-data
      (let ((pos 0)
            (matches ()))
        (while (string-match regexp s pos)
          (push (match-string (if group group 0) s) matches)
          (setq pos (match-end 0)))
        (reverse matches)))))
By default Org does a terrible job of naming HTML id fields for headings, because
it uses a randomly-generated number. In 4.2.3 we tweak this behavior to use a deterministic,
incrementing number starting from 0. However while this solution gets rid of the
nondeterminism, it still results in human-unfriendly id attributes because
they are all numeric (e.g. org00000a1, org00000f3, etc).
For headings, we can do better because in practice they already mostly have
unique contents, which should work most of the time to act as an id. In other
words, we want all headings to have HTML IDs that are patterned after their
contents. This way we can have IDs like some-heading-name-1 (where the
trailing -1 is only used to disambiguate against another heading of the same
name) instead of org00000a1 (numeric hex).
For each heading, we insert a CUSTOM_ID property. This makes Org refer to this
CUSTOM_ID instead of the numeric org... link names. We append this headline
property just below every headline we find in the buffer. The actual
construction of the CUSTOM_ID (headline-UID in the code below) is done by
lilac-get-unique-id.
(defun lilac-UID-for-all-headlines (_backend)
  (let* ((all-headlines
          (org-element-map (org-element-parse-buffer) 'headline 'identity))
         (headline-uid-hash-table (make-hash-table :test 'equal))
         (headline-UIDs
          (-remove 'null
            (cl-loop for headline in all-headlines collect
              (let* ((headline-UID
                      (lilac-get-unique-id headline headline-uid-hash-table))
                     ;; Get the position just after the headline (just
                     ;; underneath it).
                     (pos (progn
                            (goto-char (org-element-property :begin headline))
                            (re-search-forward "\n"))))
                (cons pos (concat
                           ":PROPERTIES:\n"
                           ":CUSTOM_ID: " headline-UID "\n"
                           ":END:\n")))))))
    (lilac-insert-strings-into-buffer headline-UIDs)))
<<get-unique-id>>
lilac-get-unique-id converts a given headline to its canonical form (every
non-word character converted to a dash) and performs a lookup against the hash
table. If the entry exists, it looks up an entry-N value in a loop with N
increasing until it sees that no such key exists (at which point we know that we
have a unique ID).
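The lookup loop can be sketched as follows. This is a simplified stand-in: it takes the heading text as a plain string (the real function receives a headline element), and it assumes that "non-word character" means anything other than a letter or digit. The function name is hypothetical.

```emacs-lisp
(defun my-get-unique-id (title seen)
  "Return an ID derived from the string TITLE that is not yet in SEEN.
SEEN is a hash table (with an `equal' test) of the IDs handed out so far."
  (let* ((base (replace-regexp-in-string "[^[:alnum:]]" "-" title))
         (candidate base)
         (n 0))
    ;; Try base, base-1, base-2, ... until we find an unused key.
    (while (gethash candidate seen)
      (setq n (1+ n))
      (setq candidate (format "%s-%d" base n)))
    ;; Remember the ID so that later calls do not reuse it.
    (puthash candidate t seen)
    candidate))

(let ((seen (make-hash-table :test 'equal)))
  (list (my-get-unique-id "Some heading" seen)
        (my-get-unique-id "Some heading" seen)))
;; => ("Some-heading" "Some-heading-1")
```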
Polyblocks do get a name field attached to them during the Org
modification stage, in the format ___anonymous-src-block-N. These names are
for HTML link generation only, because the user won't see them — they will
instead just see org000012 or some such. In fact, all monoblocks are also
given these random-looking (and unstable) org... HTML IDs.
And therein lies the problem: if a user decides to bookmark a particular source
code block, whether a monoblock or polyblock, they will link to an
org...-style ID and chances are that this link will break over time.
This is exactly the same problem we have for headlines. For headlines we solved
the problem with a hash table, and we need to do the same thing here. The major
difference, though, is that unlike headlines which can accept a CUSTOM_ID Org
property, source code blocks have no such facility. So instead of modifying the
buffer (as we do for headlines), we have to modify the final HTML output
instead.
The solution is to simply look at all source code block links, then modify the
id=... part so that it looks like a more human-readable ID. We can extract the
human-readable ID by looking at the smart captions inside the
<label>...</label> area for both monoblocks and polyblocks. And then it's just
a matter of doing a basic search-and-replace across the entire buffer (HTML
file).
We have to do a search-and-replace across the entire file because we may also
have manual links to source code blocks (although — maybe it's just not worth
it because we can't refer to polyblocks anyway by name).
The default HTML export creates a <div> around the entire source code block.
This <div> will have a <pre> tag with the source code contents, along with a
preceding <label> if there was a #+caption: ... for this block. Because of
automatic generation of #+caption: ... bits for all Noweb-style references in
Section 4.4.2, the vast majority of
source code blocks will have this <label> tag.
By default the <label> tag includes a Listing number ("Listing 1: …",
"Listing 17: …", etc), because Org likes to numerically number every single
source code block. We simply drop these listing numbers and instead link back to
the parent block(s) that refer to this source code block (as a Noweb reference),
if any.
For polyblock chains, we also have to keep track of how long each chain is. This
way, each block in the chain will get a unique fraction denoting the position of
that block in the overall chain. (If we don't do this, then all of the captions
for all polyblocks in the same chain will look identical, which can be a bit
confusing). In order to generate these fractions we have to keep around a couple of
hash tables for bookkeeping.
First here is the overall shape of the function. If there is no caption
(<label> tag in the HTML), then we return the original HTML unmodified.
Otherwise we prettify this label (removing the "Listing N: …" text, including
a link back to the parent, etc) along with the body text (linking references
to child Noweb references), and return both inside a new
lilac-pre-with-caption div.
Now let's get into the bindings. The first order of business is parsing the HTML
bits into separate parts.
div-caption-body is the original HTML but all on a single line. We need to
work with the text without newlines because Emacs Lisp's regular expressions
don't work well with newlines. That's why we call
lilac-get-source-block-html-parts-without-newlines.
We need leading-div because this outermost div contains various class and
other information that Org generates. We don't want to lose any of that info.
Now come the bits for identifying the human-readable source code block name by
looking at the caption (<label>). We extract it from the <code> tags that we
expect to see inside the <label> tag.
It may very well be the case that the block will not have a name, in which
case we just name it as anonymous. A source code block is anonymous if it has
neither a "#+name: ..." line nor a "#+header: :noweb-ref ..." line.
Because the name anonymous is meaningless (there can be more than one such
block), we need to disambiguate it. We do this by appending a numeric suffix to
all source code blocks with the same human-readable source-block-name.
Here we do some additional introspection into the <pre> tag which holds the
body text (the actual source code in the source code block). The pre-id is
important because it gives us a unique ID (linkable ID) to the body text. We'll
be using this when linking to a child block from a parent block.
Sadly, Org does not give every source code block an id=... field. Notably,
polyblocks do not get an id except for the very first block in the chain. And
so we inject an id for every <pre> tag ourselves. But first we have to see
if the <pre> tag has an ID already with pre-id-match.
For polyblocks though, we also want to show a fraction in the form (N/TOTAL)
where N is the numeric position of the polyblock (1 for the head, 2 for the
second one in the chain, 3 for the third, and so on), and TOTAL is the total
number of polyblocks in the chain. This way the reader can get some idea about
how many pieces there are as the overall chain is explained in the prose. This
(N/TOTAL) fraction is called a polyblock-indicator.
Note that we use the (polyblock) marker text from NSCB_POLYBLOCK_INDICATOR
to detect whether we're dealing with a polyblock, because otherwise all of those
anonymous blocks will get treated as part of a single polyblock chain.
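The bookkeeping behind the (N/TOTAL) fraction can be sketched with two hash tables: one counting the total number of blocks per chain, and one tracking how many blocks of each chain have been seen so far. The sketch below assumes we already have the chain name of every polyblock in document order; the function name is hypothetical and the real code works on HTML fragments instead.

```emacs-lisp
(defun my-polyblock-indicators (names)
  "Return a list of \"(N/TOTAL)\" strings, one for each element of NAMES.
NAMES is a list of polyblock chain names in document order."
  (let ((totals (make-hash-table :test 'equal))
        (seen (make-hash-table :test 'equal)))
    ;; First pass: count how many blocks belong to each chain.
    (dolist (name names)
      (puthash name (1+ (gethash name totals 0)) totals))
    ;; Second pass: number each block within its own chain.
    (mapcar (lambda (name)
              (let ((n (1+ (gethash name seen 0))))
                (puthash name n seen)
                (format "(%d/%d)" n (gethash name totals))))
            names)))

(my-polyblock-indicators '("a" "b" "a" "a"))
;; => ("(1/3)" "(1/1)" "(2/3)" "(3/3)")
```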
Most source code blocks have a parent into which the block's contents are
inserted. There could be more than one parent if the code is reused
verbatim in multiple places.
We generate a link back to the parent (or parents), by extracting the links
(href bits in the <a> (aka anchor) tags) found in the caption. These links
were generated for us in Section 4.4.2; our job is to prettify them with CSS classes and such.
Every source code block gets a self-link back to itself (shown as a link icon
"🔗"). This goes at the very end of the caption on the far right (top right
corner of the source code block's rectangular area).
Finally, we're ready to recompose the overall source code block. We make a
distinction for source code blocks that have links back to a parent (or multiple
parents). In all cases we make sure to remove Org's default "Listing N:" prefix.
4.5.3. Link noweb references (link to child block from parent block)
In 4.5.2 we tweaked the HTML so that source code blocks
could link back up to their parents. In this section we are concerned with the
opposite — linking to child blocks from the parents. For example, consider the
following code:
#+name: parent-block
#+begin_src bash
echo "Hello from the parent block"
<<__NREF__child-block-1>>
<<__NREF__child-block-2>>
#+end_src
...
#+name: __NREF__child-block-1
#+begin_src bash
echo "I am child 1"
#+end_src
...
#+header: :noweb-ref __NREF__child-block-2
#+begin_src bash
echo -n "I am "
#+end_src

#+header: :noweb-ref __NREF__child-block-2
#+begin_src bash
echo "child 2"
#+end_src
What we want to do is to make the __NREF__child-block-1 and
__NREF__child-block-2 references inside parent-block to link to their
definitions, so that the reader can just click on them to go to see how they're
defined. Unfortunately Org mode doesn't do this by default so we have to do this
ourselves.
In the case of __NREF__child-block-2, it is defined in multiple blocks so we
would want to link to the very first block.
We cannot use an org-export-before-parsing-hook like we did in 4.3 because at that stage of processing, we
are dealing with Org mode syntax. Any modifications we make to the parent
source code block will be treated as text upon HTML export. Thankfully Org mode
allows customizations on generated HTML through the
org-export-filter-src-block-functions variable. This variable is analogous to
org-export-before-parsing-hook, but operates at the HTML level (not at the
Org syntax level) for source code blocks, which is exactly what we need.
So we have to craft valid HTML links (not Org links) to the child source code
blocks. For this we need the actual id part of the HTML <pre>... block that
will hold the source code. That is, the algorithm should be something like:
for every parent source code block,
for every child block (noweb) referenced in the body, insert an HTML link to
the child block (lookup in lilac-child-HTML_ID-hash-table).
The only thing remaining is the construction of
lilac-child-HTML_ID-hash-table. We can construct this by mapping through all
source code blocks and getting the name which can be just drawn from the <label
...> HTML tag, thanks to the smart captions we inserted for all child blocks
earlier in 4.4.2. This hash table
will hold mappings from child source block names to their HTML IDs.
Now that we have a high-level understanding, let's walk through the
implementation. First here's the code to populate the
lilac-child-HTML_ID-hash-table.
(setq lilac-child-HTML_ID-hash-table (make-hash-table :test 'equal))

(defun lilac-populate-child-HTML_ID-hash-table (src-block-html backend info)
  (when (org-export-derived-backend-p backend 'html)
    (let* ((child-name (lilac-get-src-block-name-from-html src-block-html))
           (child-HTML_ID (lilac-get-src-block-HTML_ID src-block-html))
           (child-HTML_ID-exists-already
            (gethash child-name lilac-child-HTML_ID-hash-table nil)))
      ; Only process child blocks that have an HTML ID.
      (if (and child-HTML_ID (not child-HTML_ID-exists-already))
          (puthash child-name child-HTML_ID lilac-child-HTML_ID-hash-table))
      ; Return src-block-html as-is (no modifications).
      src-block-html)))

(defun lilac-get-src-block-HTML_ID (src-block-html)
  (let ((match (string-match "<pre [^>]+?id=\"\\([^\"]+\\)\">" src-block-html)))
    (if match (match-string-no-properties 1 src-block-html))))
We need to get the source block name (the name of the child) from the HTML. We
do this in lilac-get-src-block-name-from-html below. Either there is a
__NREF__... text in the <code> tag within a <label> tag (from 4.4.2), or there is just a plain <label> as
a result of a manually-written #+caption: ... bit. We use either one. Note
that the latter category assumes that the user used unique caption labels for
all blocks; if the user manually creates non-unique captions, this will probably
break.
Now we have everything we need to add HTML links into the body of the source
code block directly. We search for the child name (which begins with a
__NREF__..., which is defined in lilac-nref-rx), and perform a string
replacement, adding in the link to the child block.
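The replacement step can be sketched like this. It assumes the child IDs are already available in a hash table keyed by reference name, uses a hand-written regular expression instead of lilac-nref-rx, and emits a bare anchor tag without any of the CSS classes that Lilac may add. The function name is hypothetical.

```emacs-lisp
(defun my-link-children (body child-ids)
  "Return BODY with known \"__NREF__...\" names wrapped in HTML links.
CHILD-IDS is a hash table (with an `equal' test) mapping a reference
name to the HTML ID of the block that defines it."
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     "__NREF__[[:alpha:]][[:alnum:]_.-]*"
     (lambda (name)
       (let ((id (gethash name child-ids)))
         ;; Leave references without a known definition untouched.
         (if id
             (format "<a href=\"#%s\">%s</a>" id name)
           name)))
     body t t)))

(let ((ids (make-hash-table :test 'equal)))
  (puthash "__NREF__a" "org1" ids)
  (my-link-children "x\n__NREF__a\n__NREF__b\n" ids))
;; => "x\n<a href=\"#org1\">__NREF__a</a>\n__NREF__b\n"
```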
Org's HTML export hardcodes some things. We have to do some manual surgery to
set things right. First let's define a generic search-and-replace function. This
function is based on this example.
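A generic search-and-replace helper of this kind usually follows the standard search-forward and replace-match idiom. The sketch below is written under that assumption; the function name is hypothetical and Lilac's own helper may differ in its details.

```emacs-lisp
(defun my-replace-in-buffer (from to)
  "Replace every literal occurrence of FROM with TO in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (while (search-forward from nil t)
      ;; FIXEDCASE and LITERAL are both t: insert TO exactly as given.
      (replace-match to t t))))

(with-temp-buffer
  (insert "<b>one</b> and <b>two</b>")
  (my-replace-in-buffer "<b>" "<strong>")
  (buffer-string))
;; => "<strong>one</b> and <strong>two</b>"
```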
We have to include the trailing (and somewhat redundant) \"> so that the
function does not replace the text above as well as the intended raw HTML
(hardcoded) bits that we want to replace.
4.5.4.2. Bibliography (citations)
This cleans up the inline styles for citations. Again we include some additional
characters in the pattern (notably the left angle bracket (<)) so that the
text below itself does not get recognized and replaced.
This way we can target this surrounding parent div for the active HTML class
attribute (instead of the empty link anchor) to style the entire entry when we
click a link to go to it.
Similar to the bibliography entries, the description lists in HTML have empty
link anchors in them, because of the way we insert the link anchors manually in
the Org text (this is a convention we follow; see Section 6 for examples).
We get rid of these anchors and instead create a surrounding div around it, so
that we can highlight the enclosed <dt> (description term) and <dd>
(description details).
If a user wants to use a custom Google web font, they have to make the HTML page
pull it in in the <head> part of the page. This requires modifying the HTML.
In order to facilitate this, we provide a replaceable piece of text that can be
swapped out for the value that the user can provide. Specifically, we inject the
line <!-- LILAC_HTML_HEAD --> into the HTML, and this can be replaced by the
value of the lilac-html-head variable in Emacs Lisp (which can be provided by
the user when they invoke the lilac-publish function).
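The placeholder swap can be pictured with the following JavaScript sketch. Lilac itself performs this substitution in Emacs Lisp during publishing; the function name injectHtmlHead and the constant name are hypothetical and exist only for this illustration.

```javascript
// Hypothetical sketch; Lilac performs this substitution in Emacs Lisp.
const LILAC_HTML_HEAD_PLACEHOLDER = "<!-- LILAC_HTML_HEAD -->";

// Replace every occurrence of the placeholder with the user-supplied
// markup. split/join performs a literal replacement, so characters such
// as "$" in headHtml are inserted verbatim.
function injectHtmlHead(html, headHtml) {
  return html.split(LILAC_HTML_HEAD_PLACEHOLDER).join(headHtml);
}
```

If the placeholder is absent, the page is returned unchanged, so the substitution is safe to apply unconditionally.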
Every HTML element h2 to h6 (which encode the Org mode headlines) already
come with a unique ID, but they are not linked to themselves. We add the
self-links here, which makes it easy for users to link to them directly when
they're reading the page.
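As a rough illustration of the self-link rewrite, here is a JavaScript sketch. It assumes a simplified headline of the form <h2 id="...">...</h2> with no other attributes; the real Org output may differ, and Lilac's own implementation is written in Emacs Lisp. The function name addHeadlineSelfLinks is hypothetical.

```javascript
// Hypothetical sketch; assumes simplified headline markup.
function addHeadlineSelfLinks(html) {
  // Match <h2 id="..."> through <h6 id="...">. The backreference \1
  // ensures the closing tag is the same level as the opening tag.
  const headline = /<(h[2-6]) id="([^"]+)">([\s\S]*?)<\/\1>/g;
  // Wrap the headline contents in a link that points at its own ID.
  return html.replace(headline, '<$1 id="$2"><a href="#$2">$3</a></$1>');
}
```

Headlines outside the h2 to h6 range (such as the h1 document title) are left untouched.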
Lastly, show a link anchor icon when the user hovers over a headline. We have
the icon present at all times; we only make it visible when we hover over the
heading. This way, there is no "jumping" of any kind when the heading is about
the same size as the width of the enclosing div (it can jump when displaying
the icon results in a new line break).
#table-of-contents {
    position: fixed;
    top: 0;
    float: left;
    margin-left: -300px;
    width: 280px;
    font-size: 90%;
    border-right-style: solid;
    border-right-width: 1px;
    border-right-color: var(--border-light);
}
#text-table-of-contents {
    overflow-y: scroll;
    height: 100vh;
    padding-right: 20px;
    padding-left: 5px;
}
#text-table-of-contents li {
    font-family: var(--font-sans);
}
#text-table-of-contents ul li {
    font-weight: bold;
}
#text-table-of-contents ul li ul li {
    font-weight: normal;
}
#text-table-of-contents ul {
    margin: 0;
    padding: 0;
}
#text-table-of-contents > ul > li {
    padding-top: 1em;
}
#text-table-of-contents > ul > li:last-child {
    padding-bottom: 1.5em;
}
4.6.2.3. Track the current headline
We use some JavaScript to track the current headline when we scroll up or
down the page, forcing it to stay in sync with the content in the main body. We
initially used Bootstrap's scrollspy component, but dropped it because it was
too heavy and opinionated.
function scrollIntoViewIfNeeded(target) {
    if ((target.getBoundingClientRect().bottom > window.innerHeight)
        || (target.getBoundingClientRect().top < 0)) {
        target.scrollIntoView({ behavior: "smooth",
                                block: "center",
                                inline: "center" });
    }
}
function deactivate_other_toc_items(hash) {
    $("#text-table-of-contents a").each((index, elt) => {
        if (elt.hash !== hash) {
            $(elt).removeClass("active");
        }
    });
}
function get_toc_item(hash) {
    return $(`#text-table-of-contents a[href='${hash}']`)[0];
}
$(document).ready(() => {
    $("#text-table-of-contents a").click((e) => {
        var tocItem = get_toc_item(e.target.hash);
        $(tocItem).addClass("active");
        deactivate_other_toc_items(e.target.hash);
    });
    $("*[id^='outline-container-h-']").each((index, elt) => {
        var hash = elt.getAttribute("id");
        hash = hash.replace("outline-container-", "#");
        var tocItem = get_toc_item(hash);
        elt.addEventListener("mouseover", () => {
            $(tocItem).addClass("active");
            deactivate_other_toc_items(hash);
        });
        elt.addEventListener("mouseover", (e) => {
            // If we don't call stopPropagation(), we end up scrolling *all*
            // elements in the stack, which means we will try to scroll e.g.,
            // Section 5 and Section 5.1.2.3 (when we only want the latter).
            e.stopPropagation();
            // Unfortunately, scrollIntoViewIfNeeded is not supported on
            // Firefox. Otherwise we could do
            //
            //     elems[0].scrollIntoViewIfNeeded({ block: "center" });
            //
            // instead. So here we call the custom function that does what we
            // want. See https://stackoverflow.com/a/37829643/437583.
            scrollIntoViewIfNeeded(tocItem);
        });
    });
});
We add some styling for the "active" headline. The main point is to add a green
background and border around it. For the border, we have to make the non-active
headlines have a white (invisible) border around it because otherwise the active
border makes the item jump a little bit when it's applied.
We have to colorize the foreground color of the link because otherwise it
becomes the color of all other links as per 4.10.8.
4.6.3. Highlight and scroll to just-clicked-on item
When we click on any link (typically a code block but it can also be a headline
or some other intra-document link destination), the browser shifts the page
there. But sometimes we are already near the link destination so the page
doesn't move. Other times we get moved all the way to the top or the bottom of
the page, so by the time the browser finishes moving there, the user may not
know which destination the browser meant to go to. This can be
somewhat disorienting.
The solution is to highlight the just-clicked-on link's destination element.
Every time we click on anything, we add a class to the destination element. Then
from CSS we can make this visually compelling. This way we give the user a
visual cue when clicking on links that navigate to a destination within the same
document.
We adapt the code in 4.6.2.3 to do what we need here. The
main difference is that while the code there is concerned with adding and
removing the active class from the #text-table-of-contents div, here we want
to do the same but for the outline-2 div which contains the outline text.
(There can be multiple outline-2 divs, but this is immaterial.)
We also use the scrollIntoViewIfNeeded function to scroll the item into view,
but only if we need to. This way we minimize the need to scroll, resulting in a
much less "jumpy" experience.
function deactivate_other_non_toc_items(hash) {
    $(".outline-2 *").each((index, elt) => {
        if (`#${elt.id}` !== hash) {
            $(elt).removeClass("active");
        }
    });
}
function is_external_link(destination) {
    return (destination[0] !== "#");
}
$(document).ready(() => {
    $("a").click((e) => {
        var destination = null;
        if (e.target.attributes.length > 0) {
            destination = e.target.attributes.href.nodeValue;
        } else {
            destination = e.target.parentElement.hash;
        }
        // Only disable the browser's "jump to the link immediately" behavior if
        // we are dealing with an intra-document link. For links to other pages,
        // we want the default behavior. The destination is empty if the link
        // goes to another page.
        if (is_external_link(destination)) {
            return;
        } else {
            e.preventDefault();
        }
        $(destination).addClass("active");
        deactivate_other_non_toc_items(destination);
        scrollIntoViewIfNeeded($(destination)[0]);
        // Save intra-document link into history, but only if it's not a repeat
        // of one already there.
        var hash = destination;
        if (history.state === null || history.state.hash != hash) {
            history.pushState(
                {hash: destination},
                "", destination);
        }
    });
});
Normally, browsers only treat links across URLs as new points in history; this
means that for links within the page, their history is not saved. However, we
make sure to save it explicitly with history.pushState() at HISTORY_PUSHSTATE.
So then every time we want to go back in history (by pressing the "back" arrow
button in the browser), we just need to scroll to it. We already scroll to the
element we click on when we push a new history item onto the stack, so
there's no need to keep it symmetric here.
We have to activate the restored history item (to re-highlight it with the
active class), so we do that also.
Lastly, setting the scrollRestoration property is critical because otherwise
the browser will want to restore the custom scroll position (instead of going to
the history item location we've saved).
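The history handling described above might look roughly like the following sketch. It is not Lilac's actual code: the helper names restoreHistoryItem and installHistoryHandlers, and the injected deps object (with activate and scrollTo callbacks), are hypothetical and chosen so the logic can be exercised outside a browser. The only browser APIs assumed are the popstate event and history.scrollRestoration.

```javascript
// Hypothetical sketch of the history handling; not Lilac's actual code.

// React to a restored history entry. `state` is whatever was passed to
// history.pushState() (an object like {hash: "#some-id"}), or null for
// entries we did not create ourselves.
function restoreHistoryItem(state, deps) {
  if (!state || typeof state.hash !== "string") {
    return false;
  }
  deps.activate(state.hash);  // re-highlight with the "active" class
  deps.scrollTo(state.hash);  // bring the destination into view
  return true;
}

// Wire things up. `win` is normally the browser's `window` object.
function installHistoryHandlers(win, deps) {
  // Stop the browser from restoring its own saved scroll position.
  if ("scrollRestoration" in win.history) {
    win.history.scrollRestoration = "manual";
  }
  win.addEventListener("popstate", (e) => restoreHistoryItem(e.state, deps));
}
```

In a browser, deps.activate could add the active class to the destination element, and deps.scrollTo could call scrollIntoViewIfNeeded from 4.6.2.3 on it.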
4.7. Autogenerate CSS for syntax highlighting of source code blocks
Generate syntax-highlighting.css and quit Emacs. This function is designed to
be run from the command line on a fresh Emacs instance (dedicated OS process).
Unfortunately, it can only be run in interactive mode (without the --batch
flag to emacs).
If we use the workaround from here, we can generate a CSS file with colors from
batch mode. However, the hackiness is not worth it.
4.8. Misc settings
4.8.1. Use HTML5 export, not XML (to un-break MathJax)
By default on Org 9.6, the MathJax settings (a JavaScript snippet) get wrapped
in a CDATA tag, and we run into the same problem described in this email that
has gone unanswered:
https://www.mail-archive.com/emacs-orgmode@gnu.org/msg140821.html. It appears
that this is because the document is exported as XML, not HTML. Setting the
document type to html5, as below, appears to make the CDATA tag magically
disappear.
For some reason we cannot specify citeproc styles based on a relative path in
our Org file. The solution is to set the org-cite-csl-styles-dir variable. See
this post.
4.8.3.1. Define CodeHighlightOn and CodeHighlightOff
If we don't do this, we get an error because the "coderef" links (the links
inside code blocks, for example ;ref:NSCB_NAME) will still try to run the
CodeHighlightOn and CodeHighlightOff JavaScript functions. Turning this
setting on here injects the definitions of these functions into the HTML.
Also make the background color of the programming language hover text the same
as what we have elsewhere. This hover text comes with Org mode's HTML export of
source code blocks.
Sidenotes are small blurbs of text that are displayed "out-of-band", on the
right margin. This right margin is good for presenting smaller ideas that
shouldn't necessarily sit in the main body text.
The CSS below is drawn primarily from here, with some modifications.
Allow HTML exports of Org files (including this one) to pull in CSS and
JavaScript that we've defined for Lilac by referring to a single theme file. The
inspiration for this setup comes from https://gitlab.com/OlMon/org-themes.
For the default fonts, we break up the definition over multiple lines here using
Emacs Lisp for readability.
Now we get the result of evaluating the above with __NREF__fonts-to-load()
(note the trailing parentheses (), which evaluates the referenced code block
before injecting its evaluated value).
Also note that we pull in the lilac.css file which we tangle in 4.10, but this
can be expanded by customizing the value of lilac-html-head, per 4.5.5. For
example, you could make this variable link to a separate lilac-override.css
file to override any of the values we have hardcoded in lilac.css.
Typically we only need to look at the rendered HTML output in a web browser, as
the raw HTML diff output is extremely difficult for a human to parse. So by
default we ask Git to exclude HTML files from git diff by treating them as
binary data.
In order to still show the HTML textual diff, we can run git diff --text.
4.12.1. git add -p
Note that the above setting to treat HTML files as binary data prevents them
from being considered for git add -p. In order to add them, use git add -u
instead.
We use ERT, the Emacs Lisp Regression Testing tool, for our unit tests. Pure
functions that take all of their inputs explicitly ("dependency-injected") are
easy to test because we just provide the various inputs and expect the function
to produce certain outputs. For functions that operate on an Emacs buffer, we
use with-temp-buffer to create a temporary buffer first before invoking the
functions under test.
Some functions we test expect Org mode to be active (so that certain Org mode
functions are available), so we turn it on here by calling (org-mode).
The Emacs manual for ERT defines fixtures as environments which provide setup
and tear-down.
When testing HTML output (behavior of (lilac-publish)), it's useful to create
a temporary Org file and to generate the HTML output (as part of "setup"). Then
we'd run the tests, and finally delete the temporary files (as part of
"tear-down").
We use (lilac-publish-fixture) to do the aforementioned setup and tear-down
for us. In between setup and tear-down, we execute the test function with a
funcall.
A child block can itself be a parent (and link to the nested child within it).
Expect to find a link to the nested child block from within the first child
block.
literate document
A file or collection of
files that include both source code and prose to explain it. Well-known
formats include Noweb files (*.nw) and Org mode files (*.org).
monoblock
An Org mode source code block with a
#+name: ... field. This block is an independent block and there are no other
blocks with the same name.
noweb-ref
aka "Noweb-style reference". A
Noweb-style reference is just a name (string) that refers to a monoblock or
polyblock. See the Org manual.
Org mode
An Emacs major mode for *.org files,
where "major mode" means that it provides things like syntax highlighting and
keyboard shortcuts for *.org text files if you are using Emacs. For Lilac,
the important thing is that we use Org mode as a literate programming tool.
See Org mode.
polyblock
An Org mode source code block without a
#+name: ... field, but which has a #+header: :noweb-ref ... field. Other
blocks with the same Noweb-ref name are concatenated together when they are
tangled. Polyblocks are used in cases where we would like to break up a
single block into smaller pieces for explanatory purposes. In all other cases,
monoblocks are preferable, unless the source code block is not to be tangled
and is only for explanatory purposes in the woven output.
source code block
An Org mode facility
that allows you to enclose a multiline text (typically source code) with
#+begin_src ... and #+end_src lines. They are enclosed in a separate
background color in the HTML output, and are often used for illustrating
source code listings. The format is #+begin_src LANGUAGE_NAME where
LANGUAGE_NAME is the name of the programming language used for the listing.
If the name is a recognized name, it will get syntax highlighting in the
output automatically.
weaving
The act of converting a raw literate
document to a richer format such as PDF or HTML. This allows fancier output,
such as for mathematical formulas, which is easier to read than the
original literate document.
7. References
[1]
N. Ramsey, “Literate programming simplified,” IEEE Software, vol. 11, no. 5, pp. 97–105, Sep. 1994, doi: 10.1109/52.311070.