
Typographically correct footnotes in Asciidoctor

Footnotes are an elegant way to provide supplementary information without disrupting the flow and presentation of content. I find them so useful that I often have to rein myself in so that my articles don’t end up resembling a wiki page.

Footnotes in Asciidoctor are straightforward to use. The Asciidoctor User Manual explains, as it should, the most basic way of adding footnotes. However, the basic way is not sufficient if you care about typography. It doesn’t account for the possibility that the footnote marker is split from its text and left dangling alone on the next line.

The problem

By default, Asciidoctor encloses footnote markers within square brackets. The problem occurs when browsers split the text from its footnote marker, so that the marker appears on the next line rather than staying paired with its text.

I encountered this problem on Chrome for Android (version 79.0.3945.136), but the problem didn’t manifest itself on any desktop browser I tested. I’ve highlighted the problem in the following screenshot.

Footnote marker dangles on the next line — Incorrect!
Do not add a space between the footnote marker and its text. Not only does this look wrong when rendered on the same line, it’s an open invitation for browsers to push the marker onto a new line.

The problem is that this split happens even without a space between the footnote marker and its text. As proof, here’s the abridged Asciidoc source for the screenshot shown above:

… to a third party.footnote:[Reference text]

The square brackets in the snippet above are part of Asciidoc’s footnote syntax. They’re not to be confused with the square brackets rendered in the HTML document.

As far as the browser is concerned, a regular space character clearly indicates where a break between words is allowed. With square brackets (and other symbols), it’s less clear how browsers handle them when wrapping text. Notably, the default Unicode line-breaking rules do not forbid a break between a period and an opening bracket. It’s plausible that on mobile devices, where screen space is a constraint, browsers choose to fit more words in a single line to improve readability.

Regardless, this is a problem, but thankfully one that can be solved easily.

The solution

The solution is to instruct the browser never to break the line between the period and the opening square bracket of our footnote marker.

Asciidoctor has a predefined attribute, {wj}, which emits the Word Joiner Unicode code point. Modify the Asciidoc source like this:

… to a third party.{wj}footnote:[Reference text]

When the Asciidoc is rendered as HTML, the footnote marker is no longer split from its text. Instead, the word wraps to the next line together with its footnote marker. It’s now typographically correct.

Footnote marker stays paired with its text — Correct!
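Adding {wj} by hand is tedious if you already have many footnotes. Below is a minimal Python sketch that rewrites Asciidoc source text. It assumes footnotes are written with the inline footnote: macro, optionally with an ID as in footnote:name[...]. The join_footnotes helper is hypothetical and not part of Asciidoctor. It also doesn’t skip literal or listing blocks, so review the result before committing it.

```python
import re

# Hypothetical helper (not part of Asciidoctor): insert the {wj} attribute
# reference before every inline footnote macro that directly follows a
# non-space character, e.g. "party.footnote:[...]".
#   (?<=\S)              the macro is glued to the preceding text (no space)
#   (?<!\{wj\})          ...and isn't already preceded by {wj}, so reruns are safe
#   footnote:[\w-]*\[    the macro name, an optional footnote ID, and "["
_FOOTNOTE_MACRO = re.compile(r"(?<=\S)(?<!\{wj\})(footnote:[\w-]*\[)")


def join_footnotes(source: str) -> str:
    """Return the Asciidoc source with {wj} before glued footnote macros."""
    return _FOOTNOTE_MACRO.sub(r"{wj}\1", source)


if __name__ == "__main__":
    print(join_footnotes("... to a third party.footnote:[Reference text]"))
    # ... to a third party.{wj}footnote:[Reference text]
```

Footnotes preceded by a space are left alone on purpose. As noted above, the space itself is the problem there, and removing it is an editorial decision better made by hand.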

The Asciidoc format allows arbitrary character entity references. If you don’t want to rely on Asciidoctor to process your Asciidoc documents, use &#x2060; (U+2060 in Unicode hexadecimal notation) instead of {wj}.
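As a quick sanity check that the hexadecimal and decimal forms of the reference denote the same invisible character, here’s a small Python sketch using the standard library’s html.unescape and unicodedata. It only checks what the references resolve to. It doesn’t run Asciidoctor.

```python
import html
import unicodedata

# Hexadecimal and decimal numeric character references for the same
# code point (0x2060 == 8288).
REFERENCES = ("&#x2060;", "&#8288;")


def describe(ref: str) -> tuple[str, str]:
    """Resolve a numeric character reference to (code point, Unicode name)."""
    ch = html.unescape(ref)
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for ref in REFERENCES:
    print(ref, *describe(ref))
# &#x2060; U+2060 WORD JOINER
# &#8288; U+2060 WORD JOINER
```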

The explanation

We want to prevent line breaks between our footnote marker and its text. Unicode offers a solution for precisely this problem.

Using Unicode is the best way to address this problem because it’s a well-established, long-standing and widely supported standard. Even if you render your Asciidoc document in other formats (not just HTML), it would produce the right result on conforming Unicode implementations.

There are several Unicode code points that seem to satisfy our requirement, but it isn’t obvious which one is correct, or if there are multiple correct choices.

Here’s a list of code points one might be tempted to use in this scenario:

  • No-break Space (NBSP U+00A0)

  • Narrow No-break Space (NNBSP U+202F)

  • Zero-width No-break Space (ZWNBSP U+FEFF)

  • Zero-width Joiner (ZWJ U+200D)

  • Combining Grapheme Joiner (CGJ U+034F)

  • Word Joiner (WJ U+2060)

Let’s examine the suitability of each of these code points using the Unicode Standard, its Line Breaking Algorithm Specification and Text Segmentation Guidelines.
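To see some of the relevant properties side by side, here’s a short Python sketch. It reads the name, general category and bidirectional class from the standard unicodedata module. The module doesn’t expose the Line_Break property, so that column is typed in by hand from LineBreak.txt / UAX #14. Treat it as an assumption to double-check against the Unicode version you target. Note the bidirectional class CS (common number separator) on NBSP and NNBSP.

```python
import unicodedata

# Line_Break classes transcribed by hand from the Unicode Character Database
# (LineBreak.txt, recent Unicode versions); unicodedata doesn't expose them.
# GL = non-breaking "glue", WJ = word joiner, ZWJ = zero width joiner.
LINE_BREAK_CLASS = {
    "\u00a0": "GL",   # NBSP
    "\u202f": "GL",   # NNBSP
    "\ufeff": "WJ",   # ZWNBSP / BOM
    "\u200d": "ZWJ",  # ZWJ
    "\u034f": "GL",   # CGJ
    "\u2060": "WJ",   # WJ
}


def properties(ch: str) -> tuple[str, str, str, str]:
    """Return (name, general category, bidi class, line break class)."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        LINE_BREAK_CLASS[ch],
    )


for ch in LINE_BREAK_CLASS:
    name, category, bidi, lb = properties(ch)
    print(f"U+{ord(ch):04X}  {name:<26} {category}  {bidi:<3}  {lb}")
```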

No-break Space (NBSP)

NBSP is the preferred character to use where two words are to be visually separated but kept on the same line. However, NBSP behaves as a numeric separator (its bidirectional class is CS, common number separator), which affects right-to-left languages such as Arabic or Hebrew. In practice, it’s probably fine to use NBSP to prevent line breaks when you’re not using such languages. However, I don’t recommend it from a typographic perspective: there shouldn’t be a visible word space before footnote markers.

Narrow No-break Space (NNBSP)

NNBSP is the narrow version of NBSP. In terms of line breaking behavior, they are identical. However, NNBSP has additional semantics in French typography and in Mongolian text. For this reason and for the reasons mentioned for NBSP, I would not recommend using NNBSP for our problem.

Zero-width No-break Space (ZWNBSP)

ZWNBSP is an invisible character used to keep its left and right neighbor characters on the same line. Unlike NBSP and NNBSP, ZWNBSP doesn’t have any additional semantics, which makes it look like the ideal solution to our problem. Well… yes and no. As the Unicode Standard evolved, this use of U+FEFF was deprecated in favor of the Word Joiner. The code point now serves primarily as the Byte Order Mark (BOM), which marks the beginning of a Unicode text stream. Implementations are expected to continue supporting ZWNBSP, so it’s not wrong to use it, but it’s not the preferred choice when a better alternative exists.
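The double role of U+FEFF is easy to observe. The Python sketch below uses only the standard library. It shows that the UTF-8 encoding of U+FEFF is exactly the byte sequence Python exposes as codecs.BOM_UTF8. It also shows that Python’s BOM-aware utf-8-sig decoder silently drops the character at the start of a text while keeping it elsewhere. This is one concrete way a ZWNBSP can be misread as a BOM.

```python
import codecs

ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also the byte order mark


def utf8_roundtrip(text: str) -> str:
    """Encode as UTF-8, then decode with Python's BOM-aware 'utf-8-sig' codec."""
    return text.encode("utf-8").decode("utf-8-sig")


# In UTF-8, U+FEFF is byte for byte the UTF-8 signature (BOM).
print(ZWNBSP.encode("utf-8") == codecs.BOM_UTF8)  # True

# A leading U+FEFF is consumed as a BOM; one in the middle survives.
print(repr(utf8_roundtrip(ZWNBSP + "party." + ZWNBSP + "[1]")))
# 'party.\ufeff[1]'
```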

Zero-width Joiner (ZWJ)

ZWJ is intended to produce a more connected rendering of adjacent characters. But don’t be misled by the term “joiner” here. ZWJ is used in unusual cases where ligatures or cursive connections are required. The Unicode Standard explicitly states that ZWJ has no effect on word or line break boundaries. Thus, for our problem, ZWJ is not only semantically wrong, it won’t work in practice either.

Combining Grapheme Joiner (CGJ)

In linguistics, a grapheme is the smallest unit of a language’s writing system. In Unicode, a grapheme may be represented using multiple code points. To avoid ambiguity with its use of “character”, the Unicode Standard uses the term user-perceived character, which is approximated by a grapheme cluster. For instance, the Tamil character நி (Ni) consists of two Unicode code points: the letter “Na” (U+0BA8) and the vowel sign “I” (U+0BBF). CGJ is invisible and exists for special cases involving such sequences, for example blocking the canonical reordering of combining marks or distinguishing sequences for sorting and searching. Although CGJ falls under the Non-breaking (“Glue”) line-break class, it’s semantically incorrect to use it merely to prevent a line break.
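The Tamil example can be checked with Python’s standard unicodedata module. The sketch below shows that நி is made of two code points even though it renders as one user-perceived character. It also shows that CGJ is a combining mark with canonical combining class 0, which is what lets it block canonical reordering of the marks around it.

```python
import unicodedata

# நி (Ni): one user-perceived character, two code points.
ni = "\u0ba8\u0bbf"

for ch in ni:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0BA8 TAMIL LETTER NA
# U+0BBF TAMIL VOWEL SIGN I

print(len(ni))  # 2 code points, although it renders as a single grapheme

# CGJ: a nonspacing combining mark (Mn) with canonical combining class 0.
cgj = "\u034f"
print(unicodedata.category(cgj), unicodedata.combining(cgj))  # Mn 0
```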

Word Joiner (WJ)

WJ, like ZWNBSP, glues together left and right neighbor characters such that they are kept on the same line. However, unlike ZWNBSP, WJ is not deprecated and is stated as the preferred choice if the intent is to merely prevent a line break. It has no additional semantics in any language supported by Unicode. Pedantically speaking, it’s exactly what solves our problem without creating more edge cases.

In conclusion: use the Word Joiner.

Other edge cases

We’ve established the Word Joiner (WJ) as the correct solution and prevented a line break before the opening square bracket. But can line breaks occur before the closing square bracket? What about line breaks when there are multiple digits in the footnote marker? If we dig deeper into the Unicode Standard, we should be able to figure out whether these pose another problem.

  1. Can the footnote [10] be broken across lines as [1 and 0]? Rule LB25 of the Unicode Line Breaking Algorithm and rule WB8 of the Unicode Text Segmentation specification state that a line break within a sequence of digits is not allowed.

  2. Can the footnote [10] be broken across lines as [10 and ]? The closing square bracket character falls under the Closing Parenthesis (CP) class and rule LB13 states that a line break before CP class of characters is not allowed.

  3. Can the footnote [1] be broken across lines as [ and 1]? Rule LB14 states that a line break after [ is not allowed.
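To make these rules concrete, here’s a Python sketch of a tiny subset of the UAX #14 pair rules. It’s a toy, not a conforming implementation. It knows only a handful of line-break classes, treats every other character as alphabetic, and ignores contexts such as OP SP*, tailoring and East Asian or emoji handling. Rule numbers in the comments follow recent versions of UAX #14. The sketch reproduces the original problem (a break opportunity between the period and the opening bracket) and shows that a Word Joiner removes it. It also finds no break opportunities inside the bracketed number itself.

```python
# Toy subset of the UAX #14 line breaking rules, for illustration only.

def lb_class(ch: str) -> str:
    """Map a character to a (simplified) UAX #14 line-break class."""
    if ch == " ":
        return "SP"
    if ch == "\u2060":
        return "WJ"
    if ch in "([{":
        return "OP"
    if ch in ")]":
        return "CP"
    if ch in ".,:;":
        return "IS"
    if "0" <= ch <= "9":
        return "NU"
    return "AL"  # simplification: everything else counts as alphabetic


def break_allowed(before: str, after: str) -> bool:
    """Decide whether a line may break between two adjacent classes."""
    if after == "SP":                             # LB7:  × SP
        return False
    if before == "WJ" or after == "WJ":           # LB11: × WJ, WJ ×
        return False
    if after in ("CP", "IS"):                     # LB13: × CP (and no break before IS)
        return False
    if before == "OP":                            # LB14: OP ×
        return False
    if before == "SP":                            # LB18: SP ÷
        return True
    if before == "NU" and after == "NU":          # LB25 (subset): NU × NU
        return False
    if before == "AL" and after == "AL":          # LB28: AL × AL
        return False
    if before == "IS" and after == "AL":          # LB29: IS × AL
        return False
    if before in ("AL", "NU") and after == "OP":  # LB30: (AL | NU) × OP
        return False
    if before == "CP" and after in ("AL", "NU"):  # LB30: CP × (AL | NU)
        return False
    return True                                   # LB31: break everywhere else


def break_opportunities(text: str) -> list[int]:
    """Indices i where a line may break between text[i - 1] and text[i]."""
    classes = [lb_class(ch) for ch in text]
    return [i for i in range(1, len(text))
            if break_allowed(classes[i - 1], classes[i])]


for text in ("party.[10]", "party.\u2060[10]"):
    print(repr(text), break_opportunities(text))
# 'party.[10]' [6]
# 'party.\u2060[10]' []
```

The only break opportunity in party.[10] sits between the period and [, and the Word Joiner removes it. Because real browsers may tailor these rules, the toy only shows what the default rules permit.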

However, here’s the catch. The LB rules mentioned above are tailorable line breaking rules. This means they are reasonable defaults and implementations should follow them unless alternate approaches produce better results.

And here’s the bad news. To guarantee that the entire footnote marker is kept on the same line, we must use {wj} after [ and before ] as well. Unfortunately, as far as I know, Asciidoctor does not provide that level of control. In fact, I haven’t even found a way to avoid using square brackets altogether. I’ll update this article if I figure out how to further customize footnotes in Asciidoctor.
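One possible workaround, untested here, is to post-process the generated HTML instead of the Asciidoc source. The Python sketch below assumes Asciidoctor’s HTML5 output wraps each marker as <sup class="footnote">[<a ...>1</a>]</sup>. That shape is an assumption, so check it against your own output. The glue_brackets helper is hypothetical. It inserts the &#8288; character reference (U+2060) just inside both brackets.

```python
import re

# Assumed shape of an Asciidoctor HTML5 footnote marker (verify against
# your own output; this is not a documented, stable format):
#   <sup class="footnote">[<a id="_footnoteref_1" ...>1</a>]</sup>
_MARKER = re.compile(r'(<sup class="footnote"[^>]*>)\[(.*?)\](</sup>)')

WJ = "&#8288;"  # numeric character reference for U+2060 WORD JOINER


def glue_brackets(document: str) -> str:
    """Hypothetical post-processing step: add word joiners inside [ and ]."""
    return _MARKER.sub(rf"\1[{WJ}\2{WJ}]\3", document)


if __name__ == "__main__":
    marker = ('<sup class="footnote">[<a id="_footnoteref_1" class="footnote" '
              'href="#_footnotedef_1">1</a>]</sup>')
    print(glue_brackets("party." + marker))
```

A regular expression is enough here only because the marker markup is small and predictable. For anything more involved, an HTML parser would be the safer choice.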

Practically speaking, the above edge cases shouldn’t be a problem, and using the Word Joiner as I’ve shown in the solution does solve a real problem. If we encounter any more cases of incorrect wrapping from browsers with our footnote markers, we can address them in the future. For now, I’m happy with this solution and I haven’t encountered wrapping issues in any of my footnote markers since.

Happy footnoting!