Semantic line breaks

Dec 29, 2019

Web Development · Writing

When writing text (prose, not code), you need to decide where to break lines so they remain within manageable limits. More specifically, I’m referring to text that’s marked up for rendering in a different format, as opposed to plain text.

The technique of using semantic line breaks is old, going back to Brian Kernighan’s first edition of UNIX for Beginners published in 1974.⁠^[1] The idea itself seems to be even older. It’s attributed to Buckminster Fuller, who discovered it in the 1930s and called it “ventilated prose”.⁠^[2]

Semantic line breaks are also sometimes referred to as semantic linefeeds. But before getting into what semantic line breaks are, let’s examine the most prevalent technique used for wrapping text.

Fixed column limit

This technique involves inserting line breaks between words so that every line stays within a fixed column limit, usually 80 characters.

Issues with a fixed column limit

It’s hard to write and edit because sentences are split arbitrarily based on where they start and how long they are. This also makes it harder to spot overly lengthy sentences.
Changing a single word in a paragraph may require re-wrapping all subsequent text in that paragraph. As a consequence, version control diffs become harder to review because they mark every re-wrapped line as changed, even though only one word differs.⁠^[3]
Searching for a phrase (without resorting to regex) is unreliable because potential line breaks within the phrase need to be explicitly ignored.
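To make the search problem concrete, here’s a minimal Python sketch. The sample text and the phrase are just illustrations picked for this post; the workaround shown (joining the words of the phrase with a whitespace pattern) is one possible approach, not the only one.

```python
import re

# Two lines from a paragraph hard-wrapped at 80 columns.
wrapped = (
    "Every tweet so far has been bad, but scientists are hopeful that one day we'll\n"
    "discover a way to make them good.\n"
)

phrase = "one day we'll discover"

# A plain substring search misses the phrase because a newline splits it.
plain_hit = phrase in wrapped

# Workaround: allow any run of whitespace (including a newline) between words.
pattern = re.compile(r"\s+".join(re.escape(word) for word in phrase.split()))
regex_hit = pattern.search(wrapped) is not None

print(plain_hit, regex_hit)
```

The plain search fails even though the phrase is clearly there, which is exactly the kind of surprise you get when grepping through hard-wrapped prose.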

An example of text wrapped at a fixed limit of 80 characters

Every tweet so far has been bad, but scientists are hopeful that one day we'll
discover a way to make them good. She borrowed the book from him many years ago
and hasn't yet returned it. When I was little, I had a car door slammed shut on
my hand. I still remember it quite vividly. He ran out of money, so he had to
stop playing poker.
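The re-wrapping problem can be reproduced with Python’s standard textwrap module. This is only a sketch: the one-word edit (“bad” to “terrible”) is a made-up change, and a real editor’s re-wrap command may break lines slightly differently than textwrap does.

```python
import textwrap
from itertools import zip_longest

paragraph = (
    "Every tweet so far has been bad, but scientists are hopeful that one day "
    "we'll discover a way to make them good. She borrowed the book from him many "
    "years ago and hasn't yet returned it. When I was little, I had a car door "
    "slammed shut on my hand. I still remember it quite vividly. He ran out of "
    "money, so he had to stop playing poker."
)

# Hypothetical edit: change a single word in the first sentence.
edited = paragraph.replace("bad", "terrible", 1)

# Re-wrap both versions at the same fixed column limit.
before = textwrap.wrap(paragraph, width=80)
after = textwrap.wrap(edited, width=80)

# Count the lines a line-based diff would report as changed.
changed = sum(1 for old, new in zip_longest(before, after) if old != new)
print(f"{changed} of {len(before)} lines differ after a one-word edit")
```

The edit touches one word, but because the longer word pushes text onto the next line, the change ripples well past the first line.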

Soft wrap

Soft wrap, also called word wrap, involves letting the text editor automatically wrap lines based on some margin — usually the window or screen size.

It’s primarily used while writing WYSIWYG documents using a word processor. The word processor would soft wrap on page margins, and you would insert line breaks to end paragraphs rather than sentences.

This technique is seldom used for plain text because there’s no meaningful relation between the size of the text and a physical page. Therefore, lines typically flow to the edge of the text editor’s window or screen. Soft wrap doesn’t alleviate any of the issues with a fixed column limit technique, nor does it bring any real benefits. Avoid it for plain text unless you have a really pressing reason to use it.

Semantic line breaks

The technique of semantic line breaks, also referred to as sentence per line, involves inserting line breaks on semantic boundaries rather than on a fixed column limit. Semantic boundaries are places where you would naturally pause while reading a sentence. The intention is to emphasize meaning and preserve the grammatical structure of sentences.

Basic rules of semantic line breaks

Always break after a sentence boundary (period, exclamation mark or question mark).
Break after independent clauses (comma, semi-colon, colon or an em-dash) when the sentence is too long. Independent clauses make sense on their own and as part of the overall sentence.
Break after a dependent clause only if it clarifies grammatical structure or if you’re constrained for line length.
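The first two rules can be roughly approximated in code. The sketch below is a naive illustration, not a real implementation of the technique: the function name semantic_wrap and the max_len default are my own inventions, it trips over abbreviations like “e.g.”, and it ignores the third rule entirely because deciding where a dependent clause starts needs human judgment.

```python
import re

# Whitespace that follows sentence-ending or clause-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
CLAUSE_END = re.compile(r"(?<=[,;:])\s+")


def semantic_wrap(text: str, max_len: int = 50) -> list[str]:
    """Break after every sentence; break long sentences after clauses."""
    lines = []
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if len(sentence) <= max_len:
            # Rule 1: a sentence that fits gets a line of its own.
            lines.append(sentence)
        else:
            # Rule 2: a sentence that is too long is also split after
            # commas, semicolons, and colons.
            lines.extend(CLAUSE_END.split(sentence))
    return lines


sample = (
    "When I was little, I had a car door slammed shut on my hand. "
    "I still remember it quite vividly."
)
for line in semantic_wrap(sample):
    print(line)
```

In practice I place the breaks by hand while writing; a script like this is mostly useful for getting a feel for the rules.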

The example now wrapped using semantic line breaks

Every tweet so far has been bad,
but scientists are hopeful
that one day we'll discover a way to make them good.
She borrowed the book from him many years ago
and hasn't yet returned it.
When I was little,
I had a car door slammed shut on my hand.
I still remember it quite vividly.
He ran out of money,
so he had to stop playing poker.

I’ve wrapped lines at every semantic boundary for the purpose of demonstrating the technique. You don’t strictly need to wrap before every dependent clause (like before “that one day” in the above example); it’s optional, but valid if you choose to do so.

This is a clear improvement in readability and maintainability compared to the fixed column limit version. None of the issues with the fixed column limit version are present.

All the articles on my blog are written using semantic line breaks. It has proved immensely useful for rearranging text and for identifying overly lengthy sentences at a glance.
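Because each line now holds a sentence or a clause, an unusually long line points to a sentence worth revisiting. If you’d rather not rely on a glance, a few lines of Python can flag them. This is a hypothetical helper, not something I actually use on this blog; the function name and the limit of 80 are arbitrary choices.

```python
def long_lines(text: str, limit: int = 80) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


draft = (
    "He ran out of money,\n"
    "so he had to stop playing poker and walk all the way back home in the rain "
    "without an umbrella or a coat.\n"
    "I still remember it quite vividly.\n"
)
for number, length in long_lines(draft):
    print(f"line {number} is {length} characters long")
```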

It’s possible to use semantic line breaks with any markup language that doesn’t interpret a single line break as a new element. These languages include AsciiDoc, Markdown, reStructuredText and several others.

Here’s a formal semantic line breaks specification, published by Mattt under the Creative Commons Attribution 4.0 International License (CC BY 4.0).

Disadvantages of semantic line breaks

I wouldn’t call it a disadvantage, but the only inconvenience I’ve faced while using semantic line breaks is when lines contain elements like hyperlinks or footnotes. These often significantly increase the length of lines beyond my preferred limit.

AsciiDoc offers a way to externalize links and footnotes using macro substitutions. While this does not solve the problem entirely, it helps a great deal. If your markup language doesn’t offer something similar, you may want to take this into consideration.

1. Brandon Rhodes - The origins of semantic linefeeds.

2. Asciidoctor - The origins of ventilated prose.

3. Some version control systems may handle this situation better, but not all of them.