Aaron D. Parks
June 10, 2022
A few years ago, I found out I'd been tying my shoes wrong for my entire life. I thought laces came undone easily and didn't usually look very good. At least that's how mine were, and I never paid much attention to anyone else's. It took a couple of weeks to re-train my hands but now I have bows in my laces that look good and rarely come undone.
Even with a lot of help from a good text editor, writing HTML can be a drag. Nice documents end up as tag-swamps with little bits of content perched atop hills of tabs. Editing them becomes a test of patience and we get sick at the thought of having to look at our once-loved text. It doesn't have to be like this! There's a lightweight, easygoing way to write HTML that's been around since the beginning of the web.
Way back when the web was just getting started, troff was a very popular
UNIX typesetting package.
troff took marked-up text as input and produced output suitable for a
typesetter or PostScript printer.
Figure 1 shows the beginning of an article marked up for troff.
We see that troff mark-up is quick and easy to write.
There's very little mark-up needed and it fits around the text rather than
expecting the text to fit around it.
It also plays nice with any text editor: there's no need for syntax
highlighting, help with balancing tags, or help with indentation.
In fact, troff authors usually started each sentence on a new line, a
practice that made it easier to wrangle text on ancient paper terminals
with ed (back before it got a visual mode for glass terminals and became vi).
With the web came the need (or desire) to write HTML.
The explosive growth of the web community brought lots of new faces to the
Internet and we mostly learned HTML from each other, not from UNIX graybeards
who knew troff.
Figure 2 is the sort of example we learned from.
Its mark-up mirrors the browser's internal representation of the element tree.
This style has a lot of mark-up compared to Figure 1, especially if we count the whitespace introduced by the practice of using indentation to illustrate the element tree and keep track of matching tags. Our article takes longer to type and covers more screen, but it has increased only in bulk — not value. It's a fine didactic trick for introductory tutorials, but it had no right to become codified as “best” practice; we wouldn't model residential building codes on a child's birdhouse.
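To make the bulk concrete, here is a sketch of the fully tagged, indented style this paragraph describes. The title and paragraph text are placeholders invented for illustration.

```html
<!DOCTYPE html>
<html>
    <head>
        <title>Tying Shoes</title>
    </head>
    <body>
        <h1>Tying Shoes</h1>
        <p>
            Laces tied with a granny knot come undone easily.
        </p>
        <p>
            Laces tied with a square knot rarely come undone.
        </p>
    </body>
</html>
```

Two short sentences of text end up surrounded by a dozen lines of tags and three levels of indentation.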
Figure 3 shows the same article again, marked up in a style that is more
concise and respectful of the text, like troff.
Wouldn't this be a more pleasant way to write HTML? Our text would stand out better from the mark-up and we wouldn't wear our fingers out typing tags. Editing would be easier and we'd need less support from sophisticated text editors.
This is proper, conformant HTML we can send straight to the browser without processing, filtering, or compiling it. Let's take a closer look!
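As a rough sketch of what the concise style can look like, here is a small complete document. The title and text are placeholders, and putting each start tag on its own line is just one possible layout.

```html
<!DOCTYPE html>
<title>Tying Shoes</title>

<h1>Tying Shoes</h1>

<p>
Laces tied with a granny knot come undone easily.

<p>
Laces tied with a square knot rarely come undone.
```

The html, head, and body elements are still present in the element tree the browser builds; only their tags are left out of the source.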
Most of us learned to write mark-up for each and every element we wanted in our document and we learned to balance our tags around each element's content. XML and XHTML doubled down on this vain rigidity, but thankfully finer sensibilities have since prevailed.
Not only was this effort not necessary, it actually harmed the readability of the marked-up text and made it harder to edit. And it didn't help the machines any, either! Web browsers can unambiguously parse and render conforming documents which omit certain tags. This isn't a cutting-edge feature: it goes back at least to the original HTML RFC from November of 1995.
The HTML Standard tells us that a document has four required elements and that both the start and end tags may be omitted for three of them.
An HTML document always has as its root an html element.
We can leave out both tags without any confusion.
Likewise, every document has as the sole contents of its root element one
head element and one body element, in that order.
Any content which can only go in the body (and not in the head)
unambiguously indicates the division between the two elements.
So, that's four more tags we can leave out.
This dispenses with almost all of the usual boilerplate.
All we really have to have is the document-type declaration and a title.
Figure 4 shows a document with just what's absolutely required.
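A document reduced to the two required pieces might look like the following sketch. The title text is a placeholder.

```html
<!DOCTYPE html>
<title>Untitled</title>
```

The parser supplies the html, head, and (empty) body elements on its own.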
Body text can't just wander around on its own; it needs to be put into
containing elements.
For prose and also for many other types of text, p elements are the handiest.
There's not an unambiguous way to infer the start of a p element, so we have
to provide the start tag.
However, any following content which cannot go in a p element (another p or
most other block-display elements, for example) implies the end of the
paragraph, so we can usually leave off the end tag.
Starting each sentence in a paragraph on its own line (like back in the
troff days) makes it easier to rearrange the sentences in a paragraph,
something I do more often than I would have expected.
If you hard-wrap your text, starting each sentence on its own line also
limits the area affected by edits within a sentence.
Leaving a blank line between paragraphs or other content that functions like a paragraph (a figure or small table, for example) aids navigation and helps us grab appropriate chunks for structural re-arrangement. More generally, using vertical white-space in preference to horizontal white-space lets the text run to a comfortable length even on small screens (or in small windows) and it obviates indentation joggling during structural re-arrangement. It also encourages the mark-up to serve the text, as it should, rather than making the text serve the mark-up. Figure 5 shows a couple of paragraphs in this style.
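Here is a sketch of two paragraphs written this way, with placeholder text: start tags only, one sentence per line, and a blank line between paragraphs.

```html
<p>
A square knot is simple and sound.
It has many uses beyond shoelaces.

<p>
A granny knot is easy to confuse with a square knot.
It makes shoelaces look crooked.
```

The second p start tag implies the end of the first paragraph, so neither paragraph needs an end tag here.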
The img element can be used in two ways: displayed inline (a logo or emoji
inline with text, for example) or displayed as a block (an illustration or
diagram between paragraphs of text, for example).
Because it can be used within a paragraph, its presence does
not imply the end of a paragraph.
To mark up an image to be shown as a block between paragraphs of text,
we could explicitly close the preceding paragraph with an end tag.
My preference, however, is to wrap such images either in their own
paragraphs or in figure elements if they're to get captions or special
styling.
Figure 6 shows an image figure with a caption.
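An image figure with a caption could be marked up along these lines. The file name, alt text, and caption are hypothetical.

```html
<figure>
<img src=square-knot.png alt="A square knot tied in a shoelace">
<figcaption>A square knot.</figcaption>
</figure>
```

The end tags for figure and figcaption can't be omitted, so they stay. A figure start tag does imply the end of a preceding paragraph.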
A list is the most clear and concise way to structure and present certain
types of information.
We can omit the end tags for li, dt, and dd elements as long as they're
followed by something that can't go in a list item (another list item, for
example).
Figure 7 shows a tiny list that we can treat like a single paragraph, setting it apart from its neighbors with blank lines.
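A tiny list in this style might look like the sketch below, with placeholder items.

```html
<ul>
<li>Square knot
<li>Granny knot
<li>Surgeon's knot
</ul>
```

Each li start tag implies the end of the previous item, and the ul end tag closes the last one.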
For lists with more complex items or groups of related items, we can treat each item or group of items as a paragraph and set them off from one another with blank lines. This additional vertical white-space will help us navigate the document and grab chunks of the list for easy re-arranging. Figure 8 shows a definition list where each term has multiple definition items and the terms are separated by blank lines.
For this list, I thought it better to forgo my usual practice of starting each sentence on its own line. Don't be afraid to develop your own style or to adapt it to suit each document or section of a document.
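As a sketch of a definition list laid out this way, with placeholder terms and definitions, each term and its definitions form one group and blank lines separate the groups.

```html
<dl>

<dt>Square knot
<dd>A simple and sound knot with many uses.
<dd>The right way to tie a shoelace bow.

<dt>Granny knot
<dd>An unsound knot that is easy to confuse with the square knot.
<dd>Makes shoelaces look crooked.

</dl>
```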
HTML tables weren't standardized in the original HTML RFC.
Six months later, though, they had their own RFC.
It's good that we have them, because a table is often the most accessible and
versatile way to communicate data.
We can omit the end tags for tr, th, and td elements when they're followed
by content that doesn't belong in them (usually another row, header cell, or
data cell).
Figure 9 shows a tiny table that we can treat as if it were its own paragraph. We'll separate it from its neighbors with blank lines and start each row on its own line.
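A tiny table treated as a single paragraph could look like this sketch. The data is made up for illustration.

```html
<table>
<tr><th>Knot<th>Sound?
<tr><td>Square<td>Yes
<tr><td>Granny<td>No
</table>
```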
We can make a complex table like the one in Figure 10 easier to navigate and edit by treating each row as a paragraph. We'll separate the rows with blank lines and start each column on its own line.
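For a table with longer cells, the row-as-paragraph layout might look like the following sketch, again with made-up data.

```html
<table>

<tr>
<th>Knot
<th>Sound?
<th>Notes

<tr>
<td>Square
<td>Yes
<td>Simple, with many uses; shoelaces tied this way rarely come undone.

<tr>
<td>Granny
<td>No
<td>Easy to confuse with the square knot; makes shoelaces look crooked.

</table>
```

Moving a row is then a matter of cutting and pasting one blank-line-delimited chunk.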
We can leave the quotes off of attribute values as long as the value doesn't contain a quote (single or double), equals sign, angle bracket (< or >), back-tick, or whitespace. Oh, and as long as the value is not the empty string (that would be ambiguous).
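The rule can be illustrated with a few tags. The file names and attribute values below are hypothetical.

```html
<!-- No quotes needed: neither value contains whitespace, a quote,
     an equals sign, an angle bracket, or a back-tick. -->
<a href=knots.html class=reference>list of knots</a>

<!-- Quotes needed: the first value contains an equals sign,
     the second contains spaces. -->
<a href="search.html?q=knot" title="Search for knots">search</a>

<!-- Quotes needed: the value is the empty string. -->
<img src=spacer.png alt="">
```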
Between the attributes, we can put not just spaces but also newlines or tabs.
In other words, a start tag may be split across lines.
This is particularly handy for elements on which we'd like to set many
attributes, such as img.
We can also put newlines and tabs inside quoted attribute values as long as
their values are allowed to contain those characters.
This is especially nice for attributes with long values, like alt.
Figure 11 shows both of these techniques used to mark up an image depicting
a complex diagram.
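A sketch of both techniques together: the start tag is split so that each attribute gets its own line, and the quoted alt value runs over several lines. The file name, dimensions, and description are hypothetical.

```html
<p>
<img
src=square-knot-diagram.png
width=640
height=480
alt="A diagram of a square knot.
The left lace crosses over the right,
and then the right lace crosses over the left.">
```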
URL parsers used to quietly remove newlines and tabs, so we could split long URLs across lines and even format their query parameters nicely with tabs. Unfortunately, this behavior was exploited for data exfiltration via HTML injection, and URL handling has since been made more strict to prevent that kind of attack, so we can no longer rely on this nice thing.
Omitting tags and tabs is one clever tactic serving the strategy of minimizing mark-up and making it serve the text. Of even greater benefit is selecting good elements to use and leaving out any that aren't pulling their weight.
If the text of a document is pure prose, all that is required of us
is to mark up its paragraphs with p tags.
Anything we do beyond this is embellishment and we should make sure each
thing we do pays us back for the mental and physical effort it demands.
Using h1 and h2 tags to mark up a document's title and section headings
makes the document more accessible and easier to style, makes the mark-up
easier to follow, and makes it easier for machines to discern the structure
of the document.
Including the author's contact information on a page shows readers how to
ask questions, offer corrections, and generally socialize with the author
and have a good time.
This is important human stuff.
The address element is semantically appropriate for this information and it
gives us a good way to get hold of its content for styling.
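Putting headings and contact information together, a document skeleton might look like this sketch. The name, e-mail address, and headings are placeholders.

```html
<!DOCTYPE html>
<title>Tying Shoes</title>

<h1>Tying Shoes</h1>

<address>
Jane Doe
<a href=mailto:jane@example.com>jane@example.com</a>
</address>

<h2>The Square Knot</h2>

<p>
A square knot is simple and sound.
```

Unlike p, the address element needs its end tag.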
If a document needs photos, drawings, or listings to illustrate its points,
we can use img or pre elements for the task.
If there are only one or two such elements, we can wrap an image in its own
paragraph or leave a pre element bare.
If there are a lot of figures that are referred to by name or that
would benefit from being styled specially, we can use figure and figcaption
elements to wrap and describe them.
Don't be shy about using phrasing content elements like code, em, sup, or
sub as needed.
They supply important clues to help the reader make sense of the text.
The minimal mark-up described so far is usually enough to work with when it
comes time to style the document.
In cases where we can't write a clever-enough CSS selector, we can add a
class attribute to one or more elements.
If it's just a one-off, I might even use the style attribute.
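The three options can be sketched side by side. The class name, style rules, and text are invented for illustration.

```html
<!DOCTYPE html>
<title>Styling hooks</title>
<style>
/* A selector that needs no extra mark-up:
   the paragraph right after a section heading. */
h2 + p { font-weight: bold }
/* A selector that relies on a class attribute. */
p.aside { font-style: italic }
</style>

<h2>Starting Knot</h2>

<p>
Cross the laces and pull them snug.

<p class=aside>
Some people double the starting knot.

<p style="text-align: center">
* * *
```

The style value needs quotes because it contains a space; the class value does not.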
Styling is not mere ornamentation — at least it shouldn't be.
If the element tree isn't deep enough to support styling the document
needs, it's time to break down and add an element just for styling.
Try to use appropriate semantic elements where possible, and fall back to
div and span only as a last resort.
I hope I've persuaded you to write good HTML from now on. Failing that, I hope I've at least expressed myself well enough to get you thinking about it. You can view the source of this document to see how these principles play out (or don't!) in practice.
Oh! I almost forgot. The right way to tie your shoes is with a square knot. It's easy to confuse this with the granny knot, which is the wrong way. The square knot is a simple and sound knot with many uses. The granny knot is an unsound knot whose only known uses are to make your shoelaces look crooked and to trip you. If you have a deeper interest in shoelaces and their knots, you may enjoy Ian's Shoelace Site.
If you'd be interested in seeing a deeper dive on any of the topics I touched on in this article, please let me know. In particular, I feel like there's a lot more to be said about styling, images, and scripting. But I'll let you be my guide!
If you have any questions, comments, or corrections please don't hesitate to drop me a line.
Aaron D. Parks