Write HTML Right

Aaron D. Parks

June 10, 2022

A few years ago, I found out I'd been tying my shoes wrong for my entire life. I thought laces came undone easily and didn't usually look very good. At least that's how mine were, and I never paid much attention to anyone else's. It took a couple of weeks to re-train my hands but now I have bows in my laces that look good and rarely come undone.

Even with a lot of help from a good text editor, writing HTML can be a drag. Nice documents end up as tag-swamps with little bits of content perched atop hills of tabs. Editing them becomes a test of patience and we get sick at the thought of having to look at our once-loved text. It doesn't have to be like this! There's a lightweight, easygoing way to write HTML that's been around since the beginning of the web.

Background

Way back when the web was just getting started, troff was a very popular UNIX typesetting package. troff took marked-up text as input and produced output suitable for a typesetter or PostScript printer. Figure 1 shows the beginning of an article marked up for troff with the ms macro package.

Figure 1. A troff document.
.TL
Building a Streaming Music Service with Phoenix and Elixir
.PP
I thought it would be nice to make a streaming music service focused on
bringing lo-fi artists and listeners together.
Early on, I built a series of prototypes to explore with a small group of
listeners and artists.
Since this is a technical article, I'll jump right into the requirements we
arrived at, though I'd love to also write an article on the strategies
and principles that guided our exploration.
.SH
Requirements
.PP
We liked a loose retro-computing aesthetic with a looping background that
changed from time to time.
We preferred having every listener hear the same song and see the same
background at the same time.
And we liked the idea of sprinkling some "bumpers" or other DJ announcements
between the songs.

We see that troff mark-up is quick and easy to write. There's very little mark-up needed and it fits around the text rather than expecting the text to fit around it. It also plays nice with any text editor: there's no need for syntax highlighting, help with balancing tags, or help with indentation. In fact, troff authors usually started each sentence on a new line — a practice that made it easier to wrangle text on ancient paper terminals with ed (back before it got a visual mode for glass terminals and became vi).

With the web came the need (or desire) to write HTML. The explosive growth of the web community brought lots of new faces to the Internet and we mostly learned HTML from each other, not from UNIX graybeards who knew troff. Figure 2 is the sort of example we learned from. Its mark-up mirrors the browser's internal representation of the element tree.

Figure 2. An HTML document in the common style.
<!DOCTYPE html>
<html>
	<head>
		<title>
			Building a Streaming Music Service with Phoenix
			and Elixir
		</title>
	</head>
	<body>
		<h1>
			Building a Streaming Music Service with Phoenix
			and Elixir
		</h1>
		<p>
			I thought it would be nice to make a streaming music
			service focused on bringing lo-fi artists and
			listeners together.
			Early on, I built a series of prototypes to explore
			with a small group of listeners and artists.
			Since this is a technical article, I'll jump right
			into the requirements we arrived at, though I'd
			love to also write an article on the strategies and
			principles that guided our exploration.
		</p>
		<h2>Requirements</h2>
		<p>
			We liked a loose retro-computing aesthetic with a
			looping background that changed from time to time.
			We preferred having every listener hear the same
			song and see the same background at the same time.
			And we liked the idea of sprinkling some "bumpers"
			or other DJ announcements between the songs.
		</p>
	</body>
</html>

This style has a lot of mark-up compared to Figure 1, especially if we count the whitespace introduced by the practice of using indentation to illustrate the element tree and keep track of matching tags. Our article takes longer to type and covers more screen, but it has increased only in bulk — not value. It's a fine didactic trick for introductory tutorials, but it had no right to become codified as “best” practice; we wouldn't model residential building codes on a child's birdhouse.

Figure 3 shows the same article again, marked up in a style that is more concise and respectful of the text — like troff.

Figure 3. An HTML document in the troff style.
<!DOCTYPE html>
<title>Building a Streaming Music Service with Phoenix and Elixir</title>

<h1>Building a Streaming Music Service with Phoenix and Elixir</h1>

<p>
I thought it would be nice to make a streaming music service focused on
bringing lo-fi artists and listeners together.
Early on, I built a series of prototypes to explore with a small group of
listeners and artists.
Since this is a technical article, I'll jump right into the requirements we
arrived at, though I'd love to also write an article on the strategies
and principles that guided our exploration.

<h2>Requirements</h2>

<p>
We liked a loose retro-computing aesthetic with a looping background that
changed from time to time.
We preferred having every listener hear the same song and see the same
background at the same time.
And we liked the idea of sprinkling some "bumpers" or other DJ announcements
between the songs.

Wouldn't this be a more pleasant way to write HTML? Our text would stand out better from the mark-up and we wouldn't wear our fingers out typing tags. Editing would be easier and we'd need less support from sophisticated text editors.

This is proper, conformant HTML we can send straight to the browser without processing, filtering, or compiling it. Let's take a closer look!

Tags and Elements

Most of us learned to write mark-up for each and every element we wanted in our document and we learned to balance our tags around each element's content. XML and XHTML doubled down on this vain rigidity, but thankfully finer sensibilities have since prevailed.

Not only was this effort unnecessary, it actually harmed the readability of the marked-up text and made it harder to edit. And it didn't help the machines any, either! Web browsers can unambiguously parse and render conforming documents which omit certain tags. This isn't a cutting-edge feature: it goes back at least to the original HTML RFC (RFC 1866) from November of 1995.

Document Structure

The HTML Standard tells us that a document has four required elements and that both the start and end tags may be omitted for three of them.

An HTML document always has as its root an html element. We can leave out both tags without any confusion. Likewise, every document has as the sole contents of its root element one head element and one body element, in that order. Any content which can only go in the body (and not in the head) unambiguously indicates the division between the two elements. So, that's four more tags we can leave out.

This dispenses with almost all of the usual boilerplate. All we really have to have is the document-type declaration and a title. Figure 4 shows a document with just what's absolutely required.

Figure 4. A minimal HTML document.
<!DOCTYPE html>
<title>Minimal document</title>
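
For comparison, here is a sketch of the same minimal document with every optional tag written out. A browser should build the same element tree from either version (give or take some whitespace), so the six extra tags buy us nothing.

```html
<!DOCTYPE html>
<html>
<head>
<title>Minimal document</title>
</head>
<body>
</body>
</html>
```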

Paragraphs

Body text can't just wander around on its own; it needs to be put into containing elements. For prose and also for many other types of text, p elements are the handiest. There's no unambiguous way to infer the start of a p element, so we have to provide the start tag. However, any content which cannot go in a p element (most other block-display elements, for example) implies the end of the paragraph, so we can usually leave off the end tag.
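
As a small sketch of where paragraphs end on their own, the fragment below never writes a p end tag. The text and the toc.html link are made up for illustration. Each comment describes what follows it.

```html
<!-- A heading cannot go inside a paragraph, so it ends the one before it. -->
<p>
This paragraph ends where the heading starts.

<h2>Requirements</h2>

<!-- A new p start tag ends the previous paragraph, too. -->
<p>
This paragraph ends where the next one starts.

<!-- A list (like a table or a block quotation) also ends the paragraph. -->
<p>
This paragraph ends where the list starts.

<ul>
<li>First item
</ul>

<!-- Phrasing elements such as em and a stay inside the paragraph. -->
<p>
This paragraph holds <em>emphasis</em> and <a href=toc.html>a link</a>
without ending early.
```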

Starting each sentence in a paragraph on its own line (like back in the troff days) makes it easier to rearrange the sentences in a paragraph — something I do more often than I would have expected. If you hard-wrap your text, starting each sentence on its own line also limits the area affected by edits within a sentence.

Leaving a blank line between paragraphs or other content that functions like a paragraph (a figure or small table, for example) aids navigation and helps us grab appropriate chunks for structural re-arrangement. More generally, using vertical white-space in preference to horizontal white-space lets the text run to a comfortable length even on small screens (or in small windows) and it obviates indentation joggling during structural re-arrangement. It also encourages the mark-up to serve the text, as it should, rather than making the text serve the mark-up. Figure 5 shows a couple of paragraphs in this style.

Figure 5. Paragraphs.
<p>
This is my take on streaming music for work and study time.
I’ve always liked to listen to music while I’m at my desk.
As soon as I was introduced to lo-fi I knew it was just the thing for when
I’m writing code and I’ve been into it ever since.
I think by talking with listeners and artists I can figure out how to make
a better place for lo-fi folks to get together.

<p>
As a listener, “better” starts with practical stuff: get in and get listening
quick, mute when you need to.
But it’s not all business… retro styling and huge, moody backgrounds set the
stage.
I curate the playlist for lots of mellow vibes, sometimes with a touch of
melancholy but more often with a bounce of optimism.

The img element can be used in two ways: displayed inline (a logo or emoji inline with text, for example) or displayed as a block (an illustration or diagram between paragraphs of text, for example). Because it can be used within a paragraph, its presence does not imply the end of a paragraph.

To mark up an image to be shown as a block between paragraphs of text, we could explicitly close the preceding paragraph with an end tag. My preference, however, is to wrap such images either in their own paragraphs or in figure elements if they're to get captions or special styling. Figure 6 shows an image figure with a caption.

Figure 6. An image as a figure.
<figure>
<figcaption>Figure 1. An animated winking face.</figcaption>
<img src=wink.gif alt="Pixelation adds a retro feel to the emoji.">
</figure>
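
When no caption is needed, the other option mentioned above is to give the image a paragraph of its own. Here is a minimal sketch. The file name diagram.png and the text are placeholders.

```html
<p>
This paragraph ends where the next one starts.

<p>
<img src=diagram.png alt="A block diagram of the streaming service.">

<p>
The image sits in a paragraph of its own, between two paragraphs of text.
```

Without the second p start tag, the image would become the tail end of the first paragraph, because an img doesn't imply the end of the paragraph before it.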

Lists

A list is the clearest and most concise way to structure and present certain types of information. We can omit the end tags for li, dt, and dd elements as long as the next thing is another item of the same list or the end of the list.

Figure 7 shows a tiny list that we can treat like a single paragraph, setting it apart from its neighbors with blank lines.

Figure 7. A menu list.
<menu>
<li><a href=prev.html>Previous page</a>
<li><a href=toc.html>Table of contents</a>
<li><a href=next.html>Next page</a>
</menu>

For lists with more complex items or groups of related items, we can treat each item or group of items as a paragraph and set them off from one another with blank lines. This additional vertical white-space will help us navigate the document and grab chunks of the list for easy re-arranging. Figure 8 shows a definition list where each term has multiple definition items and the terms are separated by blank lines.

Figure 8. A definition list.
<dl>

<dt>HP 35665A Installation and Verification Guide
<dd>Part number 35665-90029
<dd>Describes how to install the instrument and perform operational and
performance verification tests. The tests are very extensive and require
additional instruments and tools.
<dd>I found a <a href=35665-90029.pdf>scan</a> of this document on the
Internet. I checked that all of the pages appear to be present and tidied
it up for printing.

<dt>HP 35665A Quick Start Guide
<dd>Part number 35665-90035
<dd>Introduces the controls of the instrument and walks through some
example tasks to get the user comfortable making measurements.
<dd>I found a <a href=35665-90035.pdf>scan</a> of this document on the
Internet. I checked that all of the pages appear to be present and tidied
it up for printing.

</dl>

For this list, I thought it better to forgo my usual practice of starting each sentence on its own line. Don't be afraid to develop your own style or to adapt it to suit each document or section of a document.
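
Nested lists work the same way. The sketch below uses invented content. The inner list lives inside the first item of the outer list, and that item stays open until the next outer li start tag. The ul and ol end tags are still required.

```html
<ol>
<li>Gather requirements
<ul>
<li>Talk with listeners
<li>Talk with artists
</ul>
<li>Build prototypes
<li>Launch
</ol>
```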

Tables

HTML tables weren't standardized in the original HTML RFC. Six months later, though, they had their own RFC (RFC 1942). It's good that we have them, because a table is often the most accessible and versatile way to communicate data. We can omit the end tags for tr, th, and td elements when they're followed by content that doesn't belong in them (usually another row, header cell, or data cell).

Figure 9 shows a tiny table that we can treat as if it were its own paragraph. We'll separate it from its neighbors with blank lines and start each row on its own line.

Figure 9. A simple table.
<table>
<tr><th>Treasure<th>Points for taking
<tr><td>Huge diamond<td>10
<tr><td>Bag of coins<td>10
</table>

We can make a complex table like the one in Figure 10 easier to navigate and edit by treating each row as a paragraph. We'll separate the rows with blank lines and start each column on its own line.

Figure 10. A complex table.
<table>

<tr>
<th>Tone control
<th>Lower corner
<th>Upper corner
<th>Peak response
<th>File

<tr>
<td>Maximum
<td>643Hz
<td>4,770Hz
<td>13dB
<td><a href=FRTONMAX.DAT>FRTONMAX.DAT</a>

<tr>
<td>Middle
<td>328Hz
<td>2,000Hz
<td>7.5dB
<td><a href=FRTONMID.DAT>FRTONMID.DAT</a>

<tr>
<td>Minimum
<td>205Hz
<td>1,449Hz
<td>3.5dB
<td><a href=FRTONMIN.DAT>FRTONMIN.DAT</a>

</table>
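
The tables above leave out tbody entirely, and the browser supplies it. If we want a caption and an explicit header section, a sketch in the same style might look like the one below. The caption text is invented. Once thead is written without its end tag, the tbody start tag has to be written too, and the caption keeps its end tag because a line break follows it.

```html
<table>
<caption>Treasures and their point values</caption>
<thead>
<tr><th>Treasure<th>Points for taking
<tbody>
<tr><td>Huge diamond<td>10
<tr><td>Bag of coins<td>10
</table>
```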

Attributes

We can leave the quotes off of attribute values as long as the value doesn't contain a quote (single or double), equals sign, angle bracket (< or >), back-tick, or whitespace. Oh, and as long as the value is not the empty string (that would be ambiguous).
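
Here is a short sketch of that rule in action. The file names and text are placeholders. Each comment applies to the tag that follows it.

```html
<!-- No quotes needed: nothing in these values is on the forbidden list. -->
<a href=toc.html class=nav>Table of contents</a>

<!-- Quotes needed: the alt value contains spaces. -->
<img src=wink.gif alt="A winking face">

<!-- Quotes needed: the href value contains an equals sign. -->
<a href="search.html?q=lo-fi">Search for lo-fi</a>

<!-- Quotes needed: the alt value is the empty string. -->
<img src=spacer.gif alt="">
```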

Between the attributes, we can put not just spaces but also newlines or tabs. In other words, a start tag may be split across lines. This is particularly handy for elements on which we'd like to set many attributes, such as img. We can also put newlines and tabs inside quoted attribute values as long as their values are allowed to contain those characters. This is especially nice for attributes with long values, like alt. Figure 11 shows both of these techniques used to mark up an image depicting a complex diagram.

Figure 11. An img tag.
<img width=500 height=600 src=entity-relationships.png
alt="
An entity relationship diagram shows seven entity types, their properties,
and the relationships between them.
An audio play has a single property: the time at which it started.
It has one and only one audio item.
An audio item has a single property: its duration.
It has zero or one songs, zero or one bumpers, and one and only one media.
A song has three properties: its title, the name of the artist, and a URL to
be shown while it is playing.
It has one and only one audio item.
A bumper has one property: its name.
It has one and only one audio item.
A media has two properties: its type and file name.
It has zero or one audio items and zero or one backgrounds.
A background has three properties: its title, the name of the artist, and a
URL to be shown while it is playing.
It has one and only one media.
It has zero or more background plays.
A background play has two properties: the time at which it started and its
duration.
It has one and only one background.
">

It used to be the case that URL parsers would quietly remove newlines and tabs, so we could split long URLs across lines and even format their query parameters nicely with tabs. Unfortunately, attackers took advantage of this for data exfiltration via HTML injection. URL handling has since been made stricter to prevent this kind of attack, so we no longer have this nice thing.

Bringing it Together

Omitting tags and tabs is one clever tactic serving the strategy of minimizing mark-up and making it serve the text. Of even greater benefit is selecting good elements to use and leaving out any that aren't pulling their weight.

If the text of a document is pure prose, all that is required of us is to mark up its paragraphs with p tags. Anything we do beyond this is embellishment and we should make sure each thing we do pays us back for the mental and physical effort it demands.

Using h1 and h2 tags to mark up a document's title and section headings makes the document more accessible and easier to style, makes the mark-up easier to follow, and makes it easier for machines to discern the structure of the document.

Including the author's contact information on a page shows readers how to ask questions, offer corrections, and generally socialize with the author and have a good time. This is important human stuff. The address element is semantically appropriate for this information and it gives us a good way to get hold of its content for styling.
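
As a sketch, the contact details at the end of this article could be marked up as below. The br elements and the mailto link are one way to do it, not necessarily how this page does it.

```html
<address>
Aaron D. Parks<br>
Parks Digital LLC<br>
<a href=mailto:support@parksdigital.com>support@parksdigital.com</a>
</address>
```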

If a document needs photos, drawings, or listings to illustrate its points, we can use img or pre elements for the task. If there are only one or two such elements, we can wrap an image in its own paragraph or leave a pre element bare. If there are a lot of figures that are referred to by name or that would benefit from being styled specially, we can use figure and figcaption elements to wrap and describe them.

Don't be shy about using phrasing content elements like code, em, sup, or sub as needed. They supply important clues to help the reader make sense of the text.

The minimal mark-up described so far is usually enough to work with when it comes time to style the document. In cases where we can't write a clever-enough CSS selector, we can add a class attribute to one or more elements. If it's just a one-off, I might even use the style attribute.
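
To make that concrete, here is a small sketch of a document styled mostly through element selectors, with a single class where the elements alone can't tell two paragraphs apart. The title, text, class name, and style rules are all invented for illustration.

```html
<!DOCTYPE html>
<title>Styling without classes</title>
<style>
/* Headings are selected by element name alone. */
h1, h2 { font-family: sans-serif }
/* A paragraph that directly follows a section heading. */
h2 + p { margin-top: 0 }
/* An image that is a direct child of a figure. */
figure > img { display: block; max-width: 100% }
/* A class, only where no selector can tell the elements apart. */
p.aside { font-style: italic }
</style>

<h1>Lo-fi Radio</h1>

<h2>Requirements</h2>

<p>
This paragraph directly follows a heading, so the second rule applies to it.

<p class=aside>
This one is singled out with a class.
```

Unlike the tags left out above, the style end tag is required.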

Styling is not mere ornamentation — at least it shouldn't be. If the element tree isn't deep enough to support styling the document needs, it's time to break down and add an element just for styling. Try to use appropriate semantic elements where possible, and fall back to div and span only as a last resort.

Conclusion

I hope I've persuaded you to write good HTML from now on. Failing that, I hope I've at least expressed myself well enough to get you thinking about it. You can view the source of this document to see how these principles play out (or don't!) in practice.

Oh! I almost forgot. The right way to tie your shoes is with a square knot. It's easy to confuse this with the granny knot, which is the wrong way. The square knot is a simple and sound knot with many uses. The granny knot is an unsound knot whose only known uses are to make your shoelaces look crooked and to trip you. If you have a deeper interest in shoelaces and their knots, you may enjoy Ian's Shoelace Site.

If you'd be interested in seeing a deeper dive on any of the topics I touched on in this article, please let me know. In particular, I feel like there's a lot more to be said about styling, images, and scripting. But I'll let you be my guide!

If you have any questions, comments, or corrections, please don't hesitate to drop me a line.

Aaron D. Parks
Parks Digital LLC
support@parksdigital.com