Leon Bilton

How to not hate LaTeX

Last modified on May 17, 2022

In this article I will collect anecdotes and thoughts on \(\LaTeX\) and similar software. Although this tool is routinely used to create scientific and technical documents, the complexity of the task combined with certain idiosyncrasies of the software means that it is notoriously uncomfortable to learn and can seem unwieldy. After several years of using it, I have come to appreciate \(\LaTeX\) more, not least for its free and open source nature. I find the word "like" too strong in this context, but I don't hate \(\LaTeX\). Maybe these notes can also help others to arrive at this point. The article is structured more or less like an introduction to \(\LaTeX\), and doesn't assume much background.

Contents:

  1. In the beginning was the processor
  2. TeX, LaTeX, LuaLaTeX, ConTeXt, XeTeX, BibTeX…
  3. KOMA-Script
  4. The preamble
  5. Intermission: a rant about LaTeX syntax
  6. Single columns in two-column documents
  7. Bibliographies, citations and references
  8. Overfull and underfull, isn't it just wonderfull
  9. Fin


In the beginning was the processor

Advocates of \(\LaTeX\) will admit that it does not provide a "conventional" editing experience. Convention is in this case defined by a so-called market share of desktop operating systems, with one fairly obvious caveat being that Linux (the platform of choice for many \(\LaTeX\) users) is free. A second caveat is that \(\LaTeX\) can be used on other operating systems. Nevertheless, the idea of having to compile a document before you can see the formatting is alien to users of "word processors". It might even seem like a step backwards. That nice "Export to PDF" button (a recent addition in fact) is so much more in vogue. Unfortunately, just like compiling, it is also a one-way trip. Recipients of your PDF document have no reliable way to convert back, not without opening their wallet at least.

The approach here is simply different. A .tex file is a very portable and efficient format for both the textual content and the formatting directives of a document. Among other things, this allows for embedding almost arbitrarily complex mathematical expressions. Because .tex files are entirely encoded in plain text, they are tiny and can be transported with ease. Unfortunately, a simple "track changes" tool for these files is sorely lacking, which makes collaboration with people who are unfamiliar with \(\LaTeX\) difficult. I do not think there is a good reason for this continued omission, nor do I think it is necessarily impossible to encode that kind of information inside the .tex file itself; see CriticMarkup for inspiration.

I digress. You still want to see the formatting. At this point, you would normally be directed to either something called "LyX", which attempts to display limited formatting of .tex documents in real time, or an online service like Overleaf which shows a periodically updated preview of the PDF. I recommend neither. While Overleaf is interesting for its collaboration features and integration with academic publishing, I don't think it is a necessary or even good way to introduce \(\LaTeX\). The point is that you don't need to care about how it looks, except for when you're either done writing or the compiler is shouting at you in gibberish. Getting over this is the first step to learning how to use this software.

Installing a TeX distribution is usually straightforward. I use TeX Live on Linux, but cannot make recommendations for other platforms. If you are new to open source software, please be prepared for a little troubleshooting. Something something… free lunch.

TeX, LaTeX, LuaLaTeX, ConTeXt, XeTeX, BibTeX…

Stop. Here's a simple rule. Are you writing a paper? If so, then you have to use \(\LaTeX\) (actually pdflatex), and there's probably a template somewhere, so in this case do check Overleaf or the publisher's website. Don't spend much time on formatting; they are going to change it anyway. The only other one of those you care about is BibTeX: it's a metadata format for citations. But you probably already know all that.

If not, then I suggest using LuaLaTeX. It handles fonts better and is generally not a dinosaur (you can actually type real Unicode characters like ä in the .tex file). It comes with TeX Live by default, so usually you can just run lualatex instead of latex. The Lua- part means that it supports executing Lua code from the .tex file. Not directly, mind you: you need to use the \directlua command. I haven't used it yet; the UTF-8 support is what sold it for me. No, it is not directly compatible with old \(\LaTeX\). But you only ever need the latter for papers, in which case the publisher's template will probably be the main restriction anyway.
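
To give an idea of what \directlua looks like, here is a minimal sketch of a complete document (compile it with lualatex; the computation is an arbitrary example). TeX expands the argument first and turns line breaks into spaces, so the Lua code effectively ends up on a single line, which is why it contains no Lua comments.

```latex
% Compile with: lualatex directlua-demo.tex
\documentclass{scrartcl}
\begin{document}
% The Lua code below computes a sum and hands the result back to TeX.
% Line breaks inside \directlua become spaces, so avoid Lua "--" comments there.
The sum of the numbers from 1 to 10 is
\directlua{
  local sum = 0
  for i = 1, 10 do sum = sum + i end
  tex.print(tostring(sum))
}.
\end{document}
```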

ConTeXt is a different attempt at a change, with its own macro system that is not compatible with \(\LaTeX\); it is closely tied to LuaTeX, the engine that also powers LuaLaTeX. XeTeX is… something? It also does UTF-8, but differently? I don't really know, but it's older than LuaTeX. LuaLaTeX also allows the use of some new or improved packages, like polyglossia for multilingual documents or unicode-math to interpret Unicode input in mathematics. No more \varepsilons taking up space and causing verbose, hard-to-read code for equations. Now you can input ε directly (using an appropriately configured editor), which also works in \(\KaTeX\) on the web, by the way.
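
A small sketch of the difference, assuming a UTF-8 editor and compilation with lualatex. Both displays should come out essentially the same; only the input differs. The formula itself is just an arbitrary example.

```latex
% Compile with lualatex (unicode-math needs LuaLaTeX or XeLaTeX).
\documentclass{scrartcl}
\usepackage{amsmath}
\usepackage{unicode-math}  % interprets Unicode input such as ε, ∀, ⇒ in math mode
\begin{document}
% Unicode input, typed directly in the editor:
\[ ∀ε > 0\ ∃δ > 0 : |x - x_0| < δ ⇒ |f(x) - f(x_0)| < ε \]
% The same formula written with the classic commands:
\[ \forall\varepsilon > 0\ \exists\delta > 0 : |x - x_0| < \delta
   \Rightarrow |f(x) - f(x_0)| < \varepsilon \]
\end{document}
```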

KOMA-Script

A lot of examples start with \documentclass{article}. But the built-in document classes are a nuisance. For example, you have to load a separate package just to change margins. Luckily some German folks figured that it was a silly idea, and if people are going to create a package for it anyway, why not add at least basic geometry options to the document class. A good starting point is

\documentclass[twocolumn,12pt,DIV=14]{scrartcl}

Remove the twocolumn option to use a single-column format, but bear in mind that two-column documents are easier to read and use less paper. The DIV parameter controls the size of the text area: larger values give a larger text area and narrower margins. For recommended font size and DIV combinations, scroll to Table 2.1 in the KOMA-Script manual (~3 MB PDF download).
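
If you'd rather not pick a number at all, the typearea part of KOMA-Script can compute one. A minimal single-column sketch, assuming A4 paper; the lipsum package only supplies dummy text for the demo.

```latex
% DIV=calc asks typearea to compute a DIV value suited to the font size;
% replace it with a number (e.g. DIV=12) to set the text area explicitly.
\documentclass[12pt,paper=a4,DIV=calc]{scrartcl}
\usepackage{lipsum}  % provides \lipsum dummy text, only for this demo
\begin{document}
\lipsum[1-4]
\end{document}
```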

Want more space between the two columns?

\setlength{\columnsep}{1cm}

The KOMA-Script classes are also nice once you become more familiar with \(\LaTeX\), since they expose a lot more options than the default classes. So far, I haven't had any trouble using them with LuaLaTeX. They save a lot of hassle, and eliminate the need for many band-aid packages. If you need to change something, first check the KOMA-Script manual. The option might already be there.
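
As an illustration, here are two common tweaks that are class options or KOMA-Script commands rather than extra packages: paragraph spacing and the heading font. These are just examples picked for the sketch; the manual lists many more.

```latex
% parskip=half: half a line of space between paragraphs, no first-line indent.
\documentclass[12pt,parskip=half]{scrartcl}
% KOMA-Script sets headings in a sans-serif font by default;
% the "disposition" element controls all heading levels at once.
\setkomafont{disposition}{\normalfont\bfseries}
\begin{document}
\section{Introduction}
Text of the first paragraph.

Text of the second paragraph.
\end{document}
```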

The preamble

The preamble of a \(\LaTeX\) document is what goes before the actual document environment, i.e. before \begin{document}, and consists mostly of loading required components and setting global options. A lot of examples out there have long and arcane preambles that would give even the bravest novice pause. The worst offenders usually include comments that, rather than explaining anything, warn against the hubris of attempting to modify any of the intricate incantations. It should be obvious that this is not a viable strategy, unless you are prepared to house a resident guru.

Instead, it is better not to put things in the preamble until you understand what they do and can decide that they are really necessary. This is a good way to think about software dependencies in general. I refer the interested reader to the essay by Russ Cox, "Our Software Dependency Problem".

The LuaLaTeX manual itself recommends polyglossia for multilingual documents. It provides things like the correct date formats and hyphenation patterns (telling LuaLaTeX where it may break words to justify text). It also localises the words inserted by commands like \chapter, \tableofcontents or \caption, such as "Chapter", "Contents" or "Figure".
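
A minimal polyglossia sketch with German as a hypothetical second language (any supported language works the same way). Compile with lualatex.

```latex
\documentclass{scrartcl}
\usepackage{polyglossia}
\setdefaultlanguage[variant=british]{english}
\setotherlanguage{german}  % a second language used somewhere in the document
\begin{document}
\today  % date in British English format

\begin{german}
  \today  % date in German format; German hyphenation patterns apply here
\end{german}

An inline switch: \textgerman{Donaudampfschifffahrtsgesellschaft}.
\end{document}
```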

I have already mentioned unicode-math, but an essential package for mathematics is amsmath. This provides things like aligned or line-wrapped equation environments and proper typography of bold math symbols. There is also \text for putting upright text inside an equation.
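
A short sketch of the amsmath features just mentioned; the equations themselves are arbitrary examples, and \tr is a hypothetical operator name chosen for the demo.

```latex
\documentclass{scrartcl}
\usepackage{amsmath}
\DeclareMathOperator{\tr}{tr}  % upright "tr" with operator spacing (preamble only)
\begin{document}
% align: equations aligned at the & markers, each line numbered;
% && starts an extra column, here used for a comment in upright text.
\begin{align}
  (x + 1)^2 &= x^2 + 2x + 1 \\
            &\geq 0 && \text{for all real } x
\end{align}
% \text puts upright text inside an equation
\[ \tr(A + B) = \tr A + \tr B \quad \text{for square matrices } A, B \]
\end{document}
```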

As a third essential package, I recommend microtype. It provides well-researched micro-typographic adjustments (mainly character protrusion and font expansion, which make the word spacing of justified text more even). I generally don't care about the details, but luckily there is no need to set anything manually. Just load the package, and maybe use the final option to ensure that the typographical corrections are applied even in draft mode.

\usepackage[final]{microtype}

This shouldn't be very noticeable, but it can shave off a page or two in long documents.

You'll find that a few more things are needed depending on context, like graphicx for images and hyperref if you want clickable links. The csquotes package can be used for non-English quotation marks (for correct English quotation marks use the Unicode characters, i.e. “foo” or ‘foo’). Citations and bibliographies are another story, unfortunately.
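
A sketch showing these three packages together, assuming German as the non-English language and that the placeholder image example-image is available (it ships with the mwe package in a full TeX Live install). Compile with lualatex.

```latex
\documentclass{scrartcl}
\usepackage{polyglossia}
\setdefaultlanguage{english}
\setotherlanguage{german}
\usepackage[autostyle]{csquotes}  % \enquote adapts to the current language
\usepackage{graphicx}             % \includegraphics
\usepackage{hyperref}             % clickable links; load it late
\begin{document}
\enquote{English quotes}, and in German:
\begin{otherlanguage}{german}
  \enquote{deutsche Anführungszeichen}
\end{otherlanguage}

\includegraphics[width=0.5\linewidth]{example-image}  % placeholder image

See \url{https://www.ctan.org} or \href{https://www.ctan.org}{CTAN}.
\end{document}
```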

Here's an example of a simple but complete preamble (although without bibliography support):

\documentclass[twocolumn,12pt,DIV=15,abstract=true]{scrartcl}
\setlength{\columnsep}{1cm}
\usepackage{polyglossia}
\setdefaultlanguage[variant=british]{english}
\usepackage{libertinus}
\usepackage[final]{microtype}

\usepackage{unicode-math}  % Better math font config, and unicode glyph math input
\usepackage{amsmath}  % Bold symbols, \DeclareMathOperator and $\text{…}$
\usepackage{siunitx}  % Physical units, number ranges, scientific and complex number notation macros

\usepackage{graphicx}  % For \includegraphics
\usepackage{subcaption}  % For the subcaptionblock environment (replaces minipage for subfigures)
\addtokomafont{caption}{\small}  % Use small font for caption text.
\setkomafont{captionlabel}{\bfseries}  % Use bold font for caption label.
\setcapindent{0pt}  % Don't use a hanging indent for multiline captions.

\usepackage{xcolor}  % Without options just provides the `!` mechanism, e.g. black!60 for grayscale
\usepackage{xurl}  % Better URL line breaks.
\usepackage[colorlinks=true,allcolors=black!60]{hyperref}  % Clickable references and URLs.
\usepackage[noabbrev]{cleveref}  % References figures/tables/equations properly.
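
The siunitx line is probably the least self-explanatory one in that preamble. A minimal sketch of what it enables, assuming siunitx version 3 (older versions use \SI and \SIrange instead of \qty and \qtyrange); the quantities are arbitrary.

```latex
\documentclass{scrartcl}
\usepackage{siunitx}  % \qty and \qtyrange require siunitx version 3
\begin{document}
An acceleration of \qty{9.81}{\metre\per\second\squared},
a range of \qtyrange{1}{5}{\kilo\metre},
a long number \num{1234567.891},
and a quantity in maths mode: $E = \qty{1.2e3}{\joule}$.
\end{document}
```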

Intermission: a rant about LaTeX syntax

Ah, the \(\LaTeX\) syntax. It's just not good. The verbosity is overwhelming, and many commands are not very mnemonic. There are already some hints of this in the provided preamble example. Why is the column separation set with \setlength, but the caption font is changed with \addtokomafont? And then \setkomafont is used for the caption label setting! Unfortunately, this is an example of relatively nice \(\LaTeX\) syntax. Matrices are written like:

\begin{bmatrix}
    a & b \\
    c & d
\end{bmatrix}

That's just horrible. Although using Unicode to enter symbols alleviates some of the pain, the syntax is a wart that will remain and continue to discourage new users.

It's not just the verbosity or number of commands either. The mathematical "grammar" is all over the place. Some commands are named for their appearance, like \bigotimes or \circ (\deg also exists but just prints the upright letters 'deg'), but others are named for their mathematical function, like \subset or \to. To type the paragraph symbol § we use \S, but for ⋁ we use \bigvee spelled out (there is no \V command that I know of). Combinations of characters like <= are not interpreted, leading to a wonderful plethora of commands like \ll, \leq (or just \le if you're in a rush), \neq, \simeq,… And a plain | is spaced like an ordinary symbol, so a vertical bar used as a relation (as in {x | x > 0}) has to be written \mid, while the tempting \| is already taken by the double bar ‖.

My personal favourite is \ni. Can you guess what it does?

Finally, and perhaps most importantly, the syntax does not preserve mathematical semantics. In other words, all of the effort spent learning this "language" will be useful for little more than pretty-printing mathematics. It would be cool if equations could be pasted into a computer algebra system after copying them directly out of a \(\LaTeX\) document, but that won't happen. At least, not without carrying an extremely complex parser around and specifying a bunch of additional disambiguations. For a better attempt at mathematical syntax, see Mathematica. Here is a transcript of a 2010 talk by Stephen Wolfram himself. There are also a few interesting open source projects in this area, like Cortex and Metatheory, which I may look into at some point.

Single columns in two-column documents

I have already recommended the use of two-column documents. This will quickly lead to the painful realisation that equations are not line-wrapped automatically. This is a case where \(\LaTeX\) is simply too dumb. That is one of the reasons for loading the amsmath package, which provides the multline, gather, split and align environments. They are all useful in different contexts, and the starred versions (multline*, gather*, align*) suppress the equation numbers as usual. The folks at Overleaf provide a quick overview.
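
A sketch of two of these environments in a two-column document, with arbitrary polynomials as the long equations: multline for one long equation broken by hand, and split inside equation for a multi-step derivation that gets a single number.

```latex
\documentclass[twocolumn]{scrartcl}
\usepackage{amsmath}
\begin{document}
% multline: one long equation; first line flush left, last line flush right
\begin{multline}
  f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \\
         + a_4 x^4 + a_5 x^5 + a_6 x^6
\end{multline}
% split inside equation: lines aligned at &, one shared equation number
\begin{equation}
  \begin{split}
    g(x) &= (x + 1)(x + 2)(x + 3) \\
         &= x^3 + 6x^2 + 11x + 6
  \end{split}
\end{equation}
\end{document}
```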

Luckily, it is possible to get full-width figures and tables by simply using the figure* and table* environments. However, arbitrary blocks that span the full width of the page require an obscene syntax:

\documentclass[twocolumn,12pt,DIV=14,abstract=true]{scrartcl}
\usepackage[final]{microtype}
\usepackage{amsmath}

\title{Single column abstract in a two-column document}

\begin{document}

\twocolumn[%
\maketitle
\begin{@twocolumnfalse}
\noindent\hspace{0.1\linewidth}\begin{minipage}{0.8\textwidth}

    \begin{abstract}
        Hey look! We can finally type the abstract in here.
        It will appear centered and spread across 80\% of the page.
    \end{abstract}

\end{minipage}
\bigskip
\bigskip
\end{@twocolumnfalse}]

\end{document}

So to initialise the single-column box we use \twocolumn. A pretty good start, right? Oh, and here you need to use square brackets, because we are putting stuff into the "optional" argument of that command. I don't care to find out why.

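The \maketitle command must be put inside the block, otherwise a page break appears after the title. Then comes the @twocolumnfalse "environment", which isn't really one: \begin{@twocolumnfalse} just calls the internal switch \@twocolumnfalse, so that the enclosed content behaves as if the document were single-column, and the matching \end simply closes the group again. The leading @ is required; it's just how things are.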

You might have imagined that for centering the abstract, a command like \centering would be quite useful. This command does in fact exist, but when it is used inside \twocolumn it doesn't work as you'd expect. Instead, we shift the box manually by 0.1\linewidth (10% of the full text width), taking care that the box itself is not indented like a paragraph (\noindent), and then set up a minipage spanning 80% of the text width (note the use of \textwidth).

The \bigskips at the end make sure there is appropriate spacing between the box and the following content, which is somehow not the default.

Overall, it only took a triple-nested environment and five different commands. Success?

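There is one more caveat to this approach. Everything inside the box is passed to \twocolumn as its square-bracketed argument, and that argument ends at the first ] that is not hidden inside curly braces. Commands with optional arguments of their own, like \parencite[e.g.][]{key}, therefore cut it short, leading to mysterious and entirely unhelpful errors like

Argument of \blx@citeargs@i has an extra }.

The fix is to wrap such a command in braces, e.g. {\parencite[e.g.][]{key}} (the form \protect{\parencite[e.g.][]{key}} also works, thanks to the braces). Citations inside figure captions need similar care: caption text is a so-called "moving" argument, so "fragile" commands there should be preceded by \protect.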
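
A complete, if minimal, sketch of a citation inside the single-column box, with the citation wrapped in an extra pair of braces so its square brackets cannot end the \twocolumn[…] argument early. The bibliography entry is hypothetical and written to a file by filecontents, the centering minipage is left out for brevity, and the authoryear style is used so that only biblatex is needed. Build it with lualatex, biber, then lualatex twice.

```latex
\begin{filecontents*}[overwrite]{\jobname.bib}
@book{doe2020,
  author    = {Doe, Jane},
  title     = {A Hypothetical Book},
  publisher = {Example Press},
  year      = {2020},
}
\end{filecontents*}
\documentclass[twocolumn,abstract=true]{scrartcl}
\usepackage[backend=biber,style=authoryear]{biblatex}
\addbibresource{\jobname.bib}
\title{Citations in a single-column box}
\begin{document}
\twocolumn[%
\maketitle
\begin{@twocolumnfalse}
  \begin{abstract}
    % Without the extra braces, the first ] below would end \twocolumn[...]
    This builds on earlier work {\parencite[e.g.][]{doe2020}}.
  \end{abstract}
  \bigskip
\end{@twocolumnfalse}]
\printbibliography
\end{document}
```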

Bibliographies, citations and references

References to figures or tables are generally fairly easy, and the cleveref package removes the need to type 'table' or 'figure'; instead we can just do

This is a reference to~\cref{<label>}.

where <label> is the label given to the figure or table via the \label command. The tilde ~ in front is used to tell the software to never break the line at that spot. To capitalise the word used by cleveref, e.g. at the start of a sentence, use \Cref instead.
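
A sketch of the whole chain, from \label to \cref, assuming the placeholder image example-image from the mwe package; fig:example is a hypothetical label. Note that \label goes after \caption, otherwise it picks up the wrong number.

```latex
\documentclass{scrartcl}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage[noabbrev]{cleveref}  % load after hyperref
\begin{document}
\begin{figure}
  \centering
  \includegraphics[width=0.6\linewidth]{example-image}
  \caption{A placeholder figure.}
  \label{fig:example}  % after \caption, so it refers to the figure number
\end{figure}

As shown in~\cref{fig:example}, the image is only a placeholder.  % "figure 1"
\Cref{fig:example} comes from the mwe package.  % "Figure 1" at a sentence start
\end{document}
```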

Citations and bibliographic entries are more involved to set up, but they are nicely automated. The most annoying part is acquiring the reference metadata, which must be in BibTeX format. The BibTeX format has undergone various revisions and extensions, which can also complicate things. These days, journals will often provide a BibTeX snippet for most papers, if you can find it. It is also possible to search for the paper directly on CrossRef, and they usually provide higher quality BibTeX (the snippets from journals can be incomplete or garbled). That is to be expected, because these are the people who register Digital Object Identifiers (DOIs) for most academic papers out there and some other content as well. I have written a small Python script to fetch BibTeX snippets automatically, given the DOI of a paper. It doesn't always work, and not all papers even have a DOI. Be prepared to manually fill in or correct missing BibTeX occasionally.

The BibTeX snippets are collected in a file ending with .bib. There are various ways to load this file into \(\LaTeX\). Whenever possible, I use the newer biber backend, and add something like this to the preamble:

\usepackage[backend=biber,bibencoding=utf8,style=apa]{biblatex}
\renewcommand*{\bibfont}{\normalfont\small}
\addbibresource{\jobname.bib}

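The command \jobname resolves to the base name of the .tex file, so I just name my .bib file identically. Citations are included in the text using the \cite or \parencite commands, and the bibliography itself is typeset wherever you put \printbibliography (usually at the end of the document).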
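
A self-contained sketch of the whole setup. The entry doe2020 is hypothetical and is written to \jobname.bib by the filecontents environment, so no separate file is needed; the authoryear style stands in for apa so that only biblatex itself is required.

```latex
% Build (assuming the file is saved as demo.tex):
%   lualatex demo, biber demo, then lualatex demo twice
\begin{filecontents*}[overwrite]{\jobname.bib}
@book{doe2020,
  author    = {Doe, Jane},
  title     = {A Hypothetical Book},
  publisher = {Example Press},
  year      = {2020},
}
\end{filecontents*}
\documentclass{scrartcl}
\usepackage[backend=biber,style=authoryear]{biblatex}
\renewcommand*{\bibfont}{\normalfont\small}
\addbibresource{\jobname.bib}
\begin{document}
A plain citation: \cite{doe2020}.
A parenthetical one \parencite[see][p.~12]{doe2020}.
\printbibliography
\end{document}
```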

Now, because \(\LaTeX\) is pretty stupid, it doesn't handle citations properly on its own. Not only do we need to load the biblatex package (or similar, e.g. the legacy natbib), but we also need to run a different command on the output of the first lualatex run. For me, this is biber, as indicated by the backend option in the snippet.

In fact, it doesn't stop there. The biber tool only writes the processed bibliography data to a .bbl file, which lualatex has to read in on another run. No, no, we're not done yet. To be completely sure that the formatting is readjusted after inserting citations, bibliography entries and possibly back-references, it is actually necessary to run lualatex twice afterwards.

Wow, that's pretty stupid. It's so stupid that a variety of build tools have sprung up to properly automate the document generation. That's right, build tools: you are really just a programmer now. There are really two options: find an editor that has a \(\LaTeX\) plugin and can properly automate this absurd workflow, or embrace your inevitable fate…

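# Build every .tex file in this folder into a PDF of the same name
# with LuaLaTeX + Biber. Intermediate files go into out/.
src := $(wildcard *.tex)
bib := $(wildcard *.bib)

.PHONY: all
all: $(src:%.tex=%.pdf)

# Runs 2 and 3: once Biber has produced the .bbl, compile twice more
# so that citations, the bibliography and references settle.
%.pdf: %.tex out/%.bbl
	@ 1>/dev/null lualatex --output-directory=out --interaction=batchmode --halt-on-error --file-line-error $< \
	|| echo "LuaLaTeX run 2/3 failed, check log files in out/ for details"
	@ lualatex --output-directory=out --halt-on-error --file-line-error $< | grep -v '/usr/share/'
	@ mv out/$@ $@

# Biber reads the .bcf control file from the first run and writes the .bbl.
out/%.bbl: out/%.bcf
	@ biber --input-directory=out --output-directory=out $(<F)

# Run 1: writes the .bcf file that tells Biber what to do.
out/%.bcf: %.tex $(bib) | out
	@ 1>/dev/null lualatex --output-directory=out --interaction=batchmode --halt-on-error --file-line-error $< \
	|| echo "LuaLaTeX run 1/3 failed, check log files in out/ for details"

out:
	@ mkdir -p out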

This is a Makefile example for LuaLaTeX + Biber. You thought that \(\LaTeX\) syntax was bad, eh? I use it by storing the .tex and .bib files in the same folder, and just running make to compile the document. If you copy it, note that the indented recipe lines must start with a tab character, not spaces. It's a Linux thing, I won't go over it here. Check GNU Make if you're interested. If you use a Linux platform, it should normally be installed already.

Overfull and underfull, isn't it just wonderfull

With any \(\LaTeX\) setup, you will eventually encounter the infamous warnings about 'overfull' or 'underfull' boxes. They are quite terse and mysterious:

Underfull \hbox (badness 1320) in paragraph at lines 691--691

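Hmm, my .tex file doesn't even have 691 lines. That's right, don't waste time looking at the line number: it refers to whichever file TeX happened to be reading at the time, which may well be a generated one like the .bbl bibliography file. Instead, check the lines immediately following this in the command output (which is far too messy, another gripe). Usually these warnings are caused by long words that \(\LaTeX\) doesn't know how to hyphenate, by inline equations, which can't be broken or stretched like normal text, or by the narrow columns of a two-column layout.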

Avoiding these warnings is not critical and sometimes not practical. If there are some clues after the warning, glance over the PDF and check for weird spacing issues. To add hyphenation hints to a long word, use \-, e.g. whats\-amajig. If word spacing is too large because of an inline equation, it is sometimes possible to fix the warning by adding e.g. \hspace{4pt} before and after the equation. For the two-column issue, I don't know of any way to prevent the warnings.
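
A small sketch of both kinds of hyphenation hint: the one-off \- form and the document-wide \hyphenation command in the preamble. The \emergencystretch line is an extra knob that may help in narrow columns; it gives TeX some additional stretch in a final pass before it produces an overfull box, but it is not guaranteed to remove the warnings. The words and the 1em value are arbitrary examples.

```latex
\documentclass[twocolumn]{scrartcl}
% Document-wide hyphenation hints: the hyphens mark allowed break points.
\hyphenation{whats-a-ma-jig mi-cro-ty-pog-ra-phy}
% Optional: extra stretch TeX may use in a final pass for hard paragraphs.
\setlength{\emergencystretch}{1em}
\begin{document}
A whatsamajig and some microtypography can now be broken at the marked points.
% A one-off hint for a single occurrence:
Another whats\-a\-ma\-jig.
\end{document}
```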

Fin

That's all for now. Go away and compile your document. Yes, I know, the compiler is too slow. Parallel \(\LaTeX\) compilation, you ask? Hah, check back in a few decades; if you're extremely lucky, we'll have multi-threading…