Some thoughts on lowering the learning curve for using TeX

Posted on September 4, 2011

TeX has a steep learning curve. Oftentimes, steeper than it needs to be. Take, for example, the special characters in TeX. Almost every introduction to plain TeX, eplain, LaTeX, or ConTeXt has a section on these special characters:

\ { } $ & # ^ _ % ~

A good introduction then goes on to explain why these special characters are important; sometimes dropping a hint about category codes. I feel that these details are useless and, at the user level, we should get rid of them.

If you are skeptical, I don’t blame you. After all, category codes are the very soul of TeX. However, I strongly believe that they are useless at the user level. Let’s go over each of these special characters one by one and see if we really need them.

Minimum category codes: \ { }

The only category codes that we need at the user level are \ { }. The character \ marks the start of a control sequence, and { and } group the arguments. The rest can simply be replaced by control sequences.

Math mode category codes: $ _ ^

In TeX, $ is used to delimit math mode. (Knuth used dollars as the math shift character because typesetting math was expensive, so goes an old joke.) But do we really need to stick to $? After all, at the user level, neither LaTeX nor ConTeXt uses $$ to move to display math mode. Both macro packages provide environments for display math. Can’t we do the same for in-line math? In fact, both LaTeX and ConTeXt also provide macros for in-line math: LaTeX uses \(...\) and ConTeXt uses \m{...} and \math{...}. The only trouble is that these macros are not widely used (and that the LaTeX macros are not robust, but that is easily correctable). The only real argument in favor of $...$ is that it is shorter to type, but compared to \(...\) or \m{...}, not by much.
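As a small illustration, here is what a LaTeX document looks like when it avoids $ altogether. This is only a sketch; the formulas are arbitrary examples of mine.

```latex
\documentclass{article}
\begin{document}
% In-line math with \( ... \) instead of $ ... $
Euler's identity \(e^{i\pi} + 1 = 0\) fits comfortably in a line of text.

% Display math with \[ ... \] instead of $$ ... $$
\[
  \int_0^1 x^2 \, dx = \frac{1}{3}
\]
\end{document}
```

Nothing in this document depends on $ being a math shift character.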

The same is true for _ and ^. Both LaTeX and ConTeXt (in fact, so does plain TeX!) provide macros for both of them: \sp for ^ and \sb for _. But don’t panic! I am not asking everyone to start using \sp and \sb. What I am asking is that _ and ^ have normal meaning in text mode. That is, if I type _, I should get _, not a funky error message. In fact, this is not too difficult to achieve. In LaTeX, use the underscore package (it is easy to extend that to take care of ^ as well), and in ConTeXt use \nonknuthmode somewhere in your preamble.
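A minimal LaTeX sketch of the idea, using the underscore package mentioned above (the file name and formulas are made up for illustration):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}   % gives a proper underscore glyph in text
\usepackage{underscore}    % makes _ print an underscore in text mode
\begin{document}
% In text mode, _ is just an underscore: no error message.
The results are stored in my_results_file.txt.

% In math mode, _ still produces a subscript.
The sequence \(x_1, x_2, \ldots, x_n\) converges.

% \sb and \sp are the macro equivalents of _ and ^.
The same subscript written with macros: \(x\sb{1} + y\sp{2}\).
\end{document}
```

This example only covers _; handling ^ in the same way would need the extension mentioned above.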

Of course, the next logical step is to make $ a normal letter: that is, if you type $ you get $.

Align character &

Horizontal alignment is one of the strengths of TeX. Most table and multi-line display math environments use horizontal alignment and & specifies the alignment point for horizontal alignment. Surely, getting rid of & will not work.

Unfortunately, that is true in LaTeX. The & character is so critical for horizontal alignment at the user-level that eliminating it will mean a lot of change. Perhaps, & can be handled in the same manner as _ and ^: it can be a regular letter in text mode and have special meaning inside horizontal alignment. But, it is not always clear to the users which macros use horizontal alignment internally. As such, changing the meaning of & inside some environments will bring more trouble than benefits.

However, the situation in ConTeXt is completely different. At the user-level, & is never used to indicate the alignment point. Both tables and multi-line math display use \NC ... \NC ... \NC \NR type of syntax to indicate new columns. In such a situation it is all the more awkward to explain to a user why & is a special character. It should just be made a normal letter. LuaTeX provides a \aligntab primitive which can be used instead in alignment macros.

Parameter indicator #

Macros are what make TeX different from all other text markup languages. Automatic numbering, cross-references, headers and footers are all possible due to macros. And #1 is used to indicate the first parameter for the macro, #2 the second, and so on. But why do we need this special meaning at the user level? Only the macro writer needs to care about it.

Most LaTeX macros are written in .sty files, which are loaded under a different catcode regime anyway. Most ConTeXt macros are written inside \unprotect ... \protect. So, it is easy to set the traditional catcode regime in both cases. If a user really needs to define macros in the middle of the document, there can be a “programming” environment. For example, ConTeXt provides \starttexcode...\stoptexcode, which sets the same catcodes as \unprotect...\protect. Implementing the same environment in LaTeX is trivial (think \makeatletter...\makeatother on steroids).
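To make the idea concrete, here is a rough LaTeX sketch of such a “programming” switch. The names \startprogramming and \stopprogramming are hypothetical (they are not part of LaTeX), and the sketch assumes that the only change made for the user regime is turning # into an ordinary character; a real implementation would save and restore a complete catcode regime.

```latex
\documentclass{article}
% Hypothetical macros, not part of LaTeX.
% \# is a control symbol, so it can be used here whatever catcode # has.
\newcommand{\startprogramming}{\catcode`\#=6 \makeatletter}
\newcommand{\stopprogramming}{\catcode`\#=12 \makeatother}

\begin{document}
\catcode`\#=12 % user regime: # is an ordinary character
Ticket #12 is closed.

\startprogramming
% Inside the programming regime, # is the parameter character again.
\newcommand{\ticket}[1]{Ticket~\textbf{#1}}
\stopprogramming

\ticket{13} is still open, and # is once again ordinary text.
\end{document}
```

Macros defined before the catcode change keep working, because category codes are attached to tokens when they are read, not when they are used.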

Unbreakable space ~

Knuth used ~ to indicate an unbreakable space, and that tradition has continued ever since. In this age of Unicode text, do we still need such crutches? It is easy enough to type Unicode U+00A0 (non-breakable space) in most editors. For example, in vim I just need to type CTRL+K+<space>+<space>. A smart syntax highlighting scheme will make the non-breakable space visible. So, there is no real reason to keep on using ~ as a non-breakable space. The same argument holds for the TeX macros for accents: Unicode is easy to input and easy to read (but that will be the subject of another rant).

So, what’s the point of all this?

Now imagine that all these features have been implemented. Then, we may split the introduction to a TeX macro package into two parts: using the macro package and programming the macro package. Split the first part into two further parts: text mode and math mode. For the text mode, the only special characters are \ { } %. All other characters are normal; that is, if you type them, you see them in the output (provided the font has the glyph; let’s ignore complex languages like Arabic, CJK, and Indic scripts and setting appropriate font features for them at the moment). \ starts a control sequence, {...} groups an argument, and % is a line comment. For the math mode, explain how to enter math mode (\(...\) or \m{...} or the display-math environments) and explain that _ and ^ are used to indicate sub- and super-scripts. Postpone explaining the programming mode for later. I think that such a scheme will lower the cognitive load on the new user.

Will such a system work? Yes, it will. In fact, it already does. For about a year now, ConTeXt has had an \asciimode macro that implements all these features, with a slight twist: % is also a normal letter and you need to type %% to get a line comment (and %{}% if you really need the output %%). This macro is not enabled by default. I think that making it the default will simplify understanding TeX for the first time. As an added advantage, it will also make the job of sanitizing the input simpler for converters (such as pandoc) that convert some other markup language to TeX.
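A ConTeXt document using this mode might look roughly like the sketch below. It is based only on the behavior described above; the exact placement of \asciimode and the details may differ between ConTeXt versions, and the sample text is made up.

```tex
\asciimode
\starttext
%% A doubled percent sign starts a comment in this mode.
Sales rose 5% to $20 & the tag #1 is stored in file_name.

%% Math mode still uses _ and ^ for sub- and superscripts.
Pythagoras tells us that \m{a^2 + b^2 = c^2}, and \m{x_1} is a subscript.
\stoptext
```

The point is that the text line needs no escaping at all: what is typed is what appears in the output.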

