Syntax Highlighting engines: Clean TeX output

Posted on June 6, 2011

The vim module uses the vim editor to syntax highlight code snippets in ConTeXt. I thought that it should be straight forward to support other syntax highlighting engines: source-highlight, pygments, HsColor, etc. Unfortunately, that is not the case. None of these syntax highlighting engines were written with reuse in mind.

For example, consider a simple tex file:

\definestartstop[important]
                [color=red,
                 style=\italic]
\starttext
This is an \important{important} text
\stoptext

Let’s compare the tex file generated by various syntax highlighters:

Source Highlight

source-highlight -f latex gives

% Generator: GNU source-highlight, by Lorenzo Bettini, http://www.gnu.org/software/src-highlite
\noindent
\mbox{}\textbf{\textcolor{Blue}{\textbackslash{}definestartstop}}\textcolor{Purple}{[important]} \\

\mbox{}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ [color=red, \\

\mbox{}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ style=\textbf{\textcolor{Blue}{\textbackslash{}italic}}] \\

\mbox{}\textbf{\textcolor{Blue}{\textbackslash{}starttext}} \\

\mbox{}This\ is\ an\ \textbf{\textcolor{Blue}{\textbackslash{}important}}\textcolor{ForestGreen}{\{important\}}\ text \\

\mbox{}\textbf{\textcolor{Blue}{\textbackslash{}stoptext}} \\

\mbox{}

Pygmentize

pygmentize -f latex gives

\begin{Verbatim}[commandchars=\\
\{\}]
\PY{k}{\PYZbs{}definestartstop}\PY{n+na}{[important]}
                [color=red,
                 style=\PY{k}{\PYZbs{}italic}]
\PY{k}{\PYZbs{}starttext}
This is an \PY{k}{\PYZbs{}important}\PY{n+nb}{\PYZob{}}important\PY{n+nb}{\PYZcb{}} text
\PY{k}{\PYZbs{}stoptext}
\end{Verbatim}

HsColor

HsColor-latex -partial gives

\textcolor{red}{$\backslash$}{\rm{}definestartstop}\textcolor{red}{[}{\rm{}important}\textcolor{red}{]}\\
\hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \textcolor{red}{[}{\rm{}color}\textcolor{red}{=}{\rm{}red}\textcolor{cyan}{,}\\
\hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace \hsspace {\rm{}style}\textcolor{cyan}{=$\backslash$}{\rm{}italic}\textcolor{red}{]}\\
\textcolor{red}{$\backslash$}{\rm{}starttext}\\
{\rm{}This}\hsspace {\rm{}is}\hsspace {\rm{}an}\hsspace \textcolor{red}{$\backslash$}{\rm{}important}\textcolor{cyan}{\{}{\rm{}important}\textcolor{cyan}{\}}\hsspace {\rm{}text}\\
\textcolor{red}{$\backslash$}{\rm{}stoptext}\\

HsColor and source-highlight use explicit LaTeX commands for spacing and formatting. Ouch! pygmentize uses logical markup, but with cryptic command names. But, from the point of view of using pygmentize output in ConTeXt, the \begin{Verbatim} and \end{Verbatim} are show stopper. (OK, not really. It can be bypassed with some effort).


Based on this experience, I decided to clean up the output generated by 2context.vim:

\SYN[Identifier]{\\definestartstop}[important]
                [color=red,
                 style=\SYN[Identifier]{\\italic}]
\SYN[Statement]{\\starttext}
This is an \SYN[Identifier]{\\important}\{important\} text
\SYN[Statement]{\\stoptext}

I assume only four TeX commands to be defined: \, \{, and \} for backslash, open brace, and close brace; and \SYN[...]{...} for syntax highlighting. Thus, if anyone wants to reuse 2context.vim in plain TeX or LaTeX, or a yet to be written future macro package, they would not need to modify the output at all. I wish the other syntax highlighting programs did the same.


This entry was posted in T-Vim and tagged code formatting.