LuaTeX as a scripting language

Posted on August 16, 2021

This post is not really about ConTeXt but about how I could use ConTeXt to quickly hash out an idea which involved some text processing.

One of my long-running ConTeXt projects (more than a decade old now) is typesetting my CV. I maintain the list of publications as an XML file. I parse the file using ConTeXt’s XML helpers, convert the data to a Lua table, and then typeset it using ConTeXt Lua Documents. When starting this project, I chose XML as the data format for two reasons. First, an XML file can be validated against a schema. Second, I thought that since XML is so popular, there must be good tools for authoring XML documents.

The first reason has really paid off. I wrote a Relax NG schema for the data layout that I had in mind, and I validate the XML files (using jing) as part of my build process. However, I never got around to exploring authoring tools for XML. I use vim (now neovim) for editing everything, and it has a decent XML plugin. So, I mostly updated my XML file by hand (often by simply copy-pasting an old entry and changing the appropriate values). It works, but it is an error-prone process.

I recently came across YAD (yet another dialog), a program for easily generating GUI dialog boxes and forms. See this page for a long list of examples. Using yad, I could easily generate a GUI to enter all the information for a publication.

A GUI using YAD
data=$(yad --title="Add publication" --form \
           --separator="|"  --width=500 \
           --field="Title" "" \
           --field="Author 1" "" --field="Student 1?:CHK" "" \
           --field="Author 2" "" --field="Student 2?:CHK" "" \
           --field="Author 3" "" --field="Student 3?:CHK" "" \
           --field="Author 4" "" --field="Student 4?:CHK" "" \
           --field="Author 5" "" --field="Student 5?:CHK" "" \
           --field="Journal" "" \
           --field="Pages" "" \
           --field="Month" "$(date +%b)" \
           --field="Year"  "$(date +%Y)" \
           --field="Note:TXT" "")

After clicking “OK”, the values of all fields are written to STDOUT, delimited by | (set via the option --separator). A typical entry looks as follows.

$ echo $data
Title of the paper|A. Author|TRUE|B. Author|FALSE||FALSE||FALSE||FALSE|Fancy Journal|1–10|Aug|2021||
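The values come back in the same order as the --field options, so every field has a fixed position, and yad also appends a separator after the last field. As a rough sketch in plain Lua (the split helper below is hypothetical and not part of yad or ConTeXt), the positions can be made explicit like this:

```lua
-- Hypothetical helper: split `s` at every occurrence of the literal
-- string `sep`, keeping empty fields.
local function split(s, sep)
  local fields, start = {}, 1
  while true do
    -- Plain find (4th argument true): `sep` is not treated as a pattern.
    local first, last = s:find(sep, start, true)
    if not first then
      fields[#fields + 1] = s:sub(start)
      return fields
    end
    fields[#fields + 1] = s:sub(start, first - 1)
    start = last + 1
  end
end

-- Sample line in the shape shown above (page range written in ASCII).
local sample = "Title of the paper|A. Author|TRUE|B. Author|FALSE||FALSE||FALSE||FALSE|Fancy Journal|1--10|Aug|2021||"
local fields = split(sample, "|")

print(fields[1])              -- title
for i = 1, 5 do
  -- Author i sits at position 2*i, the matching checkbox at 2*i + 1.
  print(i, fields[2*i], fields[2*i + 1])
end
print(fields[12], fields[13], fields[14], fields[15])  -- journal, pages, month, year
```

Unused author slots show up as empty strings followed by FALSE, which is what the conditional processing has to skip.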

So, to generate an XML entry, all I needed to do was split the string at | to determine the value of each field and put it in the appropriate place in a template. The only catch is that most papers do not have five authors, so I need to make sure that I don’t generate entries for empty authors. Such conditional processing was too complicated for my shell or AWK scripting skills, but is trivial in any proper programming language. The question was, which language?

My usual go-to language for such tasks is Ruby. But I normally don’t do much text processing, and it has been a while since I wrote any Ruby code. So I found myself googling a lot of basics of the language: how to split a string, how to use templates, and so on.

As I was googling, I realized that I actually do write a lot of code for text processing … just that I write it as part of my TeX documents, in Lua! So, I could easily write the processing and the formatting code in Lua using the helper functions provided by ConTeXt (for example, for templates). These helper functions are not available in pure Lua, but they can be accessed if the Lua script is called via mtxrun --script. So, I could quickly write out the following:

local replace = utilities.templates.replace

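-- The yad output arrives via --data="..."; fall back to an empty string
-- so that string.explode is never handed nil.
local data = string.explode(environment.arguments.data or "", "|")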

local variables = { }

variables.title = data[1]

local author  = {}
local student = {}
local num_authors = 0
for i = 1, 5 do
  local name = data[2*i]
  -- Stop at the first missing or empty author field.
  if not name or name == "" then
    break
  end
  num_authors = num_authors + 1
  author[i]  = name
  student[i] = data[2*i+1]
end

variables.journal = data[12]
variables.pages   = data[13]
variables.month   = data[14]
variables.year    = data[15]
variables.note    = data[16]

local template = [[
    <publication status="submitted">
      <title>
          %title%
      </title>
      <authors>
%authors%
      </authors>
      <journal>
          %journal%
      </journal>
      <pages>%pages%</pages>
      <month>%month%</month>
      <year>%year%</year>
    </publication>
]]

local template_author_a = [[        <name type="student">%author%</name>]]
local template_author_b = [[        <name>%author%</name>]]

local formatted_authors = { }
for i = 1, num_authors do
  if student[i] == "TRUE" then
    formatted_authors[i] = replace(template_author_a, {author=author[i]})
  else
    formatted_authors[i] = replace(template_author_b, {author=author[i]})
  end
end

variables.authors = table.concat(formatted_authors, "\n")

local entry = replace(template, variables)
print(entry)
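To give a feel for what the template helper is doing, here is a minimal plain-Lua sketch of %name% substitution. The fill function is a hypothetical stand-in written for illustration. It is not how utilities.templates.replace is implemented, and the ConTeXt helper may well do more.

```lua
-- Hypothetical stand-in for a %name% template helper: replace every
-- %name% placeholder with variables[name]; unknown names are left alone.
local function fill(template, variables)
  -- The outer parentheses drop gsub's second return value (the count).
  return (template:gsub("%%([%w_]+)%%", function(name)
    return variables[name]  -- returning nil keeps the original text
  end))
end

-- Assumed example data: two authors, the first one a student.
local authors = {
  { name = "A. Author", student = true  },
  { name = "B. Author", student = false },
}

local lines = {}
for i, a in ipairs(authors) do
  -- Pick the per-author template based on the student flag.
  local t = a.student and [[<name type="student">%author%</name>]]
                       or [[<name>%author%</name>]]
  lines[i] = fill(t, { author = a.name })
end

local xml = fill("<authors>\n%authors%\n</authors>",
                 { authors = table.concat(lines, "\n") })
print(xml)
```

Because the replacement comes from a function rather than a replacement string, a literal % inside a value is inserted as-is and is not interpreted by gsub.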

The code in itself is not that interesting. The point that I am trying to make is that since I already do a lot of text processing in ConTeXt-flavored Lua, I can simply reuse that knowledge and quickly do the required text munging to generate the following snippet:

    <publication status="submitted">
      <title>
          Title of the paper
      </title>
      <authors>
        <name type="student">A. Author</name>
        <name>B. Author</name>
      </authors>
      <journal>
          Fancy Journal
      </journal>
      <pages>1--10</pages>
      <month>Aug</month>
      <year>2021</year>
    </publication>

For the record, the complete code was:

data=$(yad ...)
$MTXRUN --script $SCRIPT --data="$data" 

where $MTXRUN is the location of the mtxrun binary and $SCRIPT is the location of my Lua script. To complete the circle, I defined the following command in the local .vimrc file of my project directory:

command! AddPub read !$HOME/bin/add-publication

So, I can just run :AddPub in vim to call the GUI and after I fill in all the values, the formatted entry is inserted at the current location. This was a fun weekend project!

