Reveal Thyself! Visualizing LuaTeX Node Structure
14 Dec 2020
LuaTeX is a wonderful addition to the traditional TeX engine family (e.g. pdfTeX, XeTeX), for it provides a powerful programming back-end for LaTeX. With the help of Lua, it is easier to access the file system, carry out numerical computations, do simple data processing, handle verbatim content and so on. LuaTeX is a dream tool for automatic generation of high-quality PDF documents.
The TeX compiler stores the document structure in terms of nodes, which are kept in linked lists and contain data and metadata about each character, paragraph and page. With LuaTeX, these internal data structures become visible to TeX users for the first time, which means it is possible to view or even manipulate a document by accessing the internals of the TeX compiler. This feature sounds very promising. Sadly, documentation about LuaTeX is scarce. The most comprehensive document about LuaTeX is its manual, which spends no more than a few paragraphs, if not a few sentences, on most features. With this manual alone, it is very difficult to understand thoroughly how LuaTeX nodes work. In this post, I am going to provide a visualization of LuaTeX’s node structure that goes well beyond existing presentations of LuaTeX nodes, which are usually text-based and often faulty.
GitHub: https://github.com/xziyue/luatex-node-inspect
LuaTeX nodes are “black boxes”
Possibly as a design choice to keep LuaTeX fast, nodes are implemented as userdata objects instead of the Lua tables that most of us are familiar with. These objects resemble C/C++ extension modules in Python, but they are more opaque than their Python counterparts. In Python, a C/C++ extension module would at least provide __dir__ or __doc__ attributes to list its members or provide documentation. However, this information is not available in userdata objects. In Lua, one only needs to implement an __index metamethod for such an object, which returns the corresponding attribute value when given the name as a string. Since this function is written in C and compiled to binary form, there is no way for users to find out what members a userdata object contains. This is exactly the case for LuaTeX nodes: given a node object, there is no easy way to tell what data it is holding.
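To make the problem concrete, here is a small Python analogy (not LuaTeX code). The class name OpaqueNode and its fields are made up for illustration; the point is only that an object can answer lookups by name, much like an __index metamethod, without ever listing which names it accepts.

```python
class OpaqueNode:
    """Hypothetical stand-in for a userdata object: fields are reachable by name only."""

    __slots__ = ("_fields",)

    def __init__(self, **fields):
        # The data lives in a private dict, not in regular attributes.
        self._fields = dict(fields)

    def __getattr__(self, name):
        # Called only when normal lookup fails, loosely similar to Lua's __index.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None


node = OpaqueNode(id="glue", width=16.0, stretch=0)

print(node.width)             # lookup by a known name works
print("width" in dir(node))   # False: the field name is not advertised by dir()
```

Unless you already know that a field called width exists, nothing in the object's public listing tells you to ask for it. LuaTeX nodes put the user in the same position.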
Unboxing LuaTeX nodes
In order to understand node objects, we must find a way to get their members. Although it is impossible to do this directly in Lua, this information is tabulated in the LuaTeX manual. Therefore, I manually entered the members of all LuaTeX node types in luatex_node_inspect.py. This information is passed to Lua through a json file: luatex-node.json. With this in hand, the Lua script luatex-node-to-table.lua can convert LuaTeX nodes from userdata to Lua tables in a recursive manner.
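The idea of a table-driven conversion can be sketched in Python. This is only an illustration under assumptions: the layout of FIELDS_JSON is hypothetical (the real luatex-node.json may be organized differently), and the Node class is a toy stand-in for a LuaTeX node, not the actual userdata API.

```python
import json

# Hypothetical field table: node type -> list of field names to read.
FIELDS_JSON = """
{
  "hlist": ["width", "height", "depth", "head"],
  "glue":  ["width", "stretch", "shrink"]
}
"""


class Node:
    """Toy node: values are fetched by name, siblings are linked via next_node."""

    def __init__(self, type_name, next_node=None, **values):
        self.type_name = type_name
        self.next_node = next_node
        self._values = values

    def field(self, name):
        return self._values.get(name)


def node_to_dict(node, fields_by_type):
    """Copy the tabulated fields of a node into a dict, recursing into node lists."""
    result = {"id": node.type_name}
    for name in fields_by_type.get(node.type_name, []):
        value = node.field(name)
        if isinstance(value, Node):
            # A node-valued field is the head of a linked list: walk it.
            children = []
            while value is not None:
                children.append(node_to_dict(value, fields_by_type))
                value = value.next_node
            result[name] = children
        else:
            result[name] = value
    return result


fields_by_type = json.loads(FIELDS_JSON)
skip2 = Node("glue", width=0.0, stretch=65536, shrink=0)
skip1 = Node("glue", next_node=skip2, width=16.0, stretch=0, shrink=0)
box = Node("hlist", width=345.0, height=12.0, depth=0.0, head=skip1)

table = node_to_dict(box, fields_by_type)
print(json.dumps(table, indent=2))
```

The converter never asks the node what it contains; it only asks for the names listed in the field table, which is why that table has to be entered by hand from the manual.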
Generating Visualization
The Lua script generates a Lua table representation of LuaTeX nodes. With the inspect.lua library, we can already gain deeper insight into a TeX document. Nevertheless, the text-based representation is still too long to interpret. For example, the inspect string for a blank document is reproduced at the end of this section; it is very difficult to understand the content and structure of a document from it. As a result, I save the Lua table as a json file and use Python’s graphviz package to form a graphical representation of the node structure. The graphing code is generate_vis.py.
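The graphing step can be sketched roughly as follows. This is not the code in generate_vis.py; it is a minimal standard-library sketch that assumes each table has an "id" field and, for boxes, a "head" list of child tables, as in the inspect output. The function name table_to_dot and the sample data are made up for illustration.

```python
def table_to_dot(root):
    """Return Graphviz DOT source for a nested node table."""
    lines = ["digraph nodes {", "  node [shape=box];"]
    counter = 0

    def visit(tbl):
        nonlocal counter
        name = f"n{counter}"
        counter += 1
        # Label each graph node with its type and, if present, its width.
        label = str(tbl.get("id", "?"))
        width = tbl.get("width")
        if width is not None:
            label += f"\\nwidth={width}"
        label = label.replace('"', '\\"')
        lines.append(f'  {name} [label="{label}"];')
        children = tbl.get("head")
        if isinstance(children, list):
            for child in children:
                child_name = visit(child)
                lines.append(f"  {name} -> {child_name};")
        return name

    visit(root)
    lines.append("}")
    return "\n".join(lines)


sample = {
    "id": "vlist(1)",
    "width": "345.000000pt",
    "head": [
        {"id": "glue(12)", "width": "16.000000pt"},
        {"id": "hlist(0)", "width": "345.000000pt", "head": []},
    ],
}
dot_source = table_to_dot(sample)
print(dot_source)
```

The resulting string can be saved to a file and rendered with the dot command, or passed to the graphviz package, for instance graphviz.Source(dot_source).render("nodes", format="pdf").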
{ attr = { ["0"] = 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 0, glue_set = 0.0, glue_sign = 0, head = { { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 565 < 673 > nil : vlist 0>" }, prev = {}, shrink = 0, shrink_over = {}, stretch = 0, stretch_over = {}, subtype = "userskip(0)", width = "16.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 0, glue_set = 0.0, glue_sign = 0, head = { { attr = { ["0"] = 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 2, glue_set = 12.0, glue_sign = 1, head = { { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 586 < 600 > nil : hlist 2>" }, prev = {}, shrink = 0, shrink_over = {}, stretch = 65536, stretch_over = {}, subtype = "userskip(0)", width = "0.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 0, glue_set = 0.0, glue_sign = "normal(0)", head = {}, height = "0.000000pt", id = "hlist(0)", list = {}, next = {}, prev = { "<node nil < 586 > 600 : glue 0>" }, shift = 0, subtype = "box(2)", width = "345.000000pt" } }, height = "12.000000pt", id = "vlist(1)", next = { "<node 618 < 579 > 627 : glue 0>" }, prev = {}, shift = 0, subtype = "unknown(0)", width = "345.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 579 < 627 > 354 : glue 1>" }, prev = { "<node nil < 618 > 579 : vlist 0>" }, shrink = 0, shrink_over = {}, stretch = 0, stretch_over = {}, subtype = "userskip(0)", width = "25.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 627 < 354 > 650 : vlist 0>" }, prev = { "<node 618 < 579 > 627 : glue 0>" }, shrink = 0, shrink_over = {}, stretch = 0, stretch_over = {}, subtype = "lineskip(1)", width = "0.000000pt" }, { attr = { ["0"] 
= 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 2, glue_set = 539.94232177734, glue_sign = 1, head = { { attr = { ["0"] = 0, id = "attribute_list(40)" }, data = "", id = "whatsit(8)", next = { "<node 80 < 405 > 345 : glue 10>" }, prev = {}, stream = 129, subtype = "write(1)" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 405 < 345 > 500 : hlist 2>" }, prev = { "<node nil < 80 > 405 : whatsit 1>" }, shrink = 0, shrink_over = {}, stretch = 0, stretch_over = {}, subtype = "topskip(10)", width = "10.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 0, glue_set = 0.0, glue_sign = "normal(0)", head = {}, height = "0.000000pt", id = "hlist(0)", list = {}, next = { "<node 345 < 500 > 398 : glue 0>" }, prev = { "<node 80 < 405 > 345 : glue 10>" }, shift = 0, subtype = "box(2)", width = "0.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 500 < 398 > 493 : glue 0>" }, prev = { "<node 405 < 345 > 500 : hlist 2>" }, shrink = 0, shrink_over = {}, stretch = 65536, stretch_over = {}, subtype = "userskip(0)", width = "0.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 398 < 493 > nil : glue 0>" }, prev = { "<node 345 < 500 > 398 : glue 0>" }, shrink = 0, shrink_over = {}, stretch = 0, stretch_over = {}, subtype = "userskip(0)", width = "0.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = {}, prev = { "<node 500 < 398 > 493 : glue 0>" }, shrink = 0, shrink_over = {}, stretch = 7, stretch_over = {}, subtype = "userskip(0)", width = "0.000000pt" } }, height = "550.000000pt", id = "vlist(1)", next = { "<node 354 < 650 > 664 : glue 2>" }, prev = { "<node 579 < 627 > 354 : glue 1>" }, shift = 0, subtype = "unknown(0)", width = "0.000000pt" }, { attr = { ["0"] = 0, id = 
"attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 650 < 664 > nil : hlist 2>" }, prev = { "<node 627 < 354 > 650 : vlist 0>" }, shrink = 0, shrink_over = {}, stretch = 0, stretch_over = {}, subtype = "baselineskip(2)", width = "22.578125pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, depth = "0.000000pt", dir = "TLT", glue_order = 2, glue_set = 169.31884765625, glue_sign = "stretching(1)", head = { { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = { "<node 643 < 636 > 657 : glyph 256>" }, prev = {}, shrink = 0, shrink_over = {}, stretch = 65536, stretch_over = {}, subtype = "userskip(0)", width = "0.000000pt" }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, char = "'1'(49)", components = {}, data = 0, depth = "0.000000pt", expansion_factor = 0, font = 37, height = "7.421875pt", id = "glyph(29)", lang = 0, left = 2, next = { "<node 636 < 657 > nil : glue 0>" }, prev = { "<node nil < 643 > 636 : glue 0>" }, right = 3, subtype = "*unknown*(256)", uchyph = 1, width = "6.362305pt", xoffset = 0, yoffset = 0 }, { attr = { ["0"] = 0, id = "attribute_list(40)" }, id = "glue(12)", leader = {}, next = {}, prev = { "<node 643 < 636 > 657 : glyph 256>" }, shrink = 0, shrink_over = {}, stretch = 65536, stretch_over = {}, subtype = "userskip(0)", width = "0.000000pt" } }, height = "7.421875pt", id = "hlist(0)", next = {}, prev = { "<node 354 < 650 > 664 : glue 2>" }, shift = 0, subtype = "box(2)", width = "345.000000pt" } }, height = "617.000000pt", id = "vlist(1)", next = {}, prev = { "<node nil < 565 > 673 : glue 0>" }, shift = 4063232, subtype = "unknown(0)", width = "345.000000pt" } }, height = "633.000000pt", id = "vlist(1)", next = {}, prev = {}, shift = 0, subtype = "unknown(0)", width = "407.000000pt" }
Visualizing TeX documents’ internal representation
Throughout this section, I will be using the document template below, where contents are inserted between \begin{document} and \end{document}. The code in \AtBeginShipout is called for each page, and every call rewrites temp.json from scratch (the file is opened in "w" mode). Therefore, the current implementation only preserves the last page of the document. Modifications need to be made if one wishes to visualize multi-page documents.
\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{atbegshi}
\setmainfont{DejaVu Serif}
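\begin{luacode*}
-- Helper module from the repository: converts node userdata to Lua tables
require "luatex-node-to-table"
inspect = require "inspect"

local my_param = get_default_param()
-- Set expand_depth to 0; the recursion over child nodes is done explicitly below
my_param["expand_depth"] = 0

-- Convert node n to a table and replace its "head" field with the list of
-- converted child nodes
function recursive_expand_node(n)
  local tbl = luatex_node_to_table(n, my_param)
  local head = n.head
  if head ~= nil then
    local lst_tbl = {}
    -- "list" and "head" refer to the same node list; keep only "head"
    tbl["list"] = nil
    for n1 in node.traverse(head) do
      table.insert(lst_tbl, recursive_expand_node(n1))
    end
    tbl["head"] = lst_tbl
  end
  return tbl
end
\end{luacode*}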
\AtBeginShipout{%
  \directlua{
    local n = tex.box["AtBeginShipoutBox"]
    local all_n = recursive_expand_node(n)
    local inspect_text = inspect(all_n)
    texio.write_nl(inspect_text)
    local json_text = json.encode(all_n)
    local file = io.open("temp.json", "w")
    file:write(json_text)
    file:close()
  }
}
\begin{document}
\end{document}
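Once the template has been compiled, the resulting temp.json can be explored from Python. The sketch below is an assumption-laden example rather than part of the repository: it assumes the JSON mirrors the inspect output (an "id" string such as "glue(12)" and a "head" list of children), and the helper names split_id and count_node_types are made up. An inline sample is used in place of the real file.

```python
import json
import re
from collections import Counter

# Matches strings such as "glue(12)" or "*unknown*(256)".
ID_PATTERN = re.compile(r"^(?P<name>.+)\((?P<number>\d+)\)$")


def split_id(text):
    """Split 'glue(12)' into ('glue', 12); return (text, None) if it does not match."""
    match = ID_PATTERN.match(text)
    if match is None:
        return text, None
    return match.group("name"), int(match.group("number"))


def count_node_types(tbl, counts=None):
    """Count how often each node type occurs in a nested node table."""
    counts = Counter() if counts is None else counts
    name, _ = split_id(tbl.get("id", "unknown"))
    counts[name] += 1
    children = tbl.get("head")
    if isinstance(children, list):
        for child in children:
            count_node_types(child, counts)
    return counts


# Small inline sample in the assumed shape of temp.json.
page = json.loads("""
{
  "id": "vlist(1)",
  "head": [
    {"id": "glue(12)", "subtype": "userskip(0)"},
    {"id": "hlist(0)", "head": [
      {"id": "glue(12)", "subtype": "userskip(0)"}
    ]}
  ]
}
""")

counts = count_node_types(page)
print(counts.most_common())
```

For a real run, the inline sample would be replaced by the parsed contents of temp.json, for example with json.load on the opened file.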
Blank document
- Inserted content:
\null
Document with text “abc”
- Inserted content:
abc
Document with a picture
- Inserted content:
\includegraphics{example-image-a}
Document with colored text
- Inserted content:
A\textcolor{blue}{B}
Document with math
- Inserted content:
\[\int_a^b\frac{1}{x}\]
Document with hyperlinks
- Need to use the hyperref package in the preamble
- Inserted content:
\hypertarget{link}{a}\hyperlink{link}{b}
Document with rich media
- Need to use the media9 package in the preamble
- Inserted content:
\includemedia[
  width=0.6\linewidth,height=0.3375\linewidth, % 16:9
  activate=pageopen,
  flashvars={
    modestbranding=1 % no YT logo in control bar
    &autohide=1 % controlbar autohide
    &showinfo=0 % no title and other info before start
    &rel=0 % no related videos after end
  }
]{}{http://www.youtube.com/v/r382kfkqAF4?rel=0}