LaTeX Binary Search Tree
19 Jun 2020Recently, I have been exploring the essence of LaTeX. The discovery of LaTeX3 really uncovers the potential of LaTeX as a “generic” programming language. Although with LuaTeX, writing complicated data structure and algorithms in pure TeX seems to be a waste of time, I still find my effort somehow meaningful in a way: at this point, I haven’t really studied programming languages and compilers formally, but playing with TeX really allows me to feel the fine line between text and program. We are able to do amazing things with TeX (which is Turing complete), but accompanied by fact that the foundation of TeX is based on text substitution and some support for numerical evaluation, I am stunned by its simplicity and universality. Text is program, program is text, it is the perfect concord.
Storing nodes in LaTeX
Because there is no stack or heap in LaTeX, we cannot just allocate new nodes in LaTeX just like in C++ or Python. It is important to keep in mind that the foundation of TeX is text substitution. When we define a macro as follows,
\newcommand{\cmd}{value}
what the TeX compiler does is to remember \cmd
and replace it with value
for every occurrence in the document. We can treat this as the equivalence of variable declaration in other languages. Therefore, to store tree nodes in LaTeX, we can use unique macros to store the value of each node. In my implementation, I apply the following rules:
- Each tree has a unique name
- Each tree has a node counter to keep track of the number of nodes in the tree
-
The macro name of a tree node is generated with the following function. The
\int_to_alph:n
function converts an integer value to its corresponding alphabetic sequence. For example,1
isa
,26
isz
,27
isaa
and so on. This guarantees that the macro names only contain valid characters.% get name of the node given tree name (#1) and node id (#2) \cs_new:Npn \bt_node_name:nn #1#2 { __g_bt_#1_node_\int_to_alph:n {#2}_tl }
The macro name for
test
tree’s 27th node will be\__g_bt_test_node_aa_tl
. - The value of each node contains three groups: left node id, right node id and data. The default value for a node is
{-1}{-1}{}
, where-1
indicates empty child id. One can access the value of each group by calling\tl_item:Nn
.
Basic operations
Each tree is specified by a tree name; each node is specified by tree name and integer node id.
\bt_get_node:nnN
: get the node given tree name and node id% get node given tree name and node id % stores the node in #3 \cs_new:Npn \bt_get_node:nnN #1#2#3 { \tl_gset:Nx \__g_bt_tmpa_tl {\bt_node_name:nn {#1} {#2}} \tl_if_exist:cTF {\__g_bt_tmpa_tl} { \tl_gset_eq:Nc #3 {\__g_bt_tmpa_tl} } { \msg_error:nnnn {bt} {nonodeerror} {#1} {#2} } }
Higher level functions:
\bt_get_node_data:nnN
\bt_get_node_left_child_id:nnN
\bt_get_node_right_child_id:nnN
\bt_get_node_left_child:nnNTF
\bt_get_node_right_child:nnNTF
-
\bt_set_node:nnnnn
: set the value of a node% set value of a node % #1: tree name % #2: node id % #3: left node id % #4: right node id % #5: data \cs_new:Npn \bt_set_node:nnnnn #1#2#3#4#5 { \tl_gset:cn { \bt_node_name:nn {#1} {#2} } { {#3}{#4}{#5} } }
Higher level functions:
\bt_set_node_data:nnn
: set the data of a node
-
\bt_new_node:n
: create a new node in a given tree. The new node name is stored in\g_bt_new_node_name_tl
; the new node id is stored in\g_bt_new_node_id_tl
.Higher level functions:
\bt_new_node_left_child:nn
: create a new left child for a node\bt_new_node_right_child:nn
: create a new right child for a node
\bt_new:n
: create a new tree with a given name
Non-recursive insertion and traversal
One of the greatest difficulties in implementing recursive functions in LaTeX is the lack of stack (namespace). That is, the value of a variable will be shared in all depths of the algorithm, which will break the program down. Therefore, non-recursive algorithms with the help of queue structures provided by l3seq
are preferred.
Insertion
% data insertion function
\cs_new:Npn \insert_data:n #1 {
% start with root node id
\tl_gset:Nn \g_tmpa_tl {1}
% initialize loop variable
\bool_gset_true:N \g_tmpa_bool
\bool_do_while:Nn \g_tmpa_bool {
% extract data from current node id
\bt_get_node_data:nnN {test} {\g_tmpa_tl} \l_tmpa_tl
\int_compare:nNnTF {#1} < {\l_tmpa_tl} {
% set next node id
\bt_get_node_left_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
% if next node is empty, insert here
\int_compare:nNnTF {\l_tmpb_tl} = {-1} {
\bt_new_node_left_child:nn {test} {\g_tmpa_tl}
% set data
\bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
% set loop variable
\bool_gset_false:N \g_tmpa_bool
\par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~left~of~node~\tl_use:N \g_tmpa_tl
} {
% otherwise, continue
\tl_gset:NV \g_tmpa_tl \l_tmpb_tl
}
} {
% set next node id
\bt_get_node_right_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
% if next node is empty, insert here
\int_compare:nNnTF {\l_tmpb_tl} = {-1} {
\bt_new_node_right_child:nn {test} {\g_tmpa_tl}
% set data
\bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
% set loop variable
\bool_gset_false:N \g_tmpa_bool
\par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~right~of~node~\tl_use:N \g_tmpa_tl
} {
% otherwise, continue
\tl_gset:NV \g_tmpa_tl \l_tmpb_tl
}
}
}
}
DFS
% because TeX does not have stack, variables of the same name will be
% shared globally. therefore, we need to use the non-recursive version
% of dfs
\cs_new:Npn \bt_dfs:n #1 {
\seq_clear:N \l_tmpa_seq
\seq_push:Nn \l_tmpa_seq {#1}
\bool_do_until:nn {\seq_if_empty_p:N \l_tmpa_seq} {
\seq_pop:NN \l_tmpa_seq \l_tmpa_tl
\int_compare:nNnTF {\l_tmpa_tl} = {-1} {} {
\bt_get_node_data:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
\tl_use:N \l_tmpb_tl \space
\bt_get_node_right_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
\seq_push:NV \l_tmpa_seq \l_tmpb_tl
\bt_get_node_left_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
\seq_push:NV \l_tmpa_seq \l_tmpb_tl
}
}
}
Listings
Python program
This program is replicated in LaTeX.
class Node:
def __init__(self):
self.left = None
self.right = None
self.data = None
root = Node()
root.data = 5
def insert_data(data):
node = root
while True:
if data < node.data:
attr = 'left'
else:
attr = 'right'
if getattr(node, attr) is None:
# create node and exit loop
new_node = Node()
new_node.data = data
setattr(node, attr, new_node)
break
else:
node = getattr(node, attr)
def dfs(node):
if node is None:
return
print(node.data, end=' ')
dfs(node.left)
dfs(node.right)
for i in range(1, 21):
insert_data(i)
dfs(root)
Main TeX document
\documentclass{article}
\usepackage[a3paper, landscape]{geometry}
\usepackage{newtxtext, newtxmath}
\usepackage{amsmath}
\usepackage{datetime2}
\usepackage{expl3}
\begin{document}
\input{binary_tree}
\ExplSyntaxOn
% new tree
\bt_new:n {test}
% set value of root node
\bt_set_node_data:nnn {test} {1} {5}
\tl_new:N \l_tmpc_tl
% data insertion function
\cs_new:Npn \insert_data:n #1 {
% start with root node id
\tl_gset:Nn \g_tmpa_tl {1}
% initialize loop variable
\bool_gset_true:N \g_tmpa_bool
\bool_do_while:Nn \g_tmpa_bool {
% extract data from current node id
\bt_get_node_data:nnN {test} {\g_tmpa_tl} \l_tmpa_tl
\int_compare:nNnTF {#1} < {\l_tmpa_tl} {
% set next node id
\bt_get_node_left_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
% if next node is empty, insert here
\int_compare:nNnTF {\l_tmpb_tl} = {-1} {
\bt_new_node_left_child:nn {test} {\g_tmpa_tl}
% set data
\bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
% set loop variable
\bool_gset_false:N \g_tmpa_bool
\par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~left~of~node~\tl_use:N \g_tmpa_tl
} {
% otherwise, continue
\tl_gset:NV \g_tmpa_tl \l_tmpb_tl
}
} {
% set next node id
\bt_get_node_right_child_id:nnN {test} {\g_tmpa_tl} \l_tmpb_tl
% if next node is empty, insert here
\int_compare:nNnTF {\l_tmpb_tl} = {-1} {
\bt_new_node_right_child:nn {test} {\g_tmpa_tl}
% set data
\bt_set_node_data:nnn {test} {\g_bt_new_node_id_tl} {#1}
% set loop variable
\bool_gset_false:N \g_tmpa_bool
\par inserted~node~\tl_use:N \g_bt_new_node_id_tl \space as~right~of~node~\tl_use:N \g_tmpa_tl
} {
% otherwise, continue
\tl_gset:NV \g_tmpa_tl \l_tmpb_tl
}
}
}
}
\int_new:N \g_dfs_int
\int_gset:Nn \g_dfs_int {0}
% because TeX does not have stack, variables of the same name will be
% shared globally. therefore, we need to use the non-recursive version
% of dfs
\cs_new:Npn \bt_dfs:n #1 {
\seq_clear:N \l_tmpa_seq
\seq_push:Nn \l_tmpa_seq {#1}
\bool_do_until:nn {\seq_if_empty_p:N \l_tmpa_seq} {
\seq_pop:NN \l_tmpa_seq \l_tmpa_tl
\int_compare:nNnTF {\l_tmpa_tl} = {-1} {} {
\bt_get_node_data:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
\tl_use:N \l_tmpb_tl \space
\bt_get_node_right_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
\seq_push:NV \l_tmpa_seq \l_tmpb_tl
\bt_get_node_left_child_id:nnN {test} {\l_tmpa_tl} \l_tmpb_tl
\seq_push:NV \l_tmpa_seq \l_tmpb_tl
}
}
}
\int_gset:Nn \g_tmpa_int {1}
\int_do_until:nNnn {\g_tmpa_int} > {20} {
\exp_args:Nx \insert_data:n {\int_use:N \g_tmpa_int}
\int_gincr:N \g_tmpa_int
}
\par
\bt_dfs:n {1}
\ExplSyntaxOff
\par\DTMNow
\end{document}
binary_tree.tex
\ExplSyntaxOn
\tl_new:N \g_bt_new_node_name_tl
\tl_new:N \g_bt_new_node_id_tl
\tl_new:N \__g_bt_tmpa_tl
\tl_new:N \__g_bt_tmpb_tl
% get name of the node given tree name and node id
\cs_new:Npn \bt_node_name:nn #1#2 {
__g_bt_#1_node_\int_to_alph:n {#2}_tl
}
% get the name of node counter
\cs_new:Npn \bt_counter_name:n #1 {
__g_bt_#1_counter_int
}
% get counter value for a tree
\cs_new:Npn \bt_get_counter:n #1 {
\int_use:c {\bt_counter_name:n {#1}}
}
% set value of a node
% #1: tree name
% #2: node id
% #3: left node id
% #4: right node id
% #5: data
\cs_new:Npn \bt_set_node:nnnnn #1#2#3#4#5 {
\tl_gset:cn { \bt_node_name:nn {#1} {#2} } { {#3}{#4}{#5} }
}
\cs_generate_variant:Nn \bt_set_node:nnnnn {nnxxx}
\msg_new:nnn {bt} {nonodeerror} {node\ #2\ of\ tree\ "#1"\ does\ not\ exist.}
\msg_new:nnn {bt} {haschilderror} {node\ #2\ of\ tree\ "#1"\ already\ has\ a \ #3\ child.}
% create a new node and put the node name in \g_bt_new_node_tl
\cs_new:Npn \bt_new_node:n #1 {
% increment counter value
\int_gincr:c {\bt_counter_name:n {#1}}
\tl_gset:Nx \g_bt_new_node_id_tl {\bt_get_counter:n {#1}}
%\par idd~ \cs_meaning:N \g_bt_new_node_id_tl
% set new node name
\tl_gset:Nx \g_bt_new_node_name_tl {
\bt_node_name:nn {#1} {\g_bt_new_node_id_tl}
}
%\par node~name:~\cs_meaning:N \g_bt_new_node_name_tl~end
% declare new node
\tl_new:c {\g_bt_new_node_name_tl}
% set initial value for the node
\bt_set_node:nnnnn {#1} {\g_bt_new_node_id_tl} {-1} {-1} {}
%\par \cs_meaning:c \g_bt_new_node_name_tl
}
% get node given tree name and node id
% stores the node in #3
\cs_new:Npn \bt_get_node:nnN #1#2#3 {
\tl_gset:Nx \__g_bt_tmpa_tl {\bt_node_name:nn {#1} {#2}}
\tl_if_exist:cTF {\__g_bt_tmpa_tl} {
\tl_gset_eq:Nc #3 {\__g_bt_tmpa_tl}
} {
\msg_error:nnnn {bt} {nonodeerror} {#1} {#2}
}
}
\cs_new:Npn \bt_set_node_data:nnn #1#2#3 {
\bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
\bt_set_node:nnxxx {#1} {#2} {\tl_item:Nn \__g_bt_tmpa_tl {1}} {\tl_item:Nn \__g_bt_tmpa_tl {2}} {#3}
}
% get node id of left child, saves to #3
\cs_new:Npn \bt_get_node_left_child_id:nnN #1#2#3 {
\bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
\tl_gset:Nx #3 {\tl_item:Nn \__g_bt_tmpa_tl {1}}
}
% get node id of right child, saves to #3
\cs_new:Npn \bt_get_node_right_child_id:nnN #1#2#3 {
\bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
\tl_gset:Nx #3 {\tl_item:Nn \__g_bt_tmpa_tl {2}}
}
% get left child, saves to #3
\cs_new:Npn \bt_get_node_left_child:nnNTF #1#2#3#4#5 {
\bt_get_node_left_child_id:nnN {#1} {#2} \__g_bt_tmpa_tl
\int_compare:nNnTF {\__g_bt_tmpa_tl} = {-1} {
#5
} {
\bt_get_node:nnN {#1} {\__g_bt_tmpa_tl} #3
#4
}
}
% get right child, saves to #3
\cs_new:Npn \bt_get_node_right_child:nnNTF #1#2#3#4#5 {
\bt_get_node_right_child_id:nnN {#1} {#2} \__g_bt_tmpa_tl
\int_compare:nNnTF {\__g_bt_tmpa_tl} = {-1} {
#5
} {
\bt_get_node:nnN {#1} {\__g_bt_tmpa_tl} #3
#4
}
}
\cs_new:Npn \bt_get_node_data:nnN #1#2#3 {
\bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
\tl_gset:Nx #3 {\tl_item:Nn \__g_bt_tmpa_tl {3}}
}
\cs_new:Npn \bt_new_node_left_child:nn #1#2 {
\bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
\tl_gset:Nx \__g_bt_tmpb_tl {\tl_item:Nn \__g_bt_tmpa_tl {1}}
\int_compare:nNnTF {\__g_bt_tmpb_tl} = {-1} {
% create new node
\bt_new_node:n {#1}
%\par id~ \cs_meaning:N \g_bt_new_node_id_tl
\bt_set_node:nnxxx {#1} {#2} {\g_bt_new_node_id_tl} {\tl_item:Nn \__g_bt_tmpa_tl {2}} {\tl_item:Nn \__g_bt_tmpa_tl {3}}
} {
\msg_error:nnnnn {bt} {haschilderror} {#1} {#2} {left}
}
}
\cs_new:Npn \bt_new_node_right_child:nn #1#2 {
\bt_get_node:nnN {#1} {#2} \__g_bt_tmpa_tl
\tl_gset:Nx \__g_bt_tmpb_tl {\tl_item:Nn \__g_bt_tmpa_tl {2}}
\int_compare:nNnTF {\__g_bt_tmpb_tl} = {-1} {
% create new node
\bt_new_node:n {#1}
\bt_set_node:nnxxx {#1} {#2} {\tl_item:Nn \__g_bt_tmpa_tl {1}} {\g_bt_new_node_id_tl}
{\tl_item:Nn \__g_bt_tmpa_tl {3}}
} {
\msg_error:nnnnn {bt} {haschilderror} {#1} {#2} {right}
}
}
% create new binary tree with a name
\cs_new:Npn \bt_new:n #1 {
% create counter
\int_new:c {\bt_counter_name:n {#1}}
% set counter
\int_gset:cn {\bt_counter_name:n {#1}} {0}
% create a root node
\bt_new_node:n {#1}
}
\ExplSyntaxOff
Python output
PS C:\Users\Alan\Desktop\test_tex> python .\bt.py 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
LaTeX output
inserted node 2 as left of node 1 inserted node 3 as right of node 2 inserted node 4 as right of node 3 inserted node 5 as right of node 4 inserted node 6 as right of node 1 inserted node 7 as right of node 6 inserted node 8 as right of node 7 inserted node 9 as right of node 8 inserted node 10 as right of node 9 inserted node 11 as right of node 10 inserted node 12 as right of node 11 inserted node 13 as right of node 12 inserted node 14 as right of node 13 inserted node 15 as right of node 14 inserted node 16 as right of node 15 inserted node 17 as right of node 16 inserted node 18 as right of node 17 inserted node 19 as right of node 18 inserted node 20 as right of node 19 inserted node 21 as right of node 20 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2020-06-19 18:29:44-04:00