From LaTeX to Jekyll markdown: towards faster post compilation
09 Oct 2019
Recently, I updated my TeX Live distribution to 2019. Compilation seems much faster, though I have not yet done a serious comparison. As I started taking notes with Jekyll this semester, one problem gradually stood out: speed. Writing with Jekyll is slow in two ways. First, its compilation is slow. Second, it does not provide any navigation functionality. Therefore, if I can write my posts in \(\LaTeX\) and convert them into Jekyll markdown afterwards, I can potentially save a lot of time.
Methodology
Just as I used Python's HTMLParser class to convert HTML into \(\LaTeX\), I can use a \(\LaTeX\) parser for this conversion. TexSoup is a wonderful \(\LaTeX\) parser in Python that makes everything much easier. The major differences between \(\LaTeX\) and Jekyll markdown are as follows:
- math mode: The math delimiters are different. Because I am using MathJax, math mode is triggered with “$$” instead of “$”.
- headers: In markdown, headers are preceded by “#”. There is no equivalent in \(\LaTeX\). Also, the pound sign is a special character in \(\LaTeX\) and needs to be escaped. To solve this problem, I predefined a \(\LaTeX\) command \header so that the header style can be reconstructed later.
- lists: In \(\LaTeX\), ordered and unordered lists are represented by the enumerate and itemize environments, respectively. They are converted into the corresponding HTML tags for maximum compatibility.
- bold and italic text: \(\LaTeX\) marks bold and italic text with the \textbf and \textit commands. They need to be converted into their markdown equivalents.
- hyperlinks: In \(\LaTeX\), hyperlinks are created using the \href command, which differs from markdown’s syntax.
- other features: Since I also implemented some other special syntax in my Jekyll setup, it needs to be available in \(\LaTeX\) as well. Generally, this can be solved by adding commands in \(\LaTeX\).
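To make the inline rules above concrete, here is a minimal sketch that uses only Python's standard `re` module. It is not the TexSoup-based converter described in this post; the function name `convert_inline` is hypothetical, and the patterns assume command arguments contain no nested braces.

```python
import re


def convert_inline(tex: str) -> str:
    """Convert a few inline LaTeX constructs to Jekyll markdown.

    Assumption: arguments never contain nested braces, so a regex is enough.
    """
    # \header{level}{title} -> '#' repeated `level` times, then the title
    tex = re.sub(
        r"\\header\{(\d+)\}\{([^{}]*)\}",
        lambda m: "#" * int(m.group(1)) + " " + m.group(2),
        tex,
    )
    # \textbf{...} -> **...** and \textit{...} -> *...*
    tex = re.sub(r"\\textbf\{([^{}]*)\}", r"**\1**", tex)
    tex = re.sub(r"\\textit\{([^{}]*)\}", r"*\1*", tex)
    # \href{url}{text} -> HTML anchor, matching the sample output in this post
    tex = re.sub(r"\\href\{([^{}]*)\}\{([^{}]*)\}", r'<a href="\1">\2</a>', tex)
    # $...$ -> $$...$$; escaped dollars and existing $$ pairs are left alone
    tex = re.sub(r"(?<![\\$])\$([^$]+)\$(?!\$)", r"$$\1$$", tex)
    return tex


print(convert_inline(r"\header{2}{Header 1}"))
print(convert_inline(r"\textit{World} $E=mc^2$"))
```

A real parser such as TexSoup handles nested commands properly, which is where this regex approach breaks down.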
The sample LaTeX input
A sample \(\LaTeX\) input for this code is shown below.
\documentclass[12pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{setspace}
\usepackage[T1]{fontenc}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{bm}
\usepackage{mathptmx}
\usepackage{xcolor}
\usepackage{hyperref}
\newcommand{\header}[2]{{\vspace*{1em}\par\noindent\Large #2}} % dummy command for specifying header in html
\newcommand{\toind}[1]{\textcolor{red}{#1}}
\onehalfspacing
\begin{document}
\header{2}{Header 1}
\header{3}{Header 1.1}
\begin{itemize}
\item \textbf{Hello}
\begin{align*}
S=\pi r^2
\end{align*}
\item \textit{World} $E=mc^2$
\end{itemize}
\href{https://www.alanshawn.com}{A link}
\end{document}
It generates the following markdown output:
## Header 1
### Header 1.1
<ul markdown="1">
<li markdown="1">**Hello**
$$
\begin{align*}
S=\pi r^2
\end{align*}
$$
</li>
<li markdown="1">*World* $$E=mc^2$$
</li>
</ul>
<a href="https://www.alanshawn.com">A link</a>
Pasted into this page, the markdown output renders as follows:
Header 1
Header 1.1
- Hello
  \[\begin{align*} S=\pi r^2 \end{align*}\]
- World \(E=mc^2\)
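The block-level part of the conversion (lists and display math) can be sketched in the same spirit. Again, this is a simplified, line-oriented illustration rather than the actual TexSoup-based implementation. The names `LIST_TAGS` and `convert_blocks` are hypothetical, and the sketch assumes that every \begin, \end and \item starts its own line and that \item only appears inside a list.

```python
import re

# LaTeX list environments and the HTML tags they map to
LIST_TAGS = {"itemize": "ul", "enumerate": "ol"}


def convert_blocks(tex: str) -> str:
    """Turn itemize/enumerate into HTML lists and wrap align* in $$."""
    out = []
    open_items = []  # one flag per open list: is an <li> currently open?
    for raw in tex.splitlines():
        line = raw.strip()
        m = re.fullmatch(r"\\(begin|end)\{(itemize|enumerate|align\*)\}", line)
        if m and m.group(2) == "align*":
            # display math: "$$" goes before \begin and after \end
            out.extend(["$$", line] if m.group(1) == "begin" else [line, "$$"])
        elif m and m.group(1) == "begin":
            out.append(f'<{LIST_TAGS[m.group(2)]} markdown="1">')
            open_items.append(False)
        elif m:  # end of a list: close the last item, then the list itself
            if open_items.pop():
                out.append("</li>")
            out.append(f"</{LIST_TAGS[m.group(2)]}>")
        elif line.startswith("\\item"):
            if open_items[-1]:  # close the previous item first
                out.append("</li>")
            open_items[-1] = True
            out.append('<li markdown="1">' + line[len("\\item"):].strip())
        else:
            out.append(line)
    return "\n".join(out)


sample = "\n".join([
    r"\begin{itemize}",
    r"\item \textbf{Hello}",
    r"\begin{align*}",
    r"S=\pi r^2",
    r"\end{align*}",
    r"\item \textit{World} $E=mc^2$",
    r"\end{itemize}",
])
print(convert_blocks(sample))
```

The `markdown="1"` attribute mirrors the sample output above, so that markdown inside the HTML tags is still processed. Inline commands such as \textbf are left untouched here and would need a separate pass.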