cmark — LaTeX Renderer
Overview
The LaTeX renderer (latex.c) converts a cmark_node AST into LaTeX source, suitable for compilation with pdflatex, xelatex, or lualatex. It uses the generic render framework from render.c, operating through a per-character output callback (outc) and a per-node render callback (S_render_node).
Entry Point
char *cmark_render_latex(cmark_node *root, int options, int width);
root— AST root nodeoptions— Option flags (CMARK_OPT_SOURCEPOS,CMARK_OPT_HARDBREAKS,CMARK_OPT_NOBREAKS,CMARK_OPT_UNSAFE)width— Target line width for hard-wrapping; 0 disables wrapping
Character Escaping (outc)
The outc function handles per-character output decisions. It is the most complex part of the LaTeX renderer, with different behavior for three escaping modes:
static void outc(cmark_renderer *renderer, cmark_escaping escape,
int32_t c, unsigned char nextc);
LITERAL Mode
Pass-through: all characters are output unchanged.
NORMAL Mode
Extensive special-character handling:
| Character | LaTeX Output | Purpose |
|---|---|---|
$ |
\$ |
Math mode delimiter |
% |
\% |
Comment character |
& |
\& |
Table column separator |
_ |
\_ |
Subscript operator |
# |
\# |
Parameter reference |
^ |
\^{} |
Superscript operator |
{ |
\{ |
Group open |
} |
\} |
Group close |
~ |
\textasciitilde{} |
Non-breaking space |
[ |
{[} |
Optional argument bracket |
] |
{]} |
Optional argument bracket |
\ |
\textbackslash{} |
Escape character |
| |
\textbar{} |
Pipe |
' |
\textquotesingle{} |
Straight single quote |
" |
\textquotedbl{} |
Straight double quote |
` | \textasciigrave{} |
Backtick | |
\xA0 (NBSP) |
~ |
LaTeX non-breaking space |
\x2014 (—) |
--- |
Em dash |
\x2013 (–) |
-- |
En dash |
\x2018 (') |
` |
Left single quote |
\x2019 (') |
' |
Right single quote |
\x201C (") |
`` |
Left double quote |
\x201D (") |
'' |
Right double quote |
URL Mode
Only these characters are escaped:
$→\$%→\%&→\&_→\_#→\#{→\{}→\}
All other characters pass through unchanged.
Link Type Classification
The renderer classifies links into five categories:
typedef enum {
NO_LINK,
URL_AUTOLINK,
EMAIL_AUTOLINK,
NORMAL_LINK,
INTERNAL_LINK,
} link_type;
get_link_type()
static link_type get_link_type(cmark_node *node) {
// 1. "mailto:" links where text matches url
// 2. "http[s]:" links where text matches url (with or without protocol)
// 3. Links starting with '#' → INTERNAL_LINK
// 4. Everything else → NORMAL_LINK
}
Detection logic:
- URL_AUTOLINK: The
urlstarts withhttp://orhttps://, the link has exactly one text child, and that child's content matches the URL (or matches the URL minus the protocol prefix). - EMAIL_AUTOLINK: The
urlstarts withmailto:, the link has exactly one text child, and that child's content matches the URL aftermailto:. - INTERNAL_LINK: The
urlstarts with#. - NORMAL_LINK: Everything else.
Enumeration Level
For nested ordered lists, the renderer selects the appropriate LaTeX counter style:
static int S_get_enumlevel(cmark_node *node) {
int enumlevel = 0;
cmark_node *tmp = node;
while (tmp) {
if (tmp->type == CMARK_NODE_LIST &&
cmark_node_get_list_type(tmp) == CMARK_ORDERED_LIST) {
enumlevel++;
}
tmp = tmp->parent;
}
return enumlevel;
}
This walks up the tree, counting ordered list ancestors. LaTeX ordered lists cycle through: enumi (arabic), enumii (alpha), enumiii (roman), enumiv (Alpha).
Node Rendering (S_render_node)
Block Nodes
Document
No output.
Block Quote
ENTER: \begin{quote}\n
EXIT: \end{quote}\n
List
ENTER (bullet): \begin{itemize}\n
ENTER (ordered): \begin{enumerate}\n
\def\labelenumI{COUNTER}\n (if start != 1)
\setcounter{enumI}{START-1}\n
EXIT: \end{itemize}\n or \end{enumerate}\n
The counter is formatted based on enumeration level:
- Level 1:
\arabic{enumi}. - Level 2:
\alph{enumii}.(surrounded by() - Level 3:
\roman{enumiii}. - Level 4:
\Alph{enumiv}.
Period delimiters use ., parenthesis delimiters use ).
Item
ENTER: \item{} (empty braces prevent ligatures with following content)
EXIT: \n
Heading
ENTER: \section{ or \subsection{ or \subsubsection{ or \paragraph{ or \subparagraph{
EXIT: }\n
Mapping: level 1 → \section, level 2 → \subsection, level 3 → \subsubsection, level 4 → \paragraph, level 5 → \subparagraph.
Code Block
\begin{verbatim}
LITERAL CONTENT
\end{verbatim}
The content is output in LITERAL escape mode (no character escaping). Info strings are ignored.
HTML Block
ENTER: % raw HTML omitted\n (as a LaTeX comment)
Raw HTML is always omitted in LaTeX output, regardless of CMARK_OPT_UNSAFE.
Thematic Break
\begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}\n
Paragraph
Same tight-list check as the HTML renderer:
parent = cmark_node_parent(node);
grandparent = cmark_node_parent(parent);
tight = (grandparent && grandparent->type == CMARK_NODE_LIST) ?
grandparent->as.list.tight : false;
- Normal: newline before and after
- Tight: no leading/trailing blank lines
Inline Nodes
Text
Output with NORMAL escaping.
Soft Break
Depends on options:
CMARK_OPT_HARDBREAKS:\\\\\nCMARK_OPT_NOBREAKS: space- Default: newline
Line Break
\\\\\n
Code (inline)
\texttt{ESCAPED CONTENT}
Special handling: Code content is output character-by-character with inline-code escaping. Special characters (\, {, }, $, %, &, _, #, ^, ~) are escaped.
Emphasis
ENTER: \emph{
EXIT: }
Strong
ENTER: \textbf{
EXIT: }
Link
Rendering depends on link type:
NORMAL_LINK:
ENTER: \href{URL}{
EXIT: }
URL_AUTOLINK:
ENTER: \url{URL}
(children are skipped — no EXIT rendering needed)
EMAIL_AUTOLINK:
ENTER: \href{URL}{\nolinkurl{
EXIT: }}
INTERNAL_LINK:
ENTER: (nothing — rendered as plain text)
EXIT: (~\ref{LABEL})
Where LABEL is the URL with the leading # stripped.
NO_LINK: No output.
Image
ENTER: \protect\includegraphics{URL}
Image children (alt text) are skipped. If CMARK_OPT_UNSAFE is not set and the URL matches _scan_dangerous_url(), the URL is omitted.
HTML Inline
% raw HTML omitted
Always omitted, regardless of CMARK_OPT_UNSAFE.
Source Position Comments
When CMARK_OPT_SOURCEPOS is set, the renderer adds LaTeX comments before block elements:
snprintf(buffer, BUFFER_SIZE, "%% %d:%d-%d:%d\n",
cmark_node_get_start_line(node), cmark_node_get_start_column(node),
cmark_node_get_end_line(node), cmark_node_get_end_column(node));
Example Output
Markdown input:
# Hello World
A paragraph with *emphasis* and **bold**.
- Item 1
- Item 2
LaTeX output:
\section{Hello World}
A paragraph with \emph{emphasis} and \textbf{bold}.
\begin{itemize}
\item{}Item 1
\item{}Item 2
\end{itemize}
Cross-References
- latex.c — Full implementation
- render-framework.md — Generic render framework (
cmark_render(),cmark_renderer) - public-api.md —
cmark_render_latex()API docs - html-renderer.md — Contrast with direct buffer renderer