cmark — Iterator System
Overview
The iterator system (iterator.c, iterator.h) provides depth-first traversal of the AST using an event-based model. Each node is visited twice: once on CMARK_EVENT_ENTER (before children) and once on CMARK_EVENT_EXIT (after children). Leaf nodes receive both events in immediate succession.
All renderers (HTML, XML, LaTeX, man, CommonMark) use the iterator as their traversal mechanism.
Public API
cmark_iter *cmark_iter_new(cmark_node *root);
void cmark_iter_free(cmark_iter *iter);
cmark_event_type cmark_iter_next(cmark_iter *iter);
cmark_node *cmark_iter_get_node(cmark_iter *iter);
cmark_event_type cmark_iter_get_event_type(cmark_iter *iter);
cmark_node *cmark_iter_get_root(cmark_iter *iter);
void cmark_iter_reset(cmark_iter *iter, cmark_node *current, cmark_event_type event_type);
Iterator State
struct cmark_iter {
cmark_mem *mem;
cmark_node *root;
cmark_node *cur;
cmark_event_type ev_type;
};
The iterator stores:
root— The subtree root (traversal boundary)cur— Current nodeev_type— Current event (CMARK_EVENT_ENTER,CMARK_EVENT_EXIT,CMARK_EVENT_DONE, orCMARK_EVENT_NONE)
Event Types
typedef enum {
CMARK_EVENT_NONE, // Initial state
CMARK_EVENT_DONE, // Traversal complete (exited root)
CMARK_EVENT_ENTER, // Entering a node (pre-children)
CMARK_EVENT_EXIT, // Exiting a node (post-children)
} cmark_event_type;
Leaf Node Detection
static const int S_leaf_mask =
(1 << CMARK_NODE_HTML_BLOCK) | (1 << CMARK_NODE_THEMATIC_BREAK) |
(1 << CMARK_NODE_CODE_BLOCK) | (1 << CMARK_NODE_TEXT) |
(1 << CMARK_NODE_SOFTBREAK) | (1 << CMARK_NODE_LINEBREAK) |
(1 << CMARK_NODE_CODE) | (1 << CMARK_NODE_HTML_INLINE);
static bool S_is_leaf(cmark_node *node) {
return ((1 << node->type) & S_leaf_mask) != 0;
}
Leaf nodes are determined by a bitmask — not by checking whether first_child is NULL. This means an emphasis node with no children is still treated as a container (it receives separate enter and exit events).
The leaf node types are:
- Block leaves:
HTML_BLOCK,THEMATIC_BREAK,CODE_BLOCK - Inline leaves:
TEXT,SOFTBREAK,LINEBREAK,CODE,HTML_INLINE
Traversal Algorithm
cmark_iter_next() implements the state machine:
cmark_event_type cmark_iter_next(cmark_iter *iter) {
cmark_event_type ev_type = iter->ev_type;
cmark_node *cur = iter->cur;
if (ev_type == CMARK_EVENT_DONE) {
return CMARK_EVENT_DONE;
}
// For initial state, start with ENTER on root
if (ev_type == CMARK_EVENT_NONE) {
iter->ev_type = CMARK_EVENT_ENTER;
return iter->ev_type;
}
if (ev_type == CMARK_EVENT_ENTER && !S_is_leaf(cur)) {
// Container node being entered — descend to first child if it exists
if (cur->first_child) {
iter->ev_type = CMARK_EVENT_ENTER;
iter->cur = cur->first_child;
} else {
// Empty container — immediately exit
iter->ev_type = CMARK_EVENT_EXIT;
}
} else if (cur == iter->root) {
// Exiting root (or leaf at root) — done
iter->ev_type = CMARK_EVENT_DONE;
iter->cur = NULL;
} else if (cur->next) {
// Move to next sibling
iter->ev_type = CMARK_EVENT_ENTER;
iter->cur = cur->next;
} else if (cur->parent) {
// No more siblings — exit parent
iter->ev_type = CMARK_EVENT_EXIT;
iter->cur = cur->parent;
} else {
// Orphan node — done
assert(false);
iter->ev_type = CMARK_EVENT_DONE;
iter->cur = NULL;
}
return iter->ev_type;
}
State Transition Summary
| Current State | Condition | Next State |
|---|---|---|
NONE |
(initial) | ENTER(root) |
ENTER(container) |
has children | ENTER(first_child) |
ENTER(container) |
no children | EXIT(container) |
ENTER(leaf) or EXIT(node) |
node == root | DONE |
ENTER(leaf) or EXIT(node) |
has next sibling | ENTER(next) |
ENTER(leaf) or EXIT(node) |
has parent | EXIT(parent) |
DONE |
(terminal) | DONE |
Traversal Order Example
For a document with a paragraph containing "Hello world":
Document
└── Paragraph
├── Text("Hello ")
├── Emph
│ └── Text("world")
└── (end)
Event sequence:
ENTER(Document)ENTER(Paragraph)ENTER(Text "Hello ")— leaf, immediate transitionENTER(Emph)ENTER(Text "world")— leaf, immediate transitionEXIT(Emph)EXIT(Paragraph)EXIT(Document)DONE
Iterator Reset
void cmark_iter_reset(cmark_iter *iter, cmark_node *current,
cmark_event_type event_type) {
iter->cur = current;
iter->ev_type = event_type;
}
Allows repositioning the iterator to any node and event type. This is used by renderers to skip subtrees — e.g., when the HTML renderer processes an image node, it may skip children after extracting alt text.
Text Node Consolidation
void cmark_consolidate_text_nodes(cmark_node *root) {
if (root == NULL) return;
cmark_iter *iter = cmark_iter_new(root);
cmark_strbuf buf = CMARK_BUF_INIT(iter->mem);
cmark_event_type ev_type;
cmark_node *cur, *tmp, *next;
while ((ev_type = cmark_iter_next(iter)) != CMARK_EVENT_DONE) {
cur = cmark_iter_get_node(iter);
if (ev_type == CMARK_EVENT_ENTER && cur->type == CMARK_NODE_TEXT &&
cur->next && cur->next->type == CMARK_NODE_TEXT) {
// Merge consecutive text nodes
cmark_strbuf_clear(&buf);
cmark_strbuf_put(&buf, cur->data, cur->len);
tmp = cur->next;
while (tmp && tmp->type == CMARK_NODE_TEXT) {
cmark_iter_reset(iter, tmp, CMARK_EVENT_ENTER);
cmark_strbuf_put(&buf, tmp->data, tmp->len);
cur->end_column = tmp->end_column;
next = tmp->next;
cmark_node_free(tmp);
tmp = next;
}
// Replace cur's data with merged content
cmark_chunk_free(iter->mem, &cur->as.literal);
cmark_strbuf_trim(&buf);
// ... set cur->data and cur->len
}
}
cmark_strbuf_free(&buf);
cmark_iter_free(iter);
}
This function merges adjacent text nodes into a single text node. Adjacent text nodes can arise from inline parsing (e.g., when backslash escapes split text). The function:
- Finds consecutive text node runs
- Concatenates their content into a buffer
- Updates the first node's content and end position
- Frees the subsequent nodes
- Uses
cmark_iter_reset()to skip freed nodes
Usage Patterns
Standard Rendering Loop
cmark_iter *iter = cmark_iter_new(root);
while ((ev_type = cmark_iter_next(iter)) != CMARK_EVENT_DONE) {
cur = cmark_iter_get_node(iter);
S_render_node(cur, ev_type, &state, options);
}
cmark_iter_free(iter);
Skipping Children
To skip rendering of a node's children (e.g., for image alt text in HTML):
if (ev_type == CMARK_EVENT_ENTER) {
cmark_iter_reset(iter, node, CMARK_EVENT_EXIT);
}
This jumps directly to the exit event, bypassing all children.
Safe Node Removal
The iterator handles node removal between calls. Since cmark_iter_next() always follows next and parent pointers from the current position, removing the current node is safe as long as:
- The node's
nextandparentpointers remain valid - The iterator is reset to skip the removed node's children
Thread Safety
Iterators are NOT thread-safe. A single AST must not be iterated concurrently without external synchronization. However, since iterators only read the AST (never modify it), multiple read-only iterators on the same AST are safe if no modifications occur.
Memory
The iterator allocates a cmark_iter struct using the root node's memory allocator:
cmark_iter *cmark_iter_new(cmark_node *root) {
cmark_mem *mem = root->mem;
cmark_iter *iter = (cmark_iter *)mem->calloc(1, sizeof(cmark_iter));
iter->mem = mem;
iter->root = root;
iter->cur = root;
iter->ev_type = CMARK_EVENT_NONE;
return iter;
}
Cross-References
- iterator.c — Iterator implementation
- iterator.h — Iterator struct definition
- ast-node-system.md — The nodes being traversed
- html-renderer.md — Example of iterator-driven rendering
- render-framework.md — Framework that wraps iterator use