cmark — Code Style and Conventions

Overview

This document describes the coding conventions and patterns used throughout the cmark codebase. Understanding these conventions makes the source code easier to navigate.

Naming Conventions

Public API Functions

All public functions use the cmark_ prefix:

cmark_node *cmark_node_new(cmark_node_type type);
cmark_parser *cmark_parser_new(int options);
char *cmark_render_html(cmark_node *root, int options);

Internal (Static) Functions

File-local static functions use the S_ prefix:

static void S_render_node(cmark_node *node, cmark_event_type ev_type,
                          struct render_state *state, int options);
static cmark_node *S_node_new(cmark_node_type type, cmark_mem *mem);
static void S_free_nodes(cmark_node *e);
static bool S_is_leaf(cmark_node *node);
static int S_get_enumlevel(cmark_node *node);

This convention makes it immediately clear whether a function has file-local scope.
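
As a rough illustration of how the two prefixes divide a translation unit, the sketch below pairs one externally visible function with one file-local helper. The names demo_count_words and S_is_space are hypothetical and do not exist in cmark; the demo_ prefix stands in for cmark_ to avoid implying a real API.

```c
#include <assert.h>
#include <stddef.h>

/* File-local helper: the S_ prefix plus `static` signals that nothing
 * outside this translation unit can call it. (Hypothetical name.) */
static int S_is_space(char c) {
  return c == ' ' || c == '\t' || c == '\n';
}

/* Externally visible entry point. In cmark this would carry the cmark_
 * prefix; demo_ is used here because the function is made up. */
size_t demo_count_words(const char *s) {
  size_t count = 0;
  int in_word = 0;
  if (s == NULL) return 0;
  for (; *s; s++) {
    if (S_is_space(*s)) {
      in_word = 0;
    } else if (!in_word) {
      in_word = 1; /* first character of a new word */
      count++;
    }
  }
  return count;
}
```

A reader scanning the file can tell from the name alone that S_is_space cannot be referenced from another source file.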

Internal (Non-Static) Functions

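Functions that are internal to the library but shared across translation units:

  • Usually take the cmark_ prefix (same as public) and are declared in private headers (e.g., parser.h, node.h)
  • Do not take the S_ prefix, which is reserved for static functions

Some shared helpers, such as make_block in the example below, carry no prefix at all.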

Examples:

// In node.h (private header):
void cmark_node_set_type(cmark_node *node, cmark_node_type type);
cmark_node *make_block(cmark_mem *mem, cmark_node_type type,
                       int start_line, int start_column);

Struct Members

No prefix convention — struct members use plain names:

struct cmark_node {
  cmark_mem *mem;
  cmark_node *next;
  cmark_node *prev;
  cmark_node *parent;
  cmark_node *first_child;
  cmark_node *last_child;
  // ...
};

Type Names

Typedefs use the cmark_ prefix:

typedef struct cmark_node cmark_node;
typedef struct cmark_parser cmark_parser;
typedef struct cmark_iter cmark_iter;
typedef int32_t bufsize_t;      // Exception: no cmark_ prefix

Enum Values

Enum constants use the CMARK_ prefix with UPPER_CASE:

typedef enum {
  CMARK_NODE_NONE,
  CMARK_NODE_DOCUMENT,
  CMARK_NODE_BLOCK_QUOTE,
  // ...
} cmark_node_type;

Preprocessor Macros

Macros use UPPER_CASE, sometimes with the CMARK_ prefix:

#define CMARK_OPT_SOURCEPOS   (1 << 1)
#define CMARK_BUF_INIT(mem)   { mem, cmark_strbuf__initbuf, 0, 0 }
#define MAX_LINK_LABEL_LENGTH 999
#define CODE_INDENT           4

Error Handling Patterns

Allocation Failure

The default allocator (xcalloc, xrealloc) aborts on failure:

static void *xcalloc(size_t nmemb, size_t size) {
  void *ptr = calloc(nmemb, size);
  if (!ptr) abort();
  return ptr;
}

With the default allocator, functions that allocate never return NULL: they either succeed or terminate the process. This eliminates NULL-check boilerplate throughout the codebase.
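
The following sketch shows what abort-on-failure allocation looks like from the caller's side. It assumes an allocator table of three function pointers; demo_mem, DEMO_DEFAULT_MEM, and demo_copy_string are hypothetical names for this illustration, not cmark's own definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an allocator table of function pointers.
 * demo_mem is a hypothetical name used only in this sketch. */
typedef struct {
  void *(*calloc)(size_t, size_t);
  void *(*realloc)(void *, size_t);
  void (*free)(void *);
} demo_mem;

static void *xcalloc(size_t nmemb, size_t size) {
  void *ptr = calloc(nmemb, size);
  if (!ptr) abort(); /* never hand NULL back to the caller */
  return ptr;
}

static void *xrealloc(void *ptr, size_t size) {
  void *new_ptr = realloc(ptr, size);
  if (!new_ptr) abort();
  return new_ptr;
}

static demo_mem DEMO_DEFAULT_MEM = {xcalloc, xrealloc, free};

/* Caller code needs no NULL check: the allocation either worked or
 * the process has already terminated. */
char *demo_copy_string(demo_mem *mem, const char *src) {
  size_t len = strlen(src);
  char *dst = (char *)mem->calloc(len + 1, 1); /* zero-filled, so NUL-terminated */
  memcpy(dst, src, len);
  return dst;
}
```

The trade-off is that the process cannot recover from an out-of-memory condition; an allocator table supplied by the caller would be the place to change that behavior.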

Invalid Input

Functions that receive invalid arguments typically:

  1. Return 0/false/NULL for queries
  2. Do nothing for mutations
  3. Never crash

Example from node.c:

int cmark_node_set_heading_level(cmark_node *node, int level) {
  if (node == NULL || node->type != CMARK_NODE_HEADING) return 0;
  if (level < 1 || level > 6) return 0;
  node->as.heading.level = level;
  return 1;
}

Return Conventions

  • 1/0 for success/failure: Setter functions return 1 on success, 0 on failure
  • NULL for not found: Lookup functions return NULL when the item doesn't exist
  • Assertion for invariants: Internal invariants use assert():
    assert(root->type == CMARK_NODE_DOCUMENT);
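
To show how these conventions look together, here is a minimal sketch of a setter and a getter that tolerate invalid arguments. The demo_node type and both functions are hypothetical stand-ins written for this example; they only mirror the shape of the cmark_node_set_heading_level example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; not the real cmark types. */
typedef enum { DEMO_NODE_PARAGRAPH, DEMO_NODE_HEADING } demo_node_type;

typedef struct {
  demo_node_type type;
  int heading_level;
} demo_node;

/* Mutation: returns 1 on success, 0 (and changes nothing) on bad input. */
int demo_set_heading_level(demo_node *node, int level) {
  if (node == NULL || node->type != DEMO_NODE_HEADING) return 0;
  if (level < 1 || level > 6) return 0;
  node->heading_level = level;
  return 1;
}

/* Query: returns 0 when the question does not apply to this node. */
int demo_get_heading_level(const demo_node *node) {
  if (node == NULL || node->type != DEMO_NODE_HEADING) return 0;
  return node->heading_level;
}
```

Because a rejected call leaves the node untouched, a caller can test the return value with a plain if and continue without any cleanup.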

Header Guard Style

#ifndef CMARK_NODE_H
#define CMARK_NODE_H
// ...
#endif

Guards use CMARK_ prefix + uppercase filename + _H.

Include Patterns

Public Headers

#include "cmark.h"  // Always first — provides all public types

Private Headers

#include "node.h"      // Internal node definitions
#include "parser.h"    // Parser internals
#include "buffer.h"    // cmark_strbuf
#include "chunk.h"     // cmark_chunk
#include "references.h" // Reference map
#include "utf8.h"      // UTF-8 utilities
#include "scanners.h"  // re2c-generated scanners

System Headers

#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <stdio.h>

Inline Functions

The CMARK_INLINE macro abstracts compiler-specific inline syntax:

#ifdef _MSC_VER
#define CMARK_INLINE __forceinline
#else
#define CMARK_INLINE __inline__
#endif

CMARK_INLINE is applied to small, hot-path functions defined in headers:

static CMARK_INLINE void cmark_chunk_free(cmark_mem *mem, cmark_chunk *c) { ... }
static CMARK_INLINE cmark_chunk cmark_chunk_dup(...) { ... }

Memory Ownership Patterns

Owning vs Non-Owning

The cmark_chunk type makes ownership explicit:

  • alloc > 0 → the chunk owns the memory and must free it
  • alloc == 0 → the chunk borrows memory from elsewhere

Transfer of Ownership

cmark_strbuf_detach() transfers ownership from a strbuf to the caller:

unsigned char *data = cmark_strbuf_detach(&buf);
// Caller now owns 'data' and must free it
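
What "detach" means can be sketched with a miniature growable buffer. demo_buf, demo_buf_puts, and demo_buf_detach are hypothetical names for this illustration; the real cmark_strbuf has more fields and allocates through cmark_mem instead of calling realloc directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature buffer used only in this sketch. */
typedef struct {
  unsigned char *ptr;
  int size;
} demo_buf;

static void demo_buf_puts(demo_buf *buf, const char *s) {
  size_t add = strlen(s);
  unsigned char *p =
      (unsigned char *)realloc(buf->ptr, (size_t)buf->size + add + 1);
  if (!p) abort();
  memcpy(p + buf->size, s, add + 1); /* copies the terminating NUL too */
  buf->ptr = p;
  buf->size += (int)add;
}

/* Hand the storage to the caller and leave the buffer empty, so the
 * buffer can no longer free or reuse the detached memory. */
static unsigned char *demo_buf_detach(demo_buf *buf) {
  unsigned char *data = buf->ptr;
  if (data == NULL) {
    data = (unsigned char *)calloc(1, 1); /* empty buffer: return "" */
    if (!data) abort();
  }
  buf->ptr = NULL;
  buf->size = 0;
  return data;
}
```

After the call, exactly one party holds the pointer, so there is no ambiguity about who frees it.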

Consistent Cleanup

Free functions null out pointers after freeing:

static CMARK_INLINE void cmark_chunk_free(cmark_mem *mem, cmark_chunk *c) {
  if (c->alloc)
    mem->free((void *)c->data);
  c->data = NULL;      // NULL after free
  c->alloc = 0;
  c->len = 0;
}

Iterative vs Recursive Patterns

The codebase avoids recursion for tree operations to prevent stack overflow on deeply nested input:

Iterative Tree Destruction

S_free_nodes() avoids recursion by splicing a node's children into the sibling chain before freeing the node, so a single loop visits every descendant:

// Splice children into sibling chain
if (e->first_child) {
  cmark_node *last = e->last_child;
  last->next = e->next;
  e->next = e->first_child;
}
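
The fragment above shows only the splice step. A complete loop built around it might look like the sketch below, which uses a hypothetical demo_node type holding just the three links the technique needs and returns a count so the result can be checked. It is an approximation of the idea, not a copy of S_free_nodes().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in with only the links the technique needs. */
typedef struct demo_node {
  struct demo_node *next;
  struct demo_node *first_child;
  struct demo_node *last_child;
} demo_node;

static demo_node *demo_node_new(void) {
  demo_node *n = (demo_node *)calloc(1, sizeof(*n));
  if (!n) abort();
  return n;
}

static void demo_append_child(demo_node *parent, demo_node *child) {
  if (parent->last_child)
    parent->last_child->next = child;
  else
    parent->first_child = child;
  parent->last_child = child;
}

/* Frees e, its following siblings, and all descendants without recursion.
 * Returns the number of nodes freed. */
static int demo_free_nodes(demo_node *e) {
  int freed = 0;
  while (e != NULL) {
    demo_node *next;
    if (e->first_child) {
      /* Splice children into the sibling chain right after e. */
      e->last_child->next = e->next;
      e->next = e->first_child;
    }
    next = e->next;
    free(e);
    freed++;
    e = next;
  }
  return freed;
}
```

Stack usage stays constant regardless of nesting depth, because the pending work is stored in the next pointers of nodes that are about to be freed anyway.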

Iterator-Based Traversal

All rendering uses cmark_iter instead of recursive render_children():

while ((ev_type = cmark_iter_next(iter)) != CMARK_EVENT_DONE) {
  cur = cmark_iter_get_node(iter);
  S_render_node(cur, ev_type, &state, options);
}
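
To make the enter/exit event model concrete without depending on the library, the sketch below implements a tiny event iterator over a hypothetical demo_node tree. All demo_ names are invented for this example, and one simplification is assumed: a node without children receives only an ENTER event.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_EVENT_DONE, DEMO_EVENT_ENTER, DEMO_EVENT_EXIT } demo_event;

typedef struct demo_node {
  struct demo_node *parent, *next, *first_child;
  char label;
} demo_node;

typedef struct {
  demo_node *root;
  demo_node *node;    /* node reported by the most recent event */
  demo_node *next;    /* node to report on the next call */
  demo_event next_ev; /* event to report on the next call */
} demo_iter;

static void demo_iter_init(demo_iter *it, demo_node *root) {
  it->root = root;
  it->node = NULL;
  it->next = root;
  it->next_ev = DEMO_EVENT_ENTER;
}

static demo_event demo_iter_next(demo_iter *it) {
  demo_event ev = it->next_ev;
  demo_node *node = it->next;
  it->node = node;
  if (ev == DEMO_EVENT_DONE) return ev;
  if (ev == DEMO_EVENT_ENTER && node->first_child) {
    it->next = node->first_child; /* descend */
    it->next_ev = DEMO_EVENT_ENTER;
  } else if (node == it->root) {
    it->next = NULL; /* finished with the root */
    it->next_ev = DEMO_EVENT_DONE;
  } else if (node->next) {
    it->next = node->next; /* move to the next sibling */
    it->next_ev = DEMO_EVENT_ENTER;
  } else {
    it->next = node->parent; /* last sibling: close the parent */
    it->next_ev = DEMO_EVENT_EXIT;
  }
  return ev;
}

/* Writes each label on ENTER and '/' plus the label on EXIT. */
static size_t demo_trace(demo_node *root, char *out, size_t cap) {
  demo_iter it;
  demo_event ev;
  size_t n = 0;
  demo_iter_init(&it, root);
  while ((ev = demo_iter_next(&it)) != DEMO_EVENT_DONE) {
    if (ev == DEMO_EVENT_EXIT && n + 1 < cap) out[n++] = '/';
    if (n + 1 < cap) out[n++] = it.node->label;
  }
  if (cap > 0) out[n] = '\0';
  return n;
}
```

The loop in demo_trace has the same shape as the rendering loop above: the iterator decides where to go next, and the body only reacts to the current node and event.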

Type Size Definitions

typedef int32_t bufsize_t;

Buffer sizes use int32_t (not size_t) to:

  1. Allow negative values for error signaling
  2. Keep node structs compact (32-bit vs 64-bit on LP64)
  3. Limit the maximum buffer size to just under 2 GiB (INT32_MAX bytes, adequate for text processing)
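
The first point can be illustrated with a search helper that returns a position or a negative sentinel in the same signed value. demo_find_byte is a hypothetical function written for this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t bufsize_t;

/* Returns the index of the first occurrence of c at or after pos,
 * or -1 if there is none. A signed size type lets one return value
 * carry both the position and the "not found" signal. */
static bufsize_t demo_find_byte(const unsigned char *data, bufsize_t len,
                                int c, bufsize_t pos) {
  bufsize_t i;
  if (pos < 0) pos = 0;
  for (i = pos; i < len; i++) {
    if (data[i] == (unsigned char)c) return i;
  }
  return -1;
}
```

With size_t the same function would need either a separate out-parameter or a reserved maximum value to report a miss.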

Bitmask Patterns

Option flags use single-bit constants:

#define CMARK_OPT_SOURCEPOS      (1 << 1)
#define CMARK_OPT_HARDBREAKS     (1 << 2)
#define CMARK_OPT_UNSAFE         (1 << 17)
#define CMARK_OPT_NOBREAKS       (1 << 4)
#define CMARK_OPT_VALIDATE_UTF8  (1 << 9)
#define CMARK_OPT_SMART          (1 << 10)

Tested with bitwise AND:

if (options & CMARK_OPT_SOURCEPOS) { ... }

Combined with bitwise OR:

int options = CMARK_OPT_SOURCEPOS | CMARK_OPT_SMART;
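
The test and combine operations, plus clearing a flag, can be sketched in a few lines. The macro values are copied from the definitions above so the example compiles on its own; demo_has_option and demo_clear_option are hypothetical helpers, not part of the cmark API.

```c
#include <assert.h>

/* Values copied from the definitions above so the sketch is standalone. */
#define CMARK_OPT_SOURCEPOS  (1 << 1)
#define CMARK_OPT_HARDBREAKS (1 << 2)
#define CMARK_OPT_SMART      (1 << 10)

/* Test: nonzero AND means the bit is set. */
static int demo_has_option(int options, int flag) {
  return (options & flag) != 0;
}

/* Clear: AND with the complement removes one bit and keeps the rest. */
static int demo_clear_option(int options, int flag) {
  return options & ~flag;
}
```

Because each flag occupies its own bit, any combination fits in a single int and flags can be added or removed independently.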

Leaf Mask Pattern

S_is_leaf() in iterator.c uses a bitmask for O(1) node-type classification:

static const int S_leaf_mask =
    (1 << CMARK_NODE_HTML_BLOCK) | (1 << CMARK_NODE_THEMATIC_BREAK) |
    (1 << CMARK_NODE_CODE_BLOCK) | (1 << CMARK_NODE_TEXT) | ...;

static bool S_is_leaf(cmark_node *node) {
  return ((1 << node->type) & S_leaf_mask) != 0;
}

For a simple yes/no classification this reduces to one shift and one AND, which is more compact than a switch statement.
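
Since the mask above is elided, here is a complete, compilable version of the same idea using a hypothetical demo_node_type enum. The type names and their numbering are illustrative only and do not match cmark's real enum, and S_is_leaf here takes the type directly to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node types; the numbering is illustrative only. */
typedef enum {
  DEMO_NODE_DOCUMENT,
  DEMO_NODE_PARAGRAPH,
  DEMO_NODE_CODE_BLOCK,
  DEMO_NODE_THEMATIC_BREAK,
  DEMO_NODE_TEXT,
  DEMO_NODE_EMPH
} demo_node_type;

/* One bit per leaf type. This only works while every enum value is
 * smaller than the bit width of int. */
static const int S_leaf_mask = (1 << DEMO_NODE_CODE_BLOCK) |
                               (1 << DEMO_NODE_THEMATIC_BREAK) |
                               (1 << DEMO_NODE_TEXT);

static bool S_is_leaf(demo_node_type type) {
  return ((1 << type) & S_leaf_mask) != 0;
}
```

The technique depends on the enum staying small: once a type value reaches the width of int, the shift is no longer valid and a lookup table or switch would be needed instead.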

This page is part of the Project Tick Handbook, which is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. View full license details.
Last updated: April 18, 2026