cgit — Caching System
Overview
cgit implements a file-based output cache that stores the fully rendered
HTML/binary response for each unique request. The cache avoids regenerating
pages for repeated identical requests. When caching is disabled
(cache-size=0, the default), all output is written directly to stdout.
Source files: cache.c, cache.h.
Cache Slot Structure
Each cached response is represented by a cache_slot:
struct cache_slot {
const char *key; /* request identifier (URL-based) */
int keylen; /* strlen(key) */
int ttl; /* time-to-live in minutes */
cache_fill_fn fn; /* callback to regenerate content */
int cache_fd; /* fd for the cache file */
int lock_fd; /* fd for the .lock file */
const char *cache_name;/* path: cache_root/hash(key) */
const char *lock_name; /* path: cache_name + ".lock" */
int match; /* 1 if cache file matches key */
struct stat cache_st; /* stat of the cache file */
int bufsize; /* size of the header buffer */
char buf[1024 + 4 * 20]; /* header: key + timestamps */
};
The cache_fill_fn typedef:
typedef void (*cache_fill_fn)(void *cbdata);
This callback is invoked to produce the page content when the cache needs
filling. The callback writes directly to stdout, which is redirected to the
lock file while cache filling is in progress.
Hash Function
Cache file names are derived from the request key using the FNV-1 hash:
unsigned long hash_str(const char *str)
{
unsigned long h = 0x811c9dc5;
unsigned char *s = (unsigned char *)str;
while (*s) {
h *= 0x01000193;
h ^= (unsigned long)*s++;
}
return h;
}
The resulting hash is formatted as %lx and joined with the configured
cache-root directory to produce the cache file path. The lock file is
the same path with .lock appended.
Slot Lifecycle
A cache request goes through these phases, managed by process_slot():
1. Open (open_slot)
Opens the cache file and reads the header. The header contains the original
key followed by creation and expiry timestamps. If the stored key matches the
current request key, slot->match is set to 1.
static int open_slot(struct cache_slot *slot)
{
slot->cache_fd = open(slot->cache_name, O_RDONLY);
if (slot->cache_fd == -1)
return errno;
if (fstat(slot->cache_fd, &slot->cache_st))
return errno;
/* read header into slot->buf */
return 0;
}
2. Check Match
If the file exists and the key matches, the code checks whether the entry has expired based on the TTL:
static int is_expired(struct cache_slot *slot)
{
if (slot->ttl < 0)
return 0; /* negative TTL = never expires */
return slot->cache_st.st_mtime + slot->ttl * 60 < time(NULL);
}
A TTL of -1 means the entry never expires (used for cache-static-ttl).
3. Lock (lock_slot)
Creates the .lock file with O_WRONLY | O_CREAT | O_EXCL and writes the
header containing the key and timestamps. If locking fails (another process
holds the lock), the stale cached content is served instead.
static int lock_slot(struct cache_slot *slot)
{
slot->lock_fd = open(slot->lock_name,
O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
if (slot->lock_fd == -1)
return errno;
/* write header: key + creation timestamp */
return 0;
}
4. Fill (fill_slot)
Redirects stdout to the lock file using dup2(), invokes the
cache_fill_fn callback to generate the page content, then restores stdout:
static int fill_slot(struct cache_slot *slot)
{
/* save original stdout */
/* dup2(slot->lock_fd, STDOUT_FILENO) */
slot->fn(slot->cbdata);
/* restore original stdout */
return 0;
}
5. Close and Rename
After filling, the lock file is atomically renamed to the cache file:
if (rename(slot->lock_name, slot->cache_name))
return errno;
This ensures readers never see a partially-written file.
6. Print (print_slot)
The cache file content (minus the header) is sent to stdout. On Linux,
sendfile() is used for zero-copy output:
static int print_slot(struct cache_slot *slot)
{
#ifdef HAVE_LINUX_SENDFILE
off_t start = slot->keylen + 1; /* skip header */
sendfile(STDOUT_FILENO, slot->cache_fd, &start,
slot->cache_st.st_size - start);
#else
/* fallback: read()/write() loop */
#endif
}
Process Slot State Machine
process_slot() implements a state machine combining all phases:
START → open_slot()
├── success + key match + not expired → print_slot() → DONE
├── success + key match + expired → lock_slot()
│ ├── lock acquired → fill_slot() → close_slot() → open_slot() → print_slot()
│ └── lock failed → print_slot() (serve stale)
├── success + key mismatch → lock_slot()
│ ├── lock acquired → fill_slot() → close_slot() → open_slot() → print_slot()
│ └── lock failed → fill_slot() (direct to stdout)
└── open failed → lock_slot()
├── lock acquired → fill_slot() → close_slot() → open_slot() → print_slot()
└── lock failed → fill_slot() (direct to stdout, no cache)
Public API
/* Process a request through the cache */
extern int cache_process(int size, const char *path, const char *key,
int ttl, cache_fill_fn fn, void *cbdata);
/* List all cache entries (for debugging/administration) */
extern int cache_ls(const char *path);
/* Hash a string using FNV-1 */
extern unsigned long hash_str(const char *str);
cache_process()
Parameters:
size— Maximum number of cache entries (fromcache-size). If0, caching is bypassed andfnis called directly.path— Cache root directory.key— Request identifier (derived from full URL + query string).ttl— Time-to-live in minutes.fn— Callback function that generates the page content.cbdata— Opaque data passed to the callback.
cache_ls()
Scans the cache root directory and prints information about each cache entry
to stdout. Used for administrative inspection.
TTL Configuration Mapping
Different page types have different TTLs:
| Page Type | Config Directive | Default | Applied When |
|---|---|---|---|
| Repository list | cache-root-ttl |
5 min | cmd->want_repo == 0 |
| Repo pages | cache-repo-ttl |
5 min | cmd->want_repo == 1 and dynamic |
| Dynamic pages | cache-dynamic-ttl |
5 min | cmd->want_vpath == 1 |
| Static content | cache-static-ttl |
-1 (never) | SHA-referenced content |
| About pages | cache-about-ttl |
15 min | About/readme view |
| Snapshots | cache-snapshot-ttl |
5 min | Snapshot downloads |
| Scan results | cache-scanrc-ttl |
15 min | scan-path results |
Static content uses a TTL of -1 because SHA-addressed content is
immutable — a given commit/tree/blob hash always refers to the same data.
Cache Key Generation
The cache key is built from the complete query context in cgit.c:
static const char *cache_key(void)
{
return fmt("%s?%s?%s?%s?%s",
ctx.qry.raw, ctx.env.http_host,
ctx.env.https ? "1" : "0",
ctx.env.authenticated ? "1" : "0",
ctx.env.http_cookie ? ctx.env.http_cookie : "");
}
The key captures: raw query string, hostname, HTTPS state, authentication state, and cookies. This ensures that authenticated users get different cache entries than unauthenticated users.
Concurrency
The cache supports concurrent access from multiple CGI processes:
- Atomic writes: Content is written to a
.lockfile first, then atomically renamed to the cache file. Readers never see partial content. - Non-blocking locks: If a lock is already held, the process either serves stale cached content (if available) or generates content directly to stdout without caching.
- No deadlocks: Lock files are
O_EXCL, notflock(). If a process crashes while holding a lock, the stale.lockfile remains. It is typically cleaned up by the next successful writer.
Cache Directory Management
The cache root directory (cache-root, default /var/cache/cgit) must be
writable by the web server user. Cache files are created with mode 0600
(S_IRUSR | S_IWUSR).
There is no built-in cache eviction. Old cache files persist until a new request with the same hash replaces them. Administrators should set up periodic cleanup (e.g., a cron job) to purge expired files:
find /var/cache/cgit -type f -mmin +60 -delete
Disabling the Cache
Set cache-size=0 (the default). When size is 0, cache_process() calls
the fill function directly, writing to stdout with no file I/O overhead:
if (!size) {
fn(cbdata);
return 0;
}