almirah-doc/decisions/issues/issue-180-inline-code-spans.md @ 6c48a23e
title: "ISSUE-180: Inline Code Spans Not Recognised"
Status
| | Date | Status |
|---|---|---|
| | 21-05-2026 | Proposed |
| | 21-05-2026 | Accepted |
| | 21-05-2026 | In-Progress |
| * | 22-05-2026 | Implemented |
Context
While reviewing the rendered HTML of [[issue-179-emphasis-flanking-rule]], it was observed that text wrapped in single backticks was not rendered as inline code. In the source, the paragraph contained <i>…</i> wrapped in single backticks — intended to display the literal HTML tags. The rendered page instead showed the body italicised and the <i> tags missing, because the backticks were emitted as literal characters and the browser interpreted the <i>…</i> content between them as actual italic markup.
Two facts about the parser explain this:
- doc_parser.rb:292 handles fenced code blocks (lines starting with three consecutive backticks) at the line level, but there is no corresponding handling for inline (single-backtick) code spans.
- text_line.rb:57 (TextLineParser) registers tokens for *, **, ***, (, ), [, ], and ]( but not for the backtick. Backticks therefore fell into the literal-character branch of the tokenizer and were emitted verbatim into the HTML output, along with anything between them.
This is a real authoring trap: any decision record, requirement, or test step that documents a piece of HTML, an XML tag, or an entity reference would have the inner content silently re-interpreted by the browser.
Decision
Add inline code-span support to the text-line tokenizer.
Tokenisation
Introduce a BacktickToken (whose value is the backtick character itself) registered in TextLineParser#supported_tokens. After the main token-emission loop, run a second pass (fuse_backticks) that walks the token list, pairs consecutive BacktickToken instances, and replaces each opener/content/closer triple with a single InlineCodeToken whose value is the raw content between the backticks. An unmatched BacktickToken (no closer found) is converted to a literal backtick character.
This second pass runs before any emphasis matching in restore, so a code span fully shadows whatever other markers it contains: when the source places *foo* inside a single-backtick code span, the rendered HTML preserves the literal asterisks rather than wrapping foo in <i> tags. The raw content is reconstructed by concatenating the value of each token between the two backticks, which is straightforward because every token in the token stream carries its source-level text in value.
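The fusing pass described above can be sketched as follows. This is a minimal illustration, not the code in text_line.rb: the Token base class and the exact signatures of fuse_backticks and next_backtick_index are assumptions made for the example, and only the class and helper names come from this record.

```ruby
# Hypothetical stand-ins for the token classes; the real ones may differ.
class Token
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class BacktickToken < Token
  def initialize
    super('`')
  end
end

class InlineCodeToken < Token; end

# Index of the next BacktickToken at or after `from`, or nil if none.
def next_backtick_index(tokens, from)
  (from...tokens.length).find { |i| tokens[i].is_a?(BacktickToken) }
end

# Second pass: replace each opener/content/closer run with a single
# InlineCodeToken; an unmatched backtick becomes a plain literal token.
def fuse_backticks(tokens)
  result = []
  i = 0
  while i < tokens.length
    token = tokens[i]
    if token.is_a?(BacktickToken)
      closer = next_backtick_index(tokens, i + 1)
      if closer
        # Raw content is the concatenated source text of the inner tokens.
        raw = tokens[(i + 1)...closer].map(&:value).join
        result << InlineCodeToken.new(raw)
        i = closer + 1
        next
      end
      result << Token.new(token.value) # no closer: literal backtick
    else
      result << token
    end
    i += 1
  end
  result
end

# Tokens for: a `*foo*` b `   (the last backtick has no closer)
stream = [Token.new('a '), BacktickToken.new, Token.new('*'),
          Token.new('foo'), Token.new('*'), BacktickToken.new,
          Token.new(' b '), BacktickToken.new]
fused = fuse_backticks(stream)
```

Because the asterisks are absorbed into the InlineCodeToken before any emphasis matching runs, a later pass never sees them as openers or closers.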
Rendering
Extend TextLineBuilderContext with a default inline_code(str) method that returns the raw string. The active TextLine subclass overrides it to:
- HTML-escape the raw content using CGI.escapeHTML (covering &, <, >, ", and '), and
- wrap the escaped content in <code>…</code> tags.
Escaping is performed at the rendering layer, not the parser, to keep concerns separated. The tokenizer stays HTML-agnostic.
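The split between the default context and the HTML override might look like the sketch below. The class bodies are reduced to the single method under discussion and are assumptions for illustration; only the class names, the inline_code method name, and the use of CGI.escapeHTML come from this record.

```ruby
require 'cgi'

# Hypothetical, stripped-down versions of the two classes.
class TextLineBuilderContext
  # Default behaviour: return the raw content unchanged, so a
  # non-HTML consumer sees plain text.
  def inline_code(str)
    str
  end
end

class TextLine < TextLineBuilderContext
  # HTML behaviour: escape first, then wrap in <code> tags.
  def inline_code(str)
    "<code>#{CGI.escapeHTML(str)}</code>"
  end
end

html = TextLine.new
puts html.inline_code('<i>x</i>') # prints <code>&lt;i&gt;x&lt;/i&gt;</code>
puts html.inline_code('A & B')    # prints <code>A &amp; B</code>
```

Keeping the escape call inside the override means the tokenizer only ever hands over raw source text, which matches the separation of concerns stated above.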
Coverage of existing cases
<i>…</i> between backticks renders as <code>&lt;i&gt;…&lt;/i&gt;</code>. A & B between backticks renders as <code>A &amp; B</code>. Asterisks, brackets, and other emphasis markers inside backticks remain literal.
Scope
| Item | Status | Start Date | Target Date | Description |
|---|---|---|---|---|
| Code | Done | 21-05-2026 | 22-05-2026 | Add BacktickToken and InlineCodeToken classes in text_line.rb; register the backtick token; add fuse_backticks and next_backtick_index helpers called at the end of tokenize; add InlineCodeToken case in TextLineBuilder#restore; add inline_code to TextLineBuilderContext and TextLine (the latter HTML-escapes via CGI.escapeHTML and wraps in <code> tags); require 'cgi' at the top of the file |
| Tests | Done | 21-05-2026 | 22-05-2026 | Seven new unit tests in text_line_spec.rb covering: a basic foo() code span, HTML escaping of < and > inside backticks, ampersand escaping, emphasis markers inside a code span staying literal, two code spans on the same line, an unmatched single backtick rendered as a literal character, and code-span followed by an italic span |
Out of Scope
- Fenced code blocks (triple backticks) are unchanged. Those are line-level constructs and already work via doc_parser.rb.
- Multi-backtick code spans (the CommonMark syntax where a pair of double-backtick delimiters surrounds a code-span body that may itself contain a single backtick). Only single-backtick delimiter pairs are supported.
- Language hints in inline code (e.g., a js: prefix inside a code span to mark the contained text as JavaScript). Inline code is unstyled beyond the <code> tag.
- CSS styling of the <code> tag. The HTML now carries <code> elements; whether they receive distinctive monospace/background styling is left to main.css and can be addressed independently.
- HTML escaping of plain (non-code) text. Authors who write raw < or > outside backticks still produce browser-interpreted markup. The scope of this ADR is limited to closing the inline-code authoring trap.
- Backslash-escape of backticks (using a leading backslash to render a literal backtick character). Authors who need a literal backtick outside a code span can write it as-is when it is unmatched; for matched-pair situations escaping is out of scope.
Consequences
Positive
- The HTML rendering of [[issue-179-emphasis-flanking-rule]] now matches the source intent: the <i>…</i> tags are displayed verbatim inside a <code> element.
- Documentation, decision records, and test steps can safely refer to HTML tags, XML elements, and code snippets without manual escaping.
- The change removes a footgun where inline content between backticks was silently re-interpreted by the browser.
Negative
- Pre-existing rendered HTML that unintentionally relied on the loose behaviour (e.g., a document where a stray pair of backticks happened to surround already-correct HTML) would now render as a code span. No such document exists in Almirah.Doc.
- An author who writes a single unpaired backtick in a paragraph (e.g., as part of prose) will see it rendered as a literal backtick character, which is the same as the prior behaviour and is the desired fallback.
Neutral
- The change is local to text_line.rb. No public API change; format_string still takes a raw string and returns formatted HTML.
- The default TextLineBuilderContext#inline_code returns the raw string, so any non-HTML consumer of the parser (e.g., future plain-text or PDF renderers) sees inline-code content as plain text without modification.
- All 26 pre-existing tests in text_line_spec.rb continue to pass without modification.
Alternatives Considered
- Handle backticks inside restore instead of as a preprocessing pass. Rejected: restore is a sequential walk; if an italic opener appears before a backtick opener, its closer search would compete with the code-span boundary. Fusing in a preprocessing pass cleanly establishes "code spans win over emphasis" without complicating restore.
- Store the rendered <code>…</code> string directly in InlineCodeToken#value. Rejected: that pushes HTML knowledge into the parser. Keeping escaping in TextLine#inline_code lets the base TextLineBuilderContext remain HTML-agnostic for future non-HTML consumers.
- Manually substitute &, <, > instead of using CGI.escapeHTML. Rejected: CGI is stdlib, the canonical escape covers " and ' in addition to the three reserved characters, and using the standard helper avoids accumulating ad-hoc escape logic.
- Require authors to backslash-escape < and > outside backticks too. Rejected: out of scope for this ADR. The narrower fix closes the documented authoring trap without changing how plain text is treated.
Software Versions
| Software Version Category | Software Version ID |
|---|---|
| Latest Released Version | 0.3.1 |
| Issue Found in Version | 0.4.0 |
| Target Release Version | 0.4.0 |
References
- ISSUE-179 — the document whose rendered HTML surfaced the defect
- ADR-170 — the decision-record convention that this ISSUE follows
Review Evidences