
ADR-188: Escape and Sanitise Generated HTML Output

1 Status

| Date | Status |
|---|---|
| 05-06-2026 | Proposed |
| | Accepted |
| | In-Progress |
| | Implemented |

2 Context

Almirah's purpose is to render author-written Markdown (specifications, test protocols, decision records, test data) into a published, interlinked HTML site.

A review of the HTML rendering path in Almirah.Code found that authored Markdown content is interpolated into the generated HTML essentially raw. HTML-escaping is applied in only three narrow places: inline code spans and the two wiki_link branches that emit a wiki link's visible text. This is confirmed by the fact that the only CGI.escapeHTML calls in the whole library are those three, all in lib/almirah/doc_items/text_line.rb. There is no global output-encoding pass and no HTML sanitiser dependency.

As a result, any browser-executable code placed in a source .md file is reproduced verbatim in the generated page and executes when a reader opens it. This is **stored / persistent cross-site scripting (XSS)**: the payload lives in committed Markdown, is baked into the static site at build time (almirah please .), and fires for every visitor.
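To make the failure mode concrete, here is a minimal sketch using a simplified stand-in for a paragraph renderer. The method names `render_paragraph_raw` and `render_paragraph_escaped` are hypothetical and are not Almirah's actual methods, and the input is an illustrative string rather than one of the recorded proof-of-concept payloads. Only `CGI.escapeHTML` comes from Ruby's standard library.

```ruby
require 'cgi'

# Hypothetical stand-in for a renderer that interpolates source text raw.
def render_paragraph_raw(text)
  "<p>#{text}</p>"
end

# The same stand-in with the text HTML-escaped at the point of output.
def render_paragraph_escaped(text)
  "<p>#{CGI.escapeHTML(text)}</p>"
end

# Illustrative author-supplied line (an assumption for this sketch).
source_line = '<img src=x onerror="alert(1)">'

puts render_paragraph_raw(source_line)
# <p><img src=x onerror="alert(1)"></p>
puts render_paragraph_escaped(source_line)
# <p>&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</p>
```

In the raw form the browser parses a live element with an event handler. In the escaped form it displays the same characters as text.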

Confirmed unescaped rendering paths (verified by running the formatter on crafted inputs):

| Vector | Location | Escaped today |
|---|---|---|
| Paragraph / blockquote / table-cell text | plain-text token emit in TextLine; Paragraph#to_html | No |
| Heading text | Heading#to_html (emitted raw; does not even pass through the text formatter) | No |
| Fenced code block content | CodeBlock#to_html | No |
| Markdown link URL (href) | TextLine#link (no scheme check) | No |
| Markdown link text | TextLine#link | No |
| Image src and alt | Image#to_html | No |
| Inline code span | TextLine#inline_code | Yes |
| Wiki-link display text | TextLine#wiki_link | Yes |

Working proof-of-concept payloads that survive rendering:

One accidental, non-defensive quirk: the inline parser splits on parentheses, so a naive javascript: href is truncated at the first parenthesis — but percent-encoding, attribute breakout, or a data: URI bypasses this entirely. It is not a control.

Threat model: anyone who can land a .md file in any input repository — a contributor, a pull-request author — can plant script that runs in every reader's browser on the docs domain, enabling cookie/session theft, credential phishing on a trusted domain, and potential pivoting against authenticated sessions on the co-hosted Redmine. Severity is High for the published, multi-author deployment Almirah is built for; Low–Medium only where Markdown is authored by a single trusted operator and never published.

3 Decision

Treat all authored Markdown content as untrusted text and encode it for its HTML context at the point of output. Introduce a single, consistently applied escaping mechanism rather than per-item ad-hoc handling, mirroring how inline_code already escapes.

3.1 Text content escaping

Every run of literal text rendered into element content shall be HTML-escaped (the five characters: ampersand, less-than, greater-than, double-quote, single-quote) before interpolation. This covers paragraph text, heading text, blockquote text, Markdown table cells, and fenced code block lines. Escaping is applied to the literal text token only, after Markdown structure (emphasis, code, links, wiki links) has been recognised, so legitimate formatting tags Almirah itself emits are preserved while author-supplied markup is neutralised.
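The ordering described above (recognise structure first, then escape only the literal text) can be sketched as follows. The token shape (`[type, text]` pairs) and the `render_tokens` name are assumptions made for illustration; Almirah's real inline parser may represent tokens differently.

```ruby
require 'cgi'

# Hypothetical token renderer: Markdown structure has already been
# recognised, so only the literal text inside each token is escaped.
# Tags emitted here are the renderer's own and stay intact.
def render_tokens(tokens)
  tokens.map do |type, text|
    escaped = CGI.escapeHTML(text)
    case type
    when :bold then "<b>#{escaped}</b>"
    when :code then "<code>#{escaped}</code>"
    else escaped
    end
  end.join
end

tokens = [
  [:text, 'Use '],
  [:bold, 'x < y'],
  [:text, ' & never <script>'],
]

puts render_tokens(tokens)
# Use <b>x &lt; y</b> &amp; never &lt;script&gt;
```

Escaping the whole line before parsing would be simpler, but it could change what the Markdown parser sees. Escaping per token keeps the parser's input unchanged.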

3.2 Attribute value escaping

Values interpolated into HTML attributes (image src and alt, link href) shall be attribute-escaped so that a quote or angle bracket in the source cannot break out of the attribute and introduce new attributes or elements. A link's visible text is element content rather than an attribute value, but it is escaped in the same step as the href.
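A minimal sketch of attribute escaping, assuming attributes are always emitted in double quotes. The helper names `html_attr` and `image_tag` are hypothetical. The escaping only prevents breakout if the value is quoted, which is why the helper emits the quotes itself rather than leaving that to each caller.

```ruby
require 'cgi'

# Hypothetical helper: emits name="value" with the value escaped, so a
# quote in the source cannot terminate the attribute.
def html_attr(name, value)
  %(#{name}="#{CGI.escapeHTML(value.to_s)}")
end

# Hypothetical image renderer built on the helper above.
def image_tag(src, alt)
  "<img #{html_attr('src', src)} #{html_attr('alt', alt)}>"
end

# Illustrative breakout attempt in the alt text.
puts image_tag('diagram.png', '" onerror="alert(1)')
# <img src="diagram.png" alt="&quot; onerror=&quot;alert(1)">
```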

3.3 URL scheme allow-list

For both Markdown links and images, the URL is admitted only if it is a relative or anchor reference, or carries an allowed scheme: http, https, or mailto. Any other scheme, notably javascript:, data:, and vbscript:, is rejected, and the link or image is rendered inert (as escaped text or a disabled reference) rather than emitting the dangerous URL.
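One way the allow-list could look is sketched below. The names `ALLOWED_SCHEMES`, `safe_url?`, and `render_link` are hypothetical. The sketch removes ASCII control characters and spaces before looking for a scheme, because browsers ignore tabs and newlines inside a URL, so `java<TAB>script:` would otherwise slip past a naive prefix check. Rendering a rejected link as escaped text is one of the two inert forms this ADR allows.

```ruby
require 'cgi'

ALLOWED_SCHEMES = %w[http https mailto].freeze

# Returns true for relative/anchor references and for allowed schemes.
def safe_url?(url)
  # Drop C0 control characters and spaces, which browsers strip or ignore
  # when parsing a URL, before looking for a scheme.
  cleaned = url.to_s.gsub(/[\x00-\x20]/, '')
  scheme = cleaned[/\A([a-zA-Z][a-zA-Z0-9+.\-]*):/, 1]
  return true if scheme.nil? # no scheme: relative path or #anchor

  ALLOWED_SCHEMES.include?(scheme.downcase)
end

# Renders a link, or just the escaped label when the URL is rejected.
def render_link(href, text)
  label = CGI.escapeHTML(text)
  return label unless safe_url?(href)

  %(<a href="#{CGI.escapeHTML(href)}">#{label}</a>)
end

puts render_link('https://example.org/?a=1&b=2', 'A & B')
# <a href="https://example.org/?a=1&amp;b=2">A &amp; B</a>
puts render_link('javascript:alert(1)', 'click')
# click
```

The scheme check complements attribute escaping and does not replace it. The href is still escaped after it passes the allow-list.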

3.4 Single mechanism

The escaping and scheme-checking helpers shall be defined once and reused by every doc item, so coverage cannot drift item-by-item the way it has. Adding a new renderer must go through the same helpers.
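As a rough illustration of the "defined once, reused everywhere" idea, the helpers could live in one module that every doc item includes. The module name `HtmlSafety`, the method `h`, and the `SketchHeading` and `SketchCodeBlock` classes are all hypothetical stand-ins, not Almirah's real classes.

```ruby
require 'cgi'

# Hypothetical shared module: the single place where escaping is defined.
module HtmlSafety
  def h(text)
    CGI.escapeHTML(text.to_s)
  end
end

# Stand-in for a heading doc item.
class SketchHeading
  include HtmlSafety

  def initialize(level, text)
    @level = level
    @text = text
  end

  def to_html
    "<h#{@level}>#{h(@text)}</h#{@level}>"
  end
end

# Stand-in for a fenced code block doc item.
class SketchCodeBlock
  include HtmlSafety

  def initialize(lines)
    @lines = lines
  end

  def to_html
    "<pre><code>#{@lines.map { |line| h(line) }.join("\n")}</code></pre>"
  end
end

puts SketchHeading.new(2, 'Using <script> safely').to_html
# <h2>Using &lt;script&gt; safely</h2>
```

Under this arrangement a new renderer gets the same behaviour by including the module, and a review only has to check that no `to_html` interpolates a value without going through the helper.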

4 Scope

| Item | Status | Start Date | Target Date | Description |
|---|---|---|---|---|
| Requirements | To Do | | | New SRS items (SRS-096 onward) covering HTML-escaping of all rendered text content; attribute-value escaping for image and link attributes; and a URL scheme allow-list for links and images |
| Code | To Do | | | Add shared escape/attribute-escape and URL-scheme-check helpers in Almirah.Code; apply text escaping to the literal text token in TextLine and to Heading, Blockquote, MarkdownTable, and CodeBlock rendering; attribute-escape Image (src, alt) and TextLine#link (href, text); enforce the scheme allow-list in link and Image; route all of it through the single helper |
| Tests | To Do | | | XSS fixture documents added to Almirah.TDS covering each vector (paragraph, heading, blockquote, table cell, code block, image alt/src breakout, link text, link/image URL scheme); an end-to-end spec in Almirah.Code asserting the generated HTML contains the escaped, inert form for every payload and that legitimate formatting/links still render; a referencing test case/protocol in Almirah.Doc |

5 Out of Scope

6 Consequences

6.1 Positive

6.2 Negative

6.3 Neutral

7 Alternatives Considered

8 Software Versions

| Software Version Category | Software Version ID |
|---|---|
| Latest Released Version | 0.4.0 |
| Issue Found in Version | 0.4.0 |
| Target Release Version | 0.4.2 |

9 Affected Documents

| # | Proposed Text | Req-ID |
|---|---|---|
| 1 | The software shall HTML-escape all author-supplied literal text rendered into element content — including paragraph, heading, blockquote, table-cell, and fenced code block text — so that markup present in the source Markdown is rendered as inert text and cannot introduce HTML elements. | SRS-096 |
| 2 | The software shall escape author-supplied values interpolated into HTML attributes — including an image's source and alternate text and a link's address and visible text — so that the value cannot terminate the attribute or introduce additional attributes or elements. | SRS-097 |
| 3 | The software shall admit a link or image URL only when it is a relative reference or uses an allowed scheme ("http", "https", or "mailto"), and shall render any other scheme (such as "javascript", "data", or "vbscript") inert rather than emitting it. | SRS-098 |

10 References

11 Review Evidences