Fixing an XSS vulnerability in marked

A while back I approached Guy from Snyk – a Node.js vulnerability detection and security firm – and asked him to write about an interesting vulnerability his team identified and fixed. This blog post shows an example vulnerability and how easily things can go wrong – even when a package has been tested by several other users beforehand.

Take it away, Guy!

The marked package parses Markdown and converts it into HTML, making it easy to turn rendered user input - user comments, product reviews, support calls - into rich(ish) text, supporting links, bold, italic and more. Since Markdown doesn’t support JavaScript, it’s often considered immune to Cross-Site Scripting, and thus safe to use for rendering user input.

However, in reality Markdown only reduces – but doesn’t completely eliminate – the risk of XSS. The following, easily exploited, XSS vulnerability in marked is a sobering example of the distinction.

Marked’s protection mechanisms

While Markdown doesn’t support scripts, marked (like other Markdown clients) does support inline HTML. Inline HTML can include <script> tags, which can be used by attackers to inject malicious scripts. Since marked is often used to render user input back to the page, its authors added a security option to overcome this case. The package supports a sanitize option, which detects HTML and dangerous input and encodes or removes it.

The sanitize option is (unfortunately) turned off by default, but you can turn it on in your app. The following example shows the sanitize option in action:

var marked = require('marked');
console.log(marked('<script>alert(1)</script>'));
// Outputs: <script>alert(1)</script>

marked.setOptions({ sanitize: true });
console.log(marked('<script>alert(1)</script>'));
// Outputs: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>

Catching HTML is important, but sanitization doesn’t end there. While Markdown doesn’t support scripts, it does support links, which creates the potential for javascript links (e.g. javascript:alert(1)) that can cause damage when a user clicks them. The sanitize functionality is aware of this, and removes links starting with the javascript: protocol. It even removes links that use the HTML entity for a colon (e.g. javascript&#58;alert(1)). Unfortunately, even with this awareness, the authors missed one attack vector.

The Vulnerability

HTML is a very loose format, and browsers are very tolerant when processing it. An example of this tolerance is that, when processing HTML entities, browsers do not enforce the trailing semicolon, accepting both &#58 and &#58; as a colon. The sanitization in marked, on the other hand, requires the semicolon, and treats the input as plain text if it doesn’t find a well-formed entity. This means &#58; will be decoded and the link removed, but &#58this; will simply be passed along to the output. An attacker can use this technique to evade marked's sanitizer while browsers still execute the script.

Here’s a code illustration of where sanitize does and doesn’t work:

var marked = require('marked');
marked.setOptions({sanitize: true});

// Naive attempt - fails.
console.log(marked('[Gotcha](javascript:alert(1))'));
// Outputs: <p>)</p>

// Evasion attempt using '&#58;' instead of ':' - fails.
console.log(marked('[Gotcha](javascript&#58;alert(1&#41;)'));
// Outputs: <p></p>

// Evasion attempt using '&#58this;' (note the 'this') instead of ':' - SUCCEEDS
console.log(marked('[Gotcha](javascript&#58this;alert(1&#41;)'));
// Outputs: <p><a href="javascript&#58this;alert(1&#41;">Gotcha</a></p>
// Same as: <p><a href="javascript:this;alert(1)">Gotcha</a></p>

The browser will interpret &#58 the same as :, thus invoking the script on click. Of course, the script we included is quite pointless, but an attacker could inject a much more sophisticated payload, breaking the browser’s Same-Origin Policy and triggering the full damage XSS can cause.
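To make the browser side of this concrete, here is a minimal sketch of lenient entity decoding. The decodeLikeBrowser helper is hypothetical and only handles decimal character references; real HTML parsers follow a much larger set of rules, so treat this as an illustration rather than a faithful model of any browser.

```javascript
// Hypothetical helper: decodes decimal character references the lenient way,
// treating the trailing semicolon as optional. Not a full HTML entity decoder.
function decodeLikeBrowser(attributeValue) {
  return attributeValue.replace(/&#(\d+);?/g, function (_, digits) {
    return String.fromCharCode(Number(digits));
  });
}

// The payload that slipped past the sanitizer in the example above.
var href = 'javascript&#58this;alert(1&#41;';

console.log(decodeLikeBrowser(href));
// Outputs: javascript:this;alert(1)
```

The "&#58" is decoded even though the next character is "t" rather than ";", which is exactly the case a semicolon-requiring sanitizer never sees as an entity.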

Live Exploit on a Sample Node.js application

Nothing helps one appreciate a vulnerability better than exploiting it on real code. Therefore, I added this vulnerability to Snyk’s vulnerable demo application, Goof. You can clone Goof and get it running through the instructions on GitHub.

Goof is a TODO application, and uses marked to support Markdown in its notes. Goof is a best-in-class TODO app, and such an app simply MUST support links, bold and italics!

For instance, entering the TODO items Buy **beer** and [snyk](https://snyk.io/) would result in the expected bold and hyperlink like so:

Next, let’s try to enter a malicious payload. The next screenshot shows the visual and DOM state after entering each of the three attack payloads above. Note that since this is a TODO list, the items are sorted by the date they were added, with the newest on top.

Entering malicious payloads into the TODO app.

As you can see, the first two attempted attacks were mitigated by the sanitizer, reduced to <p>)</p> and <p></p> respectively. The last payload, however, successfully created a hyperlink which will invoke javascript:this;alert(1). Executing this does nothing (simply references an existing variable), while the alert shows a popup.

After making our exploit alert a bit clearer and clicking the link, we get this:

If you’d like to go through this attack flow yourself, install Goof locally and go through the exploit payloads under the exploits directory.

How to Remediate?

For a long while, there was no official version of marked that fixes the issue. The marked repository has been inactive since last summer, and the vulnerability was only disclosed later on. Open source maintenance is a tricky topic, as life circumstances or simply loss of interest can result in very slow updates - or even none at all.

During the period where no fix was available, the only way to fix the issue was by applying a patch using Snyk’s Wizard. This patch was created by our security research team, and is based on Matt Austin’s original pull request to the repository.

Like all other Snyk patches, you can see the detailed patch files in our open source vulnerability database. There are actually 3 different patches for different versions of marked, the simplest of which is no more than this:

  function unescape(html) {
-   return html.replace(/&([#\w]+);/g, function(_, n) {
+   // explicitly match decimal, hex, and named HTML entities
+   return html.replace(/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(?:\w+));?/g, function(_, n) {
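The effect of that one-line change can be sketched by running both regular expressions over the three payloads from earlier. The two patterns are copied from the patch; the decodeEntities and looksLikeScriptUrl helpers are hypothetical simplifications written for this illustration, and are not marked's actual code.

```javascript
// The two patterns from the patch: before and after the fix.
var OLD_ENTITY = /&([#\w]+);/g;
var NEW_ENTITY = /&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(?:\w+));?/g;

// Hypothetical helper: decode whatever the given pattern recognizes as an entity.
function decodeEntities(text, entityRegex) {
  return text.replace(entityRegex, function (_, name) {
    name = name.toLowerCase();
    if (name === 'colon') return ':';
    if (name.charAt(0) === '#') {
      var code = name.charAt(1) === 'x'
        ? parseInt(name.substring(2), 16)
        : Number(name.substring(1)); // '58this' becomes NaN, i.e. char code 0
      return String.fromCharCode(code);
    }
    return ''; // other named entities are simply dropped in this sketch
  });
}

// Hypothetical helper: true when the decoded href starts with javascript:
function looksLikeScriptUrl(href, entityRegex) {
  var protocol = decodeEntities(href, entityRegex)
    .replace(/[^\w:]/g, '')
    .toLowerCase();
  return protocol.indexOf('javascript:') === 0;
}

var evasion = 'javascript&#58this;alert(1&#41;';
console.log(looksLikeScriptUrl(evasion, OLD_ENTITY)); // false: payload slips through
console.log(looksLikeScriptUrl(evasion, NEW_ENTITY)); // true: payload is caught
```

With the old pattern, "&#58this;" is treated as a single entity named "#58this", which never decodes to a colon. The new pattern stops at the digits and makes the semicolon optional, so the sanitizer sees the same colon the browser does.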

In late July, however, marked’s authors did release a new version, acknowledging the importance of fixing this issue. And so, if you’re using marked today and are able to, you can also fix this issue by upgrading to version 0.3.6.

Summary & Recommendations

This XSS vulnerability in marked is a high severity security issue in a popular package, so it’s important to address it if you’re using the marked package yourself.

In addition, it serves as a good example for several broader issues:

How tricky it is to sanitize complex user input. HTML, SQL and URL encoding are very hard to get completely right, and attackers only need one loophole to get in. If possible, always prefer whitelisting allowed values over blacklisting through pattern matching.
The risks third party dependencies carry for your application. It’s important to stay on top of known vulnerabilities in npm dependencies.
Why open source maintenance is a complicated topic, and how it can have very real and urgent implications.
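As an illustration of the whitelisting advice above, here is a minimal sketch that only accepts link targets beginning with an explicitly allowed scheme. The isAllowedHref helper and the list of schemes are assumptions made for this example, not part of marked or of Snyk's patch, and this version deliberately rejects relative URLs to stay simple.

```javascript
// Assumed allowlist: only absolute http(s) URLs and mailto links are accepted.
var ALLOWED_PREFIX = /^(?:https?:\/\/|mailto:)/i;

// Hypothetical helper: accept a link target only if it starts with an allowed
// scheme. Anything else, including encoded or obfuscated schemes, is rejected
// without trying to recognize what it might be.
function isAllowedHref(href) {
  return ALLOWED_PREFIX.test(String(href).trim());
}

console.log(isAllowedHref('https://snyk.io/'));                // true
console.log(isAllowedHref('javascript:alert(1)'));             // false
console.log(isAllowedHref('javascript&#58this;alert(1&#41;')); // false
```

Because the check describes what is allowed rather than what is forbidden, the evasion payload is rejected without the code needing to know anything about entity decoding. The accepted value still has to be escaped properly when it is written into the href attribute.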

ponyfoo.com
