Is it possible to prevent certain symbols from getting encoded?

This issue has been tracked since 2022-07-20.

I am using htmlparser2 with cheerio. Here is sample code to reproduce the issue I'm running into:

const cheerio = require('cheerio');
const htmlparser2 = require('htmlparser2');

const template = `
<html>
    <body>
    {{@button { label:>n) }}}
    </body>
</html>
`;

// Parse with htmlparser2, then hand the resulting DOM to cheerio.
const dom = htmlparser2.parseDocument(template, {
  xmlMode: false,
  decodeEntities: true
});

const doc = cheerio.load(dom);


The output of this is:

"\n<html>\n    <body>\n    {{@button { label:&gt;n) }}}\n    </body>\n</html>\n"

The > character gets encoded as &gt;. Is this expected behavior? Is there any way to stop this from happening? I've tried it with decodeEntities set to false and that does not seem to make any difference.

fb55 wrote this answer on 2022-08-03

You'll have a much easier time if you use Cheerio directly:

const doc = cheerio.load(template, { xml: { xmlMode: false, decodeEntities: false } });

There should be a better way of telling Cheerio to use htmlparser2, but passing the xml option works. If Cheerio isn't told to use htmlparser2, it defaults to parse5 (in your example, parse5's serializer), which always follows the HTML spec and therefore escapes > in text as &gt;.
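
To illustrate why a spec-following serializer produces &gt;, here is a minimal sketch of the text-escaping step that the HTML spec describes for serializing text nodes. The helper name escapeHtmlText is hypothetical and is not part of cheerio, parse5, or htmlparser2; it only mimics the rule for text content, not for attribute values.

```javascript
// Sketch of the HTML spec's escaping rule for text nodes during serialization.
// escapeHtmlText is a hypothetical helper, not an API of cheerio or parse5.
function escapeHtmlText(text) {
  return text
    .replace(/&/g, '&amp;')       // must run first so later entities are not double-escaped
    .replace(/\u00A0/g, '&nbsp;') // non-breaking space
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

const snippet = '{{@button { label:>n) }}}';
console.log(escapeHtmlText(snippet));
// prints: {{@button { label:&gt;n) }}}
```

A serializer that applies this rule has no option to skip it, which is presumably why changing decodeEntities on the parser alone made no difference in the original example.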

joeosburn wrote this answer on 2022-08-03

Thank you, that solved it.

