Is it possible to prevent certain symbols from getting encoded?

This issue has been tracked since 2022-07-20.

I am using htmlparser2 with cheerio. Here is sample code to reproduce the issue I'm running into:

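const cheerio = require('cheerio');
const htmlparser2 = require('htmlparser2');

const template = `
<html>
    <body>
    {{@button { label:>n) }}}
    </body>
</html>
`;

// Parse with htmlparser2, then hand the resulting DOM to Cheerio
const dom = htmlparser2.parseDocument(template, {
  xmlMode: false,
  decodeEntities: true
});

const doc = cheerio.load(dom);

// Print the serialized HTML as a JSON string (newlines shown as \n)
console.log(JSON.stringify(doc.html()));
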
The output of this is:

"\n<html>\n    <body>\n    {{@button { label:&gt;n) }}}\n    </body>\n</html>\n"

The > character gets encoded as &gt;. Is this expected behavior? Is there any way to stop this from happening? I've tried it with decodeEntities set to false and that does not seem to make any difference.
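For context: the HTML spec's serialization algorithm escapes &, U+00A0 (no-break space), < and > inside ordinary text nodes, so a spec-compliant serializer is expected to turn this > into &gt;. A minimal sketch of that rule, assuming plain text-node content; escapeTextLikeHtmlSpec is a hypothetical name, not an export of parse5, Cheerio or htmlparser2:

```javascript
// Hypothetical helper modelling the text-escaping rule of the HTML
// serialization algorithm (not an actual parse5 or Cheerio export).
function escapeTextLikeHtmlSpec(text) {
  return text
    .replace(/&/g, "&amp;") // runs first so entities added below aren't escaped again
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const line = "{{@button { label:>n) }}}";
console.log(escapeTextLikeHtmlSpec(line));
// prints: {{@button { label:&gt;n) }}}
```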

fb55 wrote this answer on 2022-08-03

You'll have a much easier time if you use Cheerio directly:

const doc = cheerio.load(template, { xml: { xmlMode: false, decodeEntities: false } });

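There should be a better way of telling Cheerio to use htmlparser2, but this works. If Cheerio isn't told to use htmlparser2, it defaults to parse5 (in your example, parse5's serializer), which always follows the HTML spec and therefore writes > in text as &gt;. Passing a DOM that htmlparser2 already parsed to cheerio.load() doesn't change the serializer, which is why setting decodeEntities on the parser made no difference.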
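A toy model of why the parser option can't help: decodeEntities on the parser only affects how entity references in the source are read, while the &gt; is produced at serialization time. The functions parseText and serializeText and the encode flag below are hypothetical, not htmlparser2, parse5 or Cheerio APIs; encode: false stands in, roughly, for what Cheerio does when told to use htmlparser2 with decodeEntities: false.

```javascript
// Stage 1 (parser): decodeEntities only controls whether entity
// references in the *source* are decoded into characters.
// Only the three entities needed for this illustration are handled.
function parseText(source, { decodeEntities = true } = {}) {
  if (!decodeEntities) return source;
  return source
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Stage 2 (serializer): this is where ">" does or doesn't become "&gt;".
function serializeText(text, { encode = true } = {}) {
  if (!encode) return text; // emit text verbatim
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const template = "{{@button { label:>n) }}}";

// Spec-style serializer: the parser flag makes no difference to the output.
for (const decodeEntities of [true, false]) {
  console.log(serializeText(parseText(template, { decodeEntities })));
}

// Verbatim serializer: the template survives unchanged.
console.log(serializeText(parseText(template), { encode: false }));
```

The first two lines printed both contain &gt;; only the last one keeps the original >.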

joeosburn wrote this answer on 2022-08-03

Thank you, that solved it.

More Details About Repo
Owner name: fb55
Repo name: htmlparser2
Full name: fb55/htmlparser2
Language: TypeScript
Created date: 2011-08-27
Updated date: 2023-03-19
Stars: 3793
Watchers: 50
Forks: 370
Open issues: 4