HTML spec compliance: `li` and `p` tags are parsed incorrectly

This issue has been tracked since 2022-09-18.

Which @angular/* package(s) are the source of the bug?

compiler

Is this a regression?

No

Description

const { parseTemplate } = require("@angular/compiler");
const ast = parseTemplate("<ul><li>bbb<p>ccc<li>ddd</ul>");
console.log(ast.nodes[0].children.length); // Actual: 1. Expected: 2

The second li start tag should implicitly close the first li (and the open p inside it). Instead, the second li is parsed as a child of the p element.

Please provide a link to a minimal reproduction of the bug

https://codesandbox.io/s/flamboyant-lalande-8kpny2?file=/src/index.js

Please provide the exception or error you saw

No response

Please provide the environment you discovered this bug in (run ng version)

Angular 14.2.2

Anything else?

No response

JoostK wrote this answer on 2022-09-19

Almost all HTML parsers on https://astexplorer.net get this wrong, except for parse5. I wouldn't rely on this working as the HTML spec prescribes if I were you ;)

I have a fix in #47474 (nice issue number) but labeled it as risky, as it's hard to predict the exact impact of this change.

thorn0 wrote this answer on 2022-09-19

No need to trust those parsers (overall, IMHO, it's a bad idea to use them as a reference). Check the spec:

An li element's end tag may be omitted if the li element is immediately followed by another li element or if there is no more content in the parent element.

A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element.

Or check how any browser parses this. li can't be a child of p.

JoostK wrote this answer on 2022-09-19

"No need to trust those parsers"

I wasn't suggesting I did; I mentioned it to indicate that this is widely parsed incorrectly.

thorn0 wrote this answer on 2022-09-19

Yeah. Perhaps that's because the spec could be clearer on this.

thorn0 wrote this answer on 2022-09-19

A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element

li isn't explicitly mentioned in this list for the same reason tr and td aren't.

thorn0 wrote this answer on 2022-09-19

Turns out td is parsed incorrectly as well:

const { parseTemplate } = require("@angular/compiler");
const ast = parseTemplate("<table><tr><td>bbb<p>ccc<td>ddd</table>");
console.log(ast.nodes[0].children[0].children.length); // Actual: 1. Expected: 2

https://codesandbox.io/s/great-sammet-hr0k6i?file=/src/index.js

thorn0 wrote this answer on 2022-09-19

Same with <table><tr><td>bbb<li>ccc<td>ddd</table>
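
For illustration, the three cases above can be reproduced with a toy parser that applies implied end tags. This is only a rough sketch: it is not the Angular compiler's code, not the fix in #47474, and not the spec's full tree-construction algorithm. The IMPLIED_CLOSE table and the parseSketch function are hypothetical names invented for this example, and the rules cover only li, td, and p in attribute-free markup.

```javascript
// Hypothetical rule table (an assumption for this sketch, not taken from the spec
// verbatim): for each start tag, which open elements it implicitly closes, and
// which ancestors stop the search.
const IMPLIED_CLOSE = {
  li: { closes: ["li", "p"], boundary: ["ul", "ol", "td", "th", "table"] },
  td: { closes: ["td", "th", "p", "li"], boundary: ["tr", "table"] },
  p: { closes: ["p"], boundary: ["li", "td", "th", "ul", "ol", "table"] },
};

// Parses attribute-free markup into a plain { name, children } tree.
function parseSketch(html) {
  const root = { name: "#root", children: [] };
  const stack = [root]; // stack of open elements, root at index 0
  const tokens = html.match(/<\/?[a-z0-9]+>|[^<]+/gi) || [];

  for (const token of tokens) {
    if (!token.startsWith("<")) {
      // Text goes into the innermost open element.
      stack[stack.length - 1].children.push({ name: "#text", value: token });
    } else if (token.startsWith("</")) {
      // End tag: pop the matching element and everything opened inside it.
      const name = token.slice(2, -1).toLowerCase();
      const index = stack.map((node) => node.name).lastIndexOf(name);
      if (index > 0) stack.length = index;
    } else {
      const name = token.slice(1, -1).toLowerCase();
      const rule = IMPLIED_CLOSE[name];
      if (rule) {
        // Walk from the innermost open element outwards. Remember the
        // outermost element that must be closed; stop at a boundary element.
        let cut = stack.length;
        for (let i = stack.length - 1; i > 0; i--) {
          if (rule.boundary.includes(stack[i].name)) break;
          if (rule.closes.includes(stack[i].name)) cut = i;
        }
        stack.length = cut; // implicitly close everything from `cut` upwards
      }
      const node = { name, children: [] };
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    }
  }
  return root;
}

// The list case from the description: ul should end up with two li children.
const list = parseSketch("<ul><li>bbb<p>ccc<li>ddd</ul>");
console.log(list.children[0].children.length); // 2

// The table cases: tr should end up with two td children.
const table = parseSketch("<table><tr><td>bbb<p>ccc<td>ddd</table>");
console.log(table.children[0].children[0].children.length); // 2
```

Unlike a browser or parse5, this sketch does not insert an implicit tbody, which matches the shape of the expected values in the snippets above (table > tr > td).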
