Changed behaviour of parseDocument

This issue has been tracked since 2022-09-16.

Given the following example:

const html = '<html><body><p><div><span>1</span></div></p></body></html>';
parseDOM(html, { xmlMode: false }); // Note: parseDOM has since been replaced by parseDocument().

In versions prior to 4.0.0

The example html string would be parsed into a DOM tree that looks like:

|- html
    |- body
        |- p
            |- div
                |- span
                    <textNode>

This is what I would expect. The body DOM element, printed out, looks like:

<ref *1> {
  type: 'tag',
  name: 'body',
  children: [
    {
      type: 'tag',
      name: 'p',
      children: [Array],
      parent: [Circular *1],
      ...
    }
  ],
  parent: {
    type: 'tag',
    name: 'html',
    children: [ [Circular *1] ],
    parent: null,
    ...
  },
  ...
}

In versions 4.0.0 and later

The example html string would be parsed into a DOM tree that looks like:

|- html
    |- body
        |- p
        |- div
        |- p

The body DOM element looks like:

<ref *1> Element {
  type: 'tag',
  parent: Element {
    type: 'tag',
    parent: null,
    children: [ [Circular *1] ],
    name: 'html',
    ...
  },
  children: [
    Element {
      type: 'tag',
      parent: [Circular *1],
      children: [],
      name: 'p',
      ...
    },
    Element {
      type: 'tag',
      parent: [Circular *1],
      children: [Array],
      name: 'div',
      ...
    },
    Element {
      type: 'tag',
      parent: [Circular *1],
      children: [],
      name: 'p',
      ...
    }
  ],
  name: 'body',
  ...
}
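
The `[Array]` placeholders in the dumps above hide the nesting, which makes the two results hard to compare. A small helper along these lines can print the outline instead. It is only a sketch: `outline` and `sample` are hypothetical names, the helper assumes nodes shaped like the dumps above (`type`, `name`, `children`) and that text nodes carry `type: 'text'`, and the `span` under `div` in the hand-built sample is an assumption, since the dump truncates it.

```javascript
// Sketch: walk a node shaped like { type, name, children } and
// return one outline line per node, indented by depth.
function outline(node, depth = 0) {
  // Assumption: text nodes are marked with type 'text'.
  const label = node.type === "text" ? "<textNode>" : node.name;
  const lines = [`${"    ".repeat(depth)}|- ${label}`];
  for (const child of node.children ?? []) {
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// Hand-built stand-in for the body element from the 4.x dump
// (the contents of div are assumed, not taken from the dump).
const sample = {
  type: "tag",
  name: "body",
  children: [
    { type: "tag", name: "p", children: [] },
    {
      type: "tag",
      name: "div",
      children: [
        { type: "tag", name: "span", children: [{ type: "text", data: "1" }] },
      ],
    },
    { type: "tag", name: "p", children: [] },
  ],
};

console.log(outline(sample).join("\n"));
```

Passing the real body element in place of `sample` should work the same way, since the helper only reads `type`, `name` and `children`.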

What changed after version 3.9.2 to cause this difference in behaviour, and is the new behaviour correct?

fb55 wrote this answer on 2022-12-02

Unless the xmlMode option is enabled, htmlparser2 now has rudimentary support for tags closing other tags, similar to how browsers handle this. In your case, div tags are supposed to close p tags; see here.
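
To make the rule in this answer concrete, here is a toy tree builder that applies it to the example string. It is a sketch, not htmlparser2's actual implementation: `parse`, `CLOSED_BY_OPENING` and the `#root`/`#text` names are hypothetical, the rule table only covers `div` and `p`, and attributes are not handled. It also assumes that a stray `</p>` with no open `p` produces an empty `p` element, as browsers do, which would account for the third child of `body`.

```javascript
// Toy rule table (assumption): opening the key closes an open element
// whose name is in the list. Real parsers have many more rules.
const CLOSED_BY_OPENING = { div: ["p"], p: ["p"] };

function parse(html, { xmlMode = false } = {}) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  const top = () => stack[stack.length - 1];
  const open = (name) => {
    const el = { name, children: [] };
    top().children.push(el);
    stack.push(el);
  };

  // Matches closing tags, attribute-less opening tags, and text runs.
  const tokens = html.matchAll(/<\/(\w+)>|<(\w+)>|([^<]+)/g);
  for (const [, closeName, openName, text] of tokens) {
    if (openName) {
      // HTML mode only: the new tag may implicitly close the current element.
      while (!xmlMode && (CLOSED_BY_OPENING[openName] ?? []).includes(top().name)) {
        stack.pop();
      }
      open(openName);
    } else if (closeName) {
      const i = stack.map((el) => el.name).lastIndexOf(closeName);
      if (i > 0) {
        stack.length = i; // pop everything up to and including the match
      } else if (!xmlMode && closeName === "p") {
        open("p"); // stray </p>: emit an empty <p> element
        stack.pop();
      }
    } else {
      top().children.push({ name: "#text", data: text });
    }
  }
  return root;
}

const html = "<html><body><p><div><span>1</span></div></p></body></html>";
const names = (el) => el.children.map((child) => child.name);
const bodyOf = (root) => root.children[0].children[0];

console.log(names(bodyOf(parse(html))));                    // [ 'p', 'div', 'p' ]
console.log(names(bodyOf(parse(html, { xmlMode: true })))); // [ 'p' ]
```

In this sketch the `xmlMode` flag simply switches both rules off, so the `div` stays nested inside the `p`, which mirrors the pre-4.0.0 tree from the question.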

More Details About Repo
Owner Name fb55
Repo Name htmlparser2
Full Name fb55/htmlparser2
Language TypeScript