endIndex less than startIndex for implied tags

This issue has been tracked since 2021-08-05.

If I have

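const { Parser } = require("htmlparser2");

// Log each closing tag together with the parser's current indices.
const p = new Parser({
  onclosetag(name) {
    console.log({
      name,
      startIndex: p.startIndex,
      endIndex: p.endIndex,
    });
  },
});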

and I run

p.write("<p>Foo</p><hr>");

I get

{ name: 'p', startIndex: 6, endIndex: 9 }
{ name: 'hr', startIndex: 10, endIndex: 13 }
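For reference, the indices in this output read as inclusive offsets into the input. A quick check using only plain string slicing (no parser involved; the variable names are just for illustration):

```javascript
// Inclusive [startIndex, endIndex] offsets, copied from the output above.
const validInput = "<p>Foo</p><hr>";
const validOutput = [
  { name: "p", startIndex: 6, endIndex: 9 },
  { name: "hr", startIndex: 10, endIndex: 13 },
];

// endIndex is inclusive, so add 1 for String.prototype.slice.
const spans = validOutput.map(({ startIndex, endIndex }) =>
  validInput.slice(startIndex, endIndex + 1)
);
console.log(spans); // [ '</p>', '<hr>' ]
```

So for an explicit close tag, startIndex and endIndex bracket the tag that triggered the callback.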

But if I don't include the optional end tag for the <p>:

p.write("<p>Foo<hr>");

I get

{ name: 'p', startIndex: 6, endIndex: 9 }
{ name: 'hr', startIndex: 10, endIndex: 9 }

The endIndex for the hr doesn't seem to have been set, so it's the same as for the <p>. This trips up PostHTML's sourceLocations option (posthtml/posthtml-parser#63), which finds an endIndex less than a startIndex. This option is now used by Parcel, leading to parcel-bundler/parcel#6672.
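Until this is fixed, a consumer could at least detect the bad ranges before turning them into source locations. A minimal sketch, assuming the records look like the ones logged above; `isValidRange` is a hypothetical helper and not part of htmlparser2 or posthtml-parser:

```javascript
// Hypothetical helper: true when the range is usable as a source location.
function isValidRange({ startIndex, endIndex }) {
  return (
    Number.isInteger(startIndex) &&
    Number.isInteger(endIndex) &&
    startIndex >= 0 &&
    endIndex >= startIndex
  );
}

// The two records reported above for "<p>Foo<hr>".
const reportedRanges = [
  { name: "p", startIndex: 6, endIndex: 9 },
  { name: "hr", startIndex: 10, endIndex: 9 },
];

// Collect the names of tags whose end precedes their start.
const inverted = reportedRanges
  .filter((range) => !isValidRange(range))
  .map((range) => range.name);
console.log(inverted); // [ 'hr' ]
```

This only flags the symptom; it can't recover the correct indices.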

thewilkybarkid wrote this answer on 2021-08-05

Just seen the AST explorer (useful!), so put my case from parcel-bundler/parcel#6672 (comment) at https://astexplorer.net/#/gist/a31bfb2e193e72e403256d885fd4b756/0153ebc4a547b3cbbf3003f033329740369e6aac (focusing on the <hr> shows the problem). (Edit: just spotted that's not the latest version; it does still happen on 6.1.0.)

thewilkybarkid wrote this answer on 2021-08-07

After a bit of a look, I think

this.onclosetag(el);
and
this.endIndex = this.tokenizer.getAbsoluteIndex();
are the problem lines: the inferred closing tag (i.e. the missing </p>) is given the indices of the new void element (<hr>), and as the endIndex is taken from the tokenizer directly (which has no knowledge of the inferred </p>) it's wrong.

I'm wondering though, what should the startIndex/endIndex be for the inferred onclosetag (if indeed that callback should be called at all)? Both 6 (so the beginning of the <hr>)? 6 and 9 (same as the <hr>)? Maybe even both null?
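To make those three options concrete, here they are written out for the input from the report. This is only an illustration of the candidates, not a statement of what htmlparser2 does or should do; the object keys are made-up labels:

```javascript
// Input from the report; the <hr> occupies offsets 6 through 9.
const impliedInput = "<p>Foo<hr>";
const hrStart = impliedInput.indexOf("<hr>");
const hrEnd = hrStart + "<hr>".length - 1; // inclusive, like endIndex

// Candidate index pairs for the implied </p> (labels are hypothetical).
const candidates = {
  // Zero-width position where the <hr> begins.
  zeroWidth: { startIndex: hrStart, endIndex: hrStart },
  // Reuse the range of the tag that forced the close.
  sameAsNextTag: { startIndex: hrStart, endIndex: hrEnd },
  // Signal that there is no corresponding source text.
  unknown: { startIndex: null, endIndex: null },
};
console.log(candidates);
```

The first two keep endIndex >= startIndex, while the third pushes the null handling onto every consumer.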

thewilkybarkid wrote this answer on 2021-08-20

Seems like this happens on invalid HTML too: parcel-bundler/parcel#6672 (comment).

fb55 wrote this answer on 2021-08-20

@thewilkybarkid Thank you so much for the report, and for digging into the details! I have opened #910 with a refactor of start/end indices, which fixes this and other issues with how indices work.

More Details About Repo
Owner Name fb55
Repo Name htmlparser2
Full Name fb55/htmlparser2
Language TypeScript
Created Date 2011-08-27
Updated Date 2023-03-19
Star Count 3793
Watcher Count 50
Fork Count 370
Issue Count 4
