improve parsing whitespace after end tag names - webdump - HTML to plain-text c…
git clone git://git.codemadness.org/webdump
---
commit 72b23084b7c64c298c6b90ae6ad9f53f497cec57
parent a0118e672fd3fa0004ccf2850eaef4ec4bc6fb39
Author: Hiltjo Posthuma <[email protected]>
Date: Sat, 29 Jun 2024 18:29:21 +0200

improve parsing whitespace after end tag names

Real site example:
https://www.gnupg.org/gph/en/manual.html

Has HTML such as:

<P
CLASS="COPYRIGHT"
>Copyright © 1999 by <SPAN
CLASS="HOLDER"
>The Free Software Foundation</SPAN
></P
>
...

This incorrectly showed ">" in the end tag as data.

Reported by Jason Hood, thanks!

Diffstat:
M xml.c | 2 ++
1 file changed, 2 insertions(+), 0 deletions(-)
---
diff --git a/xml.c b/xml.c
@@ -386,6 +386,8 @@ xml_parse(XMLParser *x)
 	else if (c == '>' || ISSPACE(c)) {
 		x->tag[x->taglen] = '\0';
 		if (isend) { /* end tag, start…
+			while (c != '>' && c !…
+				c = GETNEXT();
 			if (x->xmltagend)
 				x->xmltagend(x…
 			x->tag[0] = '\0';
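
The idea behind the two added lines can be illustrated outside the parser. The following is a minimal sketch, not the code from xml.c: `parse_endtag` is a hypothetical helper that works on a NUL-terminated string instead of the `GETNEXT()` character stream, and it assumes that everything between the tag name and the closing `>` may be skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of webdump: parse an end tag such as
 * "</SPAN\n>" starting at s. Copies the tag name into name (at most
 * namesiz - 1 bytes) and returns a pointer just past the closing '>',
 * or NULL if the input is not an end tag or the '>' is missing.
 */
static const char *
parse_endtag(const char *s, char *name, size_t namesiz)
{
	size_t len = 0;

	assert(s != NULL && name != NULL && namesiz > 0);

	if (s[0] != '<' || s[1] != '/')
		return NULL;
	s += 2;

	/* read the tag name up to '>' or whitespace */
	while (*s && *s != '>' && !isspace((unsigned char)*s)) {
		if (len + 1 < namesiz)
			name[len++] = *s;
		s++;
	}
	name[len] = '\0';

	/* the step the commit is about: consume input up to '>' so that
	 * a '>' on a following line is not treated as character data */
	while (*s && *s != '>')
		s++;

	return *s == '>' ? s + 1 : NULL;
}
```

Called on the sample from the commit message, `parse_endtag("</SPAN\n></P\n>", name, sizeof(name))` stores `SPAN` in `name` and returns a pointer to the start of `</P`, so the `>` that follows the newline is consumed as part of the end tag.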