author: Chih-Hsuan Yen <[email protected]> 2023-11-26 20:53:05 +0800
committer: Chih-Hsuan Yen <[email protected]> 2023-11-26 21:04:56 +0800
commit: d4da4dcc321ca65fb2cd19877f395cc5f75933ab
tree: 5667b4fb2b4dd42853fb638ef81e6fec10475c52 /classes/Debug.php
parent: 2c7e000120b23487ed4090241a206f528e6b11f5
Fix sanitizer with libxml2 >= 2.12.0

Somehow, with newer libxml2, `<?xml encoding="UTF-8">` no longer enforces UTF-8. Instead, non-ASCII content is treated as ISO-8859-1 and gets broken. For example, `<p>中文</p>` becomes `<p>&auml;&cedil;&shy;&aelig;&#150;&#135;</p>` (it should be `<p>&#20013;&#25991;</p>`).

Switching to another trick mentioned in [1] fixes the issue, and the new trick still works with older libxml2 (tested with 2.11.5).

As a side note, DOMDocument::loadHTML uses HTMLParser in libxml2 [2][3].

[1] https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly
[2] https://github.com/php/php-src/blob/php-8.1.26/ext/dom/document.c#L1855
[3] https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html
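
The broken output quoted in the message can be reproduced by hand, which helps confirm that the input really is being read as ISO-8859-1. The following is a minimal Python sketch of that arithmetic only; it is not the sanitizer code and not the fix from this commit, and the helper names `misread_as_latin1` and `to_entities` are made up for illustration.

```python
import html.entities


def misread_as_latin1(text: str) -> str:
    """Simulate a parser that reads UTF-8 bytes as ISO-8859-1 (hypothetical helper)."""
    # Each UTF-8 byte becomes one separate ISO-8859-1 character.
    return text.encode("utf-8").decode("iso-8859-1")


def to_entities(text: str) -> str:
    """Write non-ASCII characters as HTML entities (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif cp in html.entities.codepoint2name:
            out.append(f"&{html.entities.codepoint2name[cp]};")  # named entity
        else:
            out.append(f"&#{cp};")  # numeric character reference
    return "".join(out)


broken = to_entities(misread_as_latin1("<p>中文</p>"))
correct = to_entities("<p>中文</p>")
print(broken)   # <p>&auml;&cedil;&shy;&aelig;&#150;&#135;</p>
print(correct)  # <p>&#20013;&#25991;</p>
```

The six entities in the broken string correspond one-to-one to the six UTF-8 bytes of `中文` (E4 B8 AD E6 96 87), which is consistent with the ISO-8859-1 explanation given above.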
Diffstat (limited to 'classes/Debug.php')
0 files changed, 0 insertions, 0 deletions