author Andres Rey <[email protected]> 2017-09-14 20:27:41 +0100
committer Andres Rey <[email protected]> 2017-09-14 20:27:41 +0100
commit e1f56503f9f26da30f283f0e15405b9fccbc642a (patch)
tree 3edf0fd82ba79efe1b32d6f0706cd2e2a3c937ff
parent 83c9180fe1882e1d2171b79922cc94771f6f0af7 (diff)
Add warning about using the summonCthulhu option
-rw-r--r-- README.md 4
1 files changed, 3 insertions, 1 deletions
diff --git a/README.md b/README.md
index 6495171..6b30349 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ If the parsing process was unsuccessful the HTMLParser will return `false`
- **substituteEntities**: default value `false`, disables the `substituteEntities` flag of libxml. Will avoid substituting HTML entities. Like `&aacute;` to á.
- **normalizeEntities**: default value `false`, converts UTF-8 characters to its HTML Entity equivalent. Useful to parse HTML with mixed encoding.
- **originalURL**: default value `http://fakehost`, original URL from the article used to fix relative URLs.
-- **summonCthulhu**: default value `false`, remove all <script> nodes via regex. This is not ideal as it might break things, but might be the only solution to [libxml problems with unescaped javascript](https://github.com/andreskrey/readability.php#known-issues).
+- **summonCthulhu**: default value `false`, remove all `<script>` nodes via regex. This is not ideal as it might break things, but might be the only solution to [libxml problems with unescaped javascript](https://github.com/andreskrey/readability.php#known-issues).
## Limitations
@@ -83,6 +83,8 @@ If you would like to remove the scripts of the HTML (like readability does), you
This is a libxml issue and not a Readability.php bug.
+There's a workaround for this: using the summonCthulhu option. This will remove all script tags via regex, which is not ideal because you may end up summoning [the lord of darkness](https://stackoverflow.com/a/1732454).
+
## Dependencies
Readability uses the Element interface and class from *The PHP League's* **[html-to-markdown](https://github.com/thephpleague/html-to-markdown/)**. The Readability object is an extension of the Element class. It overrides some methods but relies on it for basic DOMElement parsing.
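
Note on the `summonCthulhu` workaround described in this patch: the README only says that script tags are removed "via regex" and does not show the pattern. The sketch below illustrates the general technique in Python, not the library's actual PHP code. The pattern and the name `strip_scripts` are assumptions made for illustration.

```python
import re

# Hypothetical pattern (not taken from Readability.php): an opening <script ...> tag,
# the shortest possible body, and the first closing </script> tag that follows.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html: str) -> str:
    """Remove every span that looks like a <script> element from the raw HTML text."""
    return SCRIPT_RE.sub("", html)


if __name__ == "__main__":
    page = '<p>a</p><script>var x = "</p>";</script><p>b</p>'
    print(strip_scripts(page))
```

Because the regex works on raw text and knows nothing about document structure, it also removes anything that merely looks like a script element, for example the contents of `<textarea><script>x</script></textarea>`. That is the kind of breakage the warning in this commit refers to.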