From 01fb375746d7ef9178c4bf651774da67632b7454 Mon Sep 17 00:00:00 2001
From: Andres Rey
Date: Sun, 26 Mar 2017 11:34:29 +0100
Subject: Added normalizeEntities flag.

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index b9c877d..ee3bfd9 100644
--- a/README.md
+++ b/README.md
@@ -52,6 +52,7 @@ If the parsing process was unsuccessful the HTMLParser will return `false`
 - **removeReadabilityTags**: default value `true`, remove the data-readability tags inside the nodes that are added during the rating phase.
 - **fixRelativeURLs**: default value `false`, convert relative URLs to absolute. Like `/test` to `http://host/test`.
 - **substituteEntities**: default value `false`, disables the `substituteEntities` flag of libxml. Will avoid substituting HTML entities. Like `&aacute;` to á.
+- **normalizeEntities**: default value `false`, converts UTF-8 characters to its HTML Entity equivalent. Useful to parse HTML with mixed encoding.
 - **originalURL**: default value `http://fakehost`, original URL from the article used to fix relative URLs.
 
 ## Limitations
-- 
cgit v1.2.3