From 83c9180fe1882e1d2171b79922cc94771f6f0af7 Mon Sep 17 00:00:00 2001
From: Andres Rey
Date: Thu, 14 Sep 2017 20:24:50 +0100
Subject: Add summonCthulhu config option + test cases

---
 README.md | 1 +
 src/HTMLParser.php | 5 +
 test/test-pages/webmd-1/config.json | 3 +
 test/test-pages/webmd-1/expected-metadata.json | 0
 test/test-pages/webmd-1/expected.html | 50 +
 test/test-pages/webmd-1/source.html | 2411 ++++++++++++++++++++++++
 6 files changed, 2470 insertions(+)
 create mode 100644 test/test-pages/webmd-1/config.json
 create mode 100644 test/test-pages/webmd-1/expected-metadata.json
 create mode 100644 test/test-pages/webmd-1/expected.html
 create mode 100644 test/test-pages/webmd-1/source.html

diff --git a/README.md b/README.md
index 5c98178..6495171 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,7 @@ If the parsing process was unsuccessful the HTMLParser will return `false`
 - **substituteEntities**: default value `false`, disables the `substituteEntities` flag of libxml. Will avoid substituting HTML entities. Like `&aacute;` to á.
 - **normalizeEntities**: default value `false`, converts UTF-8 characters to its HTML Entity equivalent. Useful to parse HTML with mixed encoding.
 - **originalURL**: default value `http://fakehost`, original URL from the article used to fix relative URLs.
+- **summonCthulhu**: default value `false`, remove all

Allergies Health Center

Babies Who Eat Peanuts Early May Avoid Allergy

By WebMD Health News
Reviewed by Hansa D. Bhargava, MD

Feb. 23, 2015 -- Life-threatening peanut allergies have mysteriously been on the rise in the past decade, with little hope for a cure.

But a groundbreaking new study may offer a way to stem that rise, while another may offer some hope for those who are already allergic.

Parents have been told for years to avoid giving foods containing peanuts to babies for fear of triggering an allergy. Now research shows the opposite is true: Feeding babies snacks made with peanuts before their first birthday appears to prevent that from happening.

The study is published in the New England Journal of Medicine, and it was presented at the annual meeting of the American Academy of Allergy, Asthma and Immunology in Houston. It found that among children at high risk for getting peanut allergies, eating peanut snacks by 11 months of age and continuing to eat them at least three times a week until age 5 cut their chances of becoming allergic by more than 80% compared to kids who avoided peanuts. Those at high risk were already allergic to egg, they had the skin condition eczema, or both.

Overall, about 3% of kids who ate peanut butter or peanut snacks before their first birthday got an allergy, compared to about 17% of kids who didn’t eat them.

“I think this study is an astounding and groundbreaking study, really,” says Katie Allen, MD, PhD. She's the director of the Center for Food and Allergy Research at the Murdoch Children’s Research Institute in Melbourne, Australia. Allen was not involved in the research.

Experts say the research should shift thinking about how kids develop food allergies, and it should change the guidance doctors give to parents.

Meanwhile, for children and adults who are already allergic to peanuts, another study presented at the same meeting held out hope of a treatment.

A new skin patch called Viaskin allowed people with peanut allergies to eat tiny amounts of peanuts after they wore it for a year.

A Change in Guidelines?

Allergies to peanuts and other foods are on the rise. In the U.S., more than 2% of people react to peanuts, a 400% increase since 1997. And reactions to peanuts and other tree nuts can be especially severe. Nuts are the main reason people get a life-threatening problem called anaphylaxis.

+
+
+
+ +
+
+
1 + | + 2 + + | 3 + + | 4 + + | 5 + + +
+
+
+
+
+
+
+ +
+
+
+
+
+
+
+
+
+ +
+ +
+ + + + + + +
+
+
+
+ + +
+
+
+
+
+
+
+
+
+
+
+ + + + + + +
+
+
+ +
+
+
+
+
+
+
+
+
+ +
+
+
+
+
+
+

Today on WebMD

+ +
+
+ man blowing nose + + + +
Make these tweaks to your diet, home, and lifestyle.
+
+
+ Allergy capsule + + + +
Breathe easier with these products.
+
+
 
+
+ cat on couch + + + +
Live in harmony with your cat or dog.
+
+
+ Woman sneezing with tissue in meadow + + + +
Which ones affect you?
+
+
 
+
+
+ + +
+
+
+
+
+ + +
+
+
+
+
+
+
+
+ +
+

+ +
+
+ blowing nose + + +
Article
+ +
+
+ woman with sore throat + + +
Article
+ +
+
 
+
+ lone star tick + + +
Slideshow
+ +
+
+ Woman blowing nose + + +
Slideshow
+ +
+
 
+ + +
+
+
+
+ +
+ +
+
+

Send yourself a link to download the app.

+
+ +
+
+ +
+
+
+
+
Loading ...
+

Please wait...

+
+
+

This feature is temporarily unavailable. Please try again later.

+
+
+

Thanks!

+ +

Now check your email account on your mobile phone to download your new + app.

+
+
+ +
+
+
+

+ +
+
+ cat lying on shelf + + +
Article
+ +
+
+ Allergy prick test + + +
VIDEO
+ +
+
 
+
+ Man sneezing into tissue + + +
Assessment
+ +
+
+ Woman holding feather duster up to face, twitching + + +
Quiz
+ +
+
 
+ + +
+
+
+
+
+
+ + +
+
+
+
+
+
+
+
+ +
+
+
--
cgit v1.2.3

From e1f56503f9f26da30f283f0e15405b9fccbc642a Mon Sep 17 00:00:00 2001
From: Andres Rey
Date: Thu, 14 Sep 2017 20:27:41 +0100
Subject: Add warning about using the summonCthulhu option

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6495171..6b30349 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ If the parsing process was unsuccessful the HTMLParser will return `false`
 - **substituteEntities**: default value `false`, disables the `substituteEntities` flag of libxml. Will avoid substituting HTML entities. Like `&aacute;` to á.
 - **normalizeEntities**: default value `false`, converts UTF-8 characters to its HTML Entity equivalent. Useful to parse HTML with mixed encoding.
 - **originalURL**: default value `http://fakehost`, original URL from the article used to fix relative URLs.
-- **summonCthulhu**: default value `false`, remove all